[ntp:questions] Do logs indicate a config problem?
David Lord
snews at lordynet.org
Fri Jul 16 10:34:06 UTC 2010
Ulrich Windl wrote:
> David Lord <snews at lordynet.org> writes:
>
>> Do logs indicate a config problem?
>>
>> system server1: MSFa, PPSa peer=server2 and remote servers
>> system server2: peer=server1 and remote servers
>> system server3: GPSb, PPSb, server2 and server1
>>
>>
>> This is total logged over period Feb 7 to Feb 12:
>> ntp.log.server1:
>> 7 Feb 19:22:39 ntpd[26820]: clock SHM(0) event 'clk_noreply' (0x01)
>> 7 Feb 19:22:40 ntpd[26820]: clock PPS(0) event 'clk_noreply' (0x01)
>> 7 Feb 19:27:01 ntpd[26820]: synchronized to SHM(0), stratum 0
>> 7 Feb 19:27:01 ntpd[26820]: kernel time sync status change 2001
>> 7 Feb 19:28:01 ntpd[26820]: synchronized to PPS(0), stratum 0
>> 12 Feb 10:13:44 ntpd[26820]: sendto(xxx.xxx.xx.xx) (fd=27): Host is down
>> 12 Feb 10:20:41 ntpd[26820]: ntpd exiting on signal 15
>>
>>
>> This is tiny section of log on server1 with similar entries
>> repeating continuously over whole period Feb 12 to Feb 14. There
>> were too many reboots and restarts over period for peerstats to
>> be useful. I swapped kernels attempting to get frequency offset
>> down from just over 50ppm but the autoconfig still gave 50ppm
>> (on NetBSD-3.1 I could set "options TIMER_FREQ=" to get < 10ppm
>> without problem).
>>
>> ntp.log.server1:
>> 13 Feb 07:07:48 ntpd[22483]: kernel time sync status change 2901
>> 13 Feb 07:08:53 ntpd[22483]: kernel time sync status change 2101
>> 13 Feb 07:09:59 ntpd[22483]: kernel time sync status change 2301
>> 13 Feb 07:17:28 ntpd[22483]: kernel time sync status change 2501
>> 13 Feb 07:18:32 ntpd[22483]: kernel time sync status change 2301
>> 13 Feb 07:19:38 ntpd[22483]: kernel time sync status change 2901
>> 13 Feb 07:20:41 ntpd[22483]: kernel time sync status change 2301
>> 13 Feb 07:21:47 ntpd[22483]: kernel time sync status change 2101
>> 13 Feb 07:22:50 ntpd[22483]: kernel time sync status change 2501
>> 13 Feb 07:23:54 ntpd[22483]: kernel time sync status change 2101
>>
>> from timex.h:
>> 0x0001 enable pll updates
>> 0x0002 enable pps freq discipline
>> 0x0004 enable pps time discipline
>> 0x0100 pps signal present
>> 0x0200 pps signal jitter exceeded
>> 0x0400 pps signal wander exceeded
>> 0x0800 pps signal calibration error
>
> Hi!
>
> I'd say all 0x800 and 0x400 should never happen unless there is a severe
> problem. 0x200 should also happen extremely rarely under normal
> conditions. (based on my NTP/Linux experience several years ago)
>
> Regards,
> Ulrich
>
>>
>> This is total logged over period Feb 14 to Feb 19:
>> ntp.log.server1
>> 14 Feb 10:49:34 ntpd[169]: clock SHM(0) event 'clk_noreply' (0x01)
>> 14 Feb 10:52:53 ntpd[169]: synchronized to xx.xxx.xx.xxx, stratum 2
>> 14 Feb 10:52:53 ntpd[169]: kernel time sync status change 2001
>> 14 Feb 10:58:05 ntpd[169]: synchronized to SHM(0), stratum 0
>> 15 Feb 10:32:40 ntpd[169]: ntpd exiting on signal 15
>> 15 Feb 10:34:00 ntpd[169]: clock SHM(0) event 'clk_noreply' (0x01)
>> 15 Feb 10:40:27 ntpd[169]: synchronized to xx.xxx.xx.xxx, stratum 2
>> 15 Feb 10:40:27 ntpd[169]: kernel time sync status change 2001
>> 15 Feb 10:41:47 ntpd[169]: synchronized to xx.xxx.xx.xxx, stratum 2
>> 15 Feb 10:42:36 ntpd[169]: synchronized to SHM(0), stratum 0
>>
>>
>> This is tiny section of log on server3 with similar entries
>> repeating continuously over whole period Feb 7 to Feb 19.
>> ntp.log.server3:
>> 19 Feb 13:08:24 ntpd[6745]: kernel time sync error \
>>
>> 0x2307<PLL,PPSFREQ,PPSTIME,PPSSIGNAL,PPSJITTER,NANO,MODE=0x0=PLL,CLK=0x0=A>
>> 19 Feb 13:08:39 ntpd[6745]: kernel time sync status change \
>> 0x2107<PLL,PPSFREQ,PPSTIME,PPSSIGNAL,NANO,MODE=0x0=PLL,CLK=0x0=A>
>> 19 Feb 13:13:31 ntpd[6745]: kernel time sync error \
>>
>> 0x2307<PLL,PPSFREQ,PPSTIME,PPSSIGNAL,PPSJITTER,NANO,MODE=0x0=PLL,CLK=0x0=A>
>> 19 Feb 13:13:46 ntpd[6745]: kernel time sync status change \
>> 0x2107<PLL,PPSFREQ,PPSTIME,PPSSIGNAL,NANO,MODE=0x0=PLL,CLK=0x0=A>
>>
>>
>>
>> Otherwise timekeeping is good enough for me:
>>
>> peerstats
>> ident mean rms max
>> ==========================================================================
>> ..... MSF using serial dsr with shm and pps to serial dcd with atom
>> server1 20100208-11 127.127.22.0 0.045 1.038 7.100 PPSa
>> server3 20100208-11 127.127.22.0 -0.043 0.288 4.544 PPSb
>> server3 20100208-11 server1 0.031 0.416 3.485
>> server3 20100208-11 server2 0.422 0.577 5.196
>> ..... MSF config changed to use only serial dcd with shm driver
>> server1 20100215-18 127.127.28.0 0.197 0.427 2.723 MSFa
>> server3 20100215-18 127.127.22.0 0.000 -0.002 0.021 PPSb
>> server3 20100215-18 server1 1.371 0.412 1.139
>> server3 20100215-18 server2 1.452 0.504 1.222
>>
>>
>> Offset from MSF receiver varies significantly with temperature,
>> around 1ms/deg based on offset of server1 seen from server3 and
>> corresponding to similar offset variation vs remote servers.
>>
>> GPSb/PPSb on server3 isn't usable from current location of aerial
>> in bad weather eg. on 8th 9th when signal lost for short periods
>> significantly skewing the stats above:
>>
I can't remember the exact cause unless I go back through
logs for that period but I think serial DCD was still being
used even though I had only parallel PPS. I'd not tried
parallel pps before and it requires various changes to
kernel config and a recompile (in this cases several tries
before had it working).
When I gave the parallel PPS a different device number the
problem went away, although I was making other changes at same
time. There may have been interactions between devices LPT
and PPS devices in kernel. The ntp.log entries after that were
periodic "kernel time sync status change 2001" alternated with
"kernel time sync status change 6001"
I currently had lots of benign errors from a watch xtal
refclock I'd disconnected but on restart of ntpd with those
server and fudge lines removed from ntp.conf the log looks ok.
16 July
09:56:27 ntpd exiting
09:56:39 clock PPS(2) event clk_noreply
10:01:01 synch to GPS_NMEA(2), stratum 0
10:01:01 kernel time sync status change 0x2101
10:02:02 kernel time sync status change 0x2107
10:02:17 synchronized to PPS(2), stratum 0
and "ntpq -p" gives PPS(2) offset 0.000ms and
other servers on lan < 1.4ms.
David
More information about the questions
mailing list