[ntp:questions] Synchronisation lost with swisstime.ethz.ch

frank.olsen at steria.com frank.olsen at steria.com
Thu Aug 11 09:08:26 UTC 2005


Hi all,

For about a week now we've had serious problems with our NTP
configuration. There are regular _steps_ of, say, +2 seconds, or -2, or
+4, etc. This causes jobs on our ERP to fail, because they use
timestamps and the ERP finds timestamps dated in the past...

The configuration is basically that one of the servers (all running
HP-UX 11.00) synchronises to an external server (swisstime.ethz.ch).
This server broadcasts on the local subnet. In addition, each of the
other servers also broadcasts, and they have each other defined as
peers.

A short extract from a typical part of the syslog for the server that
does the external synchronisation:

Aug  3 08:50:58 xntpd[1078]: synchronisation lost
Aug  3 09:47:30 xntpd[1078]: synchronized to 129.132.2.21, stratum=1
Aug  4 12:14:05 xntpd[1078]: synchronisation lost
Aug  4 14:13:33 xntpd[1078]: synchronized to 129.132.2.21, stratum=1
Aug  4 14:13:31 xntpd[1078]: time reset (step) -2.266052 s
Aug  4 14:13:31 xntpd[1078]: synchronisation lost
Aug  4 14:19:42 xntpd[1078]: synchronized to 129.132.2.21, stratum=13
Aug  4 14:19:42 xntpd[1078]: time reset (step) 0.269439 s
Aug  4 14:19:42 xntpd[1078]: synchronisation lost
Aug  4 14:25:02 xntpd[1078]: synchronized to 129.132.2.21, stratum=13
Aug  4 14:31:06 xntpd[1078]: synchronisation lost
Aug  4 14:32:10 xntpd[1078]: synchronized to 129.132.2.21, stratum=13
Aug  4 14:39:57 xntpd[1078]: synchronisation lost
Aug  4 14:41:01 xntpd[1078]: synchronized to 129.132.2.21, stratum=13
Aug  4 14:46:05 xntpd[1078]: time reset (step) -3.999183 s
Aug  4 14:46:05 xntpd[1078]: synchronisation lost
Aug  4 14:51:25 xntpd[1078]: synchronized to 129.132.2.21, stratum=13
Aug  4 15:01:34 xntpd[1078]: synchronisation lost
Aug  4 15:07:04 xntpd[1078]: synchronized to 129.132.2.21, stratum=1
Aug  4 15:15:42 xntpd[1078]: time reset (step) 6.000553 s
Aug  4 15:15:42 xntpd[1078]: synchronisation lost

Up until this month we never had steps of more than around +/- 0.2
seconds.
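
To pull the step sizes out of a syslog like the extract above and
compare them with the usual +/- 0.2 second range, a minimal Python
sketch could look like the following. The function name, the regular
expression and the 0.2 default are assumptions made for illustration;
they are not part of xntpd.

```python
import re

# Matches xntpd lines such as:
#   Aug  4 14:13:31 xntpd[1078]: time reset (step) -2.266052 s
STEP_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+xntpd\[\d+\]:\s+"
    r"time reset \(step\)\s+(?P<offset>[-+]?\d+\.\d+)\s+s"
)


def large_steps(lines, threshold=0.2):
    """Return (timestamp, offset) pairs whose |offset| exceeds threshold seconds."""
    found = []
    for line in lines:
        m = STEP_RE.match(line.strip())
        if m:
            offset = float(m.group("offset"))
            if abs(offset) > threshold:
                found.append((m.group("stamp"), offset))
    return found


# Lines copied from the syslog extract above.
sample = [
    "Aug  4 14:13:33 xntpd[1078]: synchronized to 129.132.2.21, stratum=1",
    "Aug  4 14:13:31 xntpd[1078]: time reset (step) -2.266052 s",
    "Aug  4 14:19:42 xntpd[1078]: time reset (step) 0.269439 s",
    "Aug  4 14:46:05 xntpd[1078]: time reset (step) -3.999183 s",
    "Aug  4 15:15:42 xntpd[1078]: time reset (step) 6.000553 s",
]

for stamp, offset in large_steps(sample):
    print(stamp, offset)
```

Raising the threshold (for example large_steps(sample, threshold=1.0))
leaves only the multi-second steps.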

We do have regular connection problems on the external network.

In trying to debug this problem I've been using ntpdate. I've also
temporarily deactivated the "external" synchronisation (by commenting
out server 129.132.2.21 in ntp.conf and restarting xntpd). Currently,
ntpdate -d gives:

# ntpdate -d 129.132.2.21
11 Aug 10:55:39 ntpdate[24036]: ntpdate version 3.5f: Fri Dec 10
18:32:44 GMT 1999 PHNE_19711
transmit(129.132.2.21)
receive(129.132.2.21)
transmit(129.132.2.21)
receive(129.132.2.21)
transmit(129.132.2.21)
receive(129.132.2.21)
transmit(129.132.2.21)
receive(129.132.2.21)
transmit(129.132.2.21)
server 129.132.2.21, port 123
stratum 13, precision -18, leap 00, trust 000
refid [127.127.1.0], delay 0.03419, dispersion 0.00050
transmitted 4, in filter 4
reference time:      c6a58f28.f605681e  Thu, Aug 11 2005 10:47:36.961
originate timestamp: c6a59109.aec1094a  Thu, Aug 11 2005 10:55:37.682
transmit timestamp:  c6a5910b.c1fca000  Thu, Aug 11 2005 10:55:39.757
filter delay:  0.03535  0.03725  0.03494  0.03419
               0.00000  0.00000  0.00000  0.00000
filter offset: -2.08221 -2.08119 -2.08258 -2.08288
               0.000000 0.000000 0.000000 0.000000
delay 0.03419, dispersion 0.00050
offset -2.082881

11 Aug 10:55:39 ntpdate[24036]: step time server 129.132.2.21 offset
-2.082881 sec
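
The hexadecimal values in the timestamp lines above are NTP
timestamps: seconds since 1 January 1900 (UTC) before the dot, and a
32-bit binary fraction of a second after it. The sketch below decodes
one of them in Python; the function name is an assumption made for
illustration. The reference time from the output above decodes to
08:47:36 UTC, which would be consistent with ntpdate printing local
time on a machine running at UTC+2.

```python
from datetime import datetime, timedelta, timezone

# NTP counts seconds from this instant.
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def decode_ntp_hex(stamp):
    """Convert 'ssssssss.ffffffff' (hex seconds.fraction since 1900) to a UTC datetime."""
    sec_hex, frac_hex = stamp.split(".")
    seconds = int(sec_hex, 16)
    # The fraction is a 32-bit binary fraction of one second.
    fraction = int(frac_hex, 16) / 2**32
    return NTP_EPOCH + timedelta(seconds=seconds,
                                 microseconds=round(fraction * 1_000_000))


# Reference time reported by 129.132.2.21 in the output above.
ref = decode_ntp_hex("c6a58f28.f605681e")
print(ref.isoformat())
```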

What is strange is that only 10 minutes earlier the offset was 0.02
seconds... (In the meantime I was disconnected from the server and had
to reconnect.) For a while I also got the message "no server suitable
for synchronization found" when querying 129.132.2.21, like just now:

# ntpdate -d 129.132.2.21
11 Aug 11:04:35 ntpdate[26125]: ntpdate version 3.5f: Fri Dec 10
18:32:44 GMT 1999 PHNE_19711
transmit(129.132.2.21)
receive(129.132.2.21)
transmit(129.132.2.21)
receive(129.132.2.21)
transmit(129.132.2.21)
receive(129.132.2.21)
transmit(129.132.2.21)
receive(129.132.2.21)
transmit(129.132.2.21)
server 129.132.2.21, port 123
stratum 16, precision -18, leap 00, trust 000
refid [83.84.69.80], delay 0.03444, dispersion 0.00031
transmitted 4, in filter 4
reference time:      c6a58f28.f605681e  Thu, Aug 11 2005 10:47:36.961
originate timestamp: c6a59323.fb280f12  Thu, Aug 11 2005 11:04:35.981
transmit timestamp:  c6a59324.05e53000  Thu, Aug 11 2005 11:04:36.023
filter delay:  0.03520  0.03461  0.03444  0.03746
               0.00000  0.00000  0.00000  0.00000
filter offset: -0.05084 -0.05105 -0.05113 -0.04967
               0.000000 0.000000 0.000000 0.000000
delay 0.03444, dispersion 0.00031
offset -0.051134

11 Aug 11:04:36 ntpdate[26125]: no server suitable for synchronization
found
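
ntpdate prints the refid field as a dotted quad, but the four bytes are
not always an IP address: as far as I understand, a server that reports
itself as unsynchronised (stratum 16, as in the output above) may put a
four-character ASCII code there instead. The sketch below, with a
hypothetical helper name, reads the two refids seen in this message as
ASCII where all four bytes are printable. The refid 83.84.69.80 comes
out as "STEP"; 127.127.1.0 is not printable text and is left alone (it
is the address conventionally used for the local clock driver in
ntp.conf).

```python
def refid_as_ascii(dotted):
    """Return the refid as a 4-character ASCII string, or None if not printable."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError("expected four octets, got %r" % dotted)
    # Only treat the field as text when every byte is printable ASCII.
    if all(32 <= o < 127 for o in octets):
        return bytes(octets).decode("ascii")
    return None


print(refid_as_ascii("83.84.69.80"))   # refid from the stratum 16 reply
print(refid_as_ascii("127.127.1.0"))   # refid from the stratum 13 reply
```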

Why has the offset gone to -0.051134?

Can it be that the "stratum=13" appears whenever I lose the connection
completely to 129.132.2.21? As the syslog showed, the server is
normally stratum=1, but sometimes it jumps to stratum=13.
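
To see when the reported stratum changes, and to line those times up
with the known network outages, the "synchronized to" lines can be
extracted from syslog. This is only a sketch: the function name is
hypothetical, the year has to be supplied because syslog omits it, and
the month parsing assumes an English locale.

```python
import re
from datetime import datetime

LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+xntpd\[\d+\]:\s+(?P<msg>.*)$"
)
SYNC_RE = re.compile(r"synchronized to (?P<peer>[\d.]+), stratum=(?P<stratum>\d+)")


def sync_events(lines, year=2005):
    """Yield (datetime, peer, stratum) for each 'synchronized to' line."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        s = SYNC_RE.search(m.group("msg"))
        if s:
            # Collapse the padded day ("Aug  4") before parsing.
            stamp = " ".join(m.group("stamp").split())
            when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
            yield when, s.group("peer"), int(s.group("stratum"))


# Lines copied from the syslog extract earlier in this message.
sample = [
    "Aug  4 14:13:33 xntpd[1078]: synchronized to 129.132.2.21, stratum=1",
    "Aug  4 14:13:31 xntpd[1078]: synchronisation lost",
    "Aug  4 14:19:42 xntpd[1078]: synchronized to 129.132.2.21, stratum=13",
    "Aug  4 14:51:25 xntpd[1078]: synchronized to 129.132.2.21, stratum=13",
    "Aug  4 15:07:04 xntpd[1078]: synchronized to 129.132.2.21, stratum=1",
]

events = list(sync_events(sample))
for when, peer, stratum in events:
    print(when.isoformat(sep=" "), peer, stratum)
```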

Thanks in advance for any help!

Best regards,

Frank Olsen

P.S. You guessed it, I'm no NTP expert...



