[ntp:questions] Re: Solaris 8 xntpd client oscillates

Brian Utterback brian.utterback at sun.removeme.com
Mon May 8 13:47:46 UTC 2006


Joachim Schrod wrote:
> Hello,
> 
> I have a Solaris 8 system with Sun's xntpd (version 3). It is a SunBlade.
> 
> I am trying to run that system as an xntpd client against our Linux
> timeserver. That timeserver is running ntpd 4.1.1 (it's a SUSE 9.0
> system) and is stable; its time doesn't wander or anything. The systems
> are on one LAN, with no firewall in between.
> 
> Now, the Solaris client regularly drifts away from the timeserver until
> it reaches an offset of ca. 1500 ms and then resets itself, starting the
> whole game anew. From the ntp log:
> 
>  8 May 14:12:15 xntpd[1167]: synchronized to 192.168.129.1, stratum=2
>  8 May 14:28:16 xntpd[1167]: time reset (step) 1.446707 s
>  8 May 14:28:16 xntpd[1167]: synchronisation lost
>  8 May 14:28:16 xntpd[1167]: system event 'event_clock_reset' (0x05) status 'sync_alarm, sync_unspec, 8 events, event_peer/strat_chg' (0xc084)
>  8 May 14:28:16 xntpd[1167]: system event 'event_sync_chg' (0x03) status 'sync_alarm, sync_unspec, 9 events, event_clock_reset' (0xc095)
>  8 May 14:28:16 xntpd[1167]: system event 'event_peer/strat_chg' (0x04) status 'sync_alarm, sync_unspec, 10 events, event_sync_chg' (0xc0a3)
>  8 May 14:29:19 xntpd[1167]: peer LOCAL(0) event 'event_reach' (0x84) status 'unreach, conf, 2 events, event_reach' (0x8024)
>  8 May 14:29:20 xntpd[1167]: peer 192.168.129.1 event 'event_reach' (0x84) status 'reach, conf, 2 events, event_reach' (0x9024)
>  8 May 14:33:35 xntpd[1167]: synchronized to LOCAL(0), stratum=10
>  8 May 14:33:35 xntpd[1167]: system event 'event_sync_chg' (0x03) status 'leap_none, sync_local_proto, 12 events, event_peer/strat_chg' (0x5c4)
>  8 May 14:33:35 xntpd[1167]: system event 'event_peer/strat_chg' (0x04) status 'leap_none, sync_local_proto, 13 events, event_sync_chg' (0x5d3)
>  8 May 14:33:36 xntpd[1167]: synchronized to 192.168.129.1, stratum=2
>  8 May 14:28:16 xntpd[1167]: time reset (step) 1.446707 s
>  8 May 14:28:16 xntpd[1167]: synchronisation lost
>  8 May 14:28:16 xntpd[1167]: system event 'event_clock_reset' (0x05) status 'sync_alarm, sync_unspec, 8 events, event_peer/strat_chg' (0xc084)
> 
> and so on. This happens roughly every 20 minutes and I cannot discover 
> the reason or any method to avoid that problem.
> 
> My ntp.conf has as content (plus logging, no restrict clauses):
> 
> server 127.127.1.0              # local clock (LCL)
> fudge  127.127.1.0 stratum 10   # LCL is unsynchronized
> server 192.168.129.1            # IP address of server
> driftfile /etc/ntp.drift # path for drift file
> 
> I also added a "disable pll" clause, as per recommendation of a Sun 
> Blueprint that I found via the FAQ. With that clause, xntpd does not 
> synchronize the time either; the behaviour is the same. With both
> configurations I let it run for more than one day, so there should have
> been enough time for synchronization.
> 
> Some ntpq output:
> 
> ntpq> peer
>      remote           refid      st t when poll reach   delay   offset    disp
> ==============================================================================
>  LOCAL(0)        LOCAL(0)        10 l   14   64  377     0.00    0.000   10.01
> *lion.npc.de     ptbtime2.ptb.de  2 u   13   64  377     0.47  985.966  146.59
> 
> ntpq> readvar
> status=06f4 leap_none, sync_ntp, 15 events, event_peer/strat_chg
> system="SunOS", leap=00, stratum=3, rootdelay=52.40,
> rootdispersion=1161.74, peer=46165, refid=lion.npc.de,
> reftime=c809bb20.7437a000  Mon, May  8 2006 14:42:08.453, poll=6,
> clock=c809bb2e.f4511000  Mon, May  8 2006 14:42:22.954, phase=0.000,
> freq=0.00, error=146.59
> 
> ntpq> readvar 46165
> status=9624 reach, conf, sel_sys.peer, 2 events, event_reach
> srcadr=lion.npc.de, srcport=123, dstadr=192.168.129.2, dstport=123,
> keyid=0, stratum=2, precision=-17, rootdelay=51.93,
> rootdispersion=29.19, refid=ptbtime2.ptb.de,
> reftime=c809b94e.0f1fbc5d  Mon, May  8 2006 14:34:22.059,
> delay=    0.47, offset=  985.97, dispersion=146.59, reach=377, valid=8,
> hmode=3, pmode=4, hpoll=6, ppoll=6, leap=00, flash=0x0<OK>,
> org=c809bb21.70902de0  Mon, May  8 2006 14:42:09.439,
> rec=c809bb20.7437a000  Mon, May  8 2006 14:42:08.453,
> xmt=c809bb20.74154000  Mon, May  8 2006 14:42:08.453,
> filtdelay=    0.47    0.46    0.46    0.44    0.46    0.49    0.44    0.40,
> filtoffset= 985.96  910.00  834.03  758.07  682.13  606.16  530.20  454.22,
> filterror=    0.02    0.99    1.97    2.94    3.92    4.90    5.87    6.85
> 
> ntpq> version
> ntpq 3-5.93e Mon Sep 20 15:45:42 PDT 1999 (1)
> 
> I traced the network traffic between the Sun and the timeserver. Both
> requests and answers look good and show no problem. (The server
> answers with version 3 packets, so I assumed that it's not a v3 vs. v4
> problem. Or could it be?)
> 
> I have installed the Sun patch 109667-07, which is the latest patch for 
> xntpd, AFAIK.
> 
> Can anybody help me here? I don't know any more where I should look or 
> what I should change to make the SunBlade synchronize its time.
> Thanks in advance for any answer,
> 
>     Joachim
> 

Well, this is pretty interesting.

I don't know what your problem is, but I have a couple of observations.

First, it seems pretty odd that the second reset in the log shows
exactly the same offset and the same timestamp as the first reset. It
appears that these lines are really replays of the first set.
Immediately after an actual reset we can assume that the offset between
the system and the server is 0, and the measured offset between them
was 1.446707 seconds just before it. So it doesn't seem reasonable that
the clock should jump back 5 minutes, whether the timestamp was taken
before or after the reset.

Further, the offsets shown in the readvar output show severe drift, but
this could be due to NTP's attempt to correct the clock. Since this
output is sorted, I can't really tell if the clock is diverging
or converging.
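
To put a rough number on that drift, the sketch below takes the
filtoffset values from the quoted readvar output and converts the
change between adjacent entries into parts per million. It assumes
that adjacent entries are one poll interval apart (hpoll=6, so 64
seconds); the function and variable names are made up for this
illustration. Because the ordering of the entries is uncertain, it
reports only the magnitude of the rate, not its direction.

```python
# Offsets in milliseconds, copied from the filtoffset line of `readvar 46165`.
FILTOFFSET_MS = [985.96, 910.00, 834.03, 758.07, 682.13, 606.16, 530.20, 454.22]

# hpoll=6 in the same output means a poll interval of 2**6 = 64 seconds.
# Assumption: adjacent filter entries are exactly one poll interval apart.
POLL_SECONDS = 2 ** 6


def adjacent_rates_ppm(offsets_ms, poll_seconds):
    """Return the absolute offset change between adjacent entries, in ppm."""
    rates = []
    for first, second in zip(offsets_ms, offsets_ms[1:]):
        delta_seconds = abs(first - second) / 1000.0
        rates.append(delta_seconds / poll_seconds * 1e6)
    return rates


rates_ppm = adjacent_rates_ppm(FILTOFFSET_MS, POLL_SECONDS)
mean_rate_ppm = sum(rates_ppm) / len(rates_ppm)

print("per-step rates (ppm):", [round(rate) for rate in rates_ppm])
print("mean rate (ppm): %.0f" % mean_rate_ppm)
```

If the assumption holds, the offset changes by about 76 ms per
64-second poll, which is on the order of 1200 ppm. As general NTP
background (not something stated in this thread), the protocol is
designed to compensate frequency errors of only up to about 500 ppm.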

One possibility is that you are a victim of a misbehaving hardware
TOD clock. Another is that you have a process that is syncing the
clock periodically to a different server. Another is that your system
clock drifts radically.

Here is what I suggest. First lose the LOCAL refclock. It does nothing
for you. Second, unless you are using the slewalways option, lose the
"disable pll". Since you don't show either in your config file, I can't
tell which you need.

Third, configure the stats file. That is the only way you are going to
get enough data to be able to even guess.

Fourth, stop xntpd, delete the drift file and reboot. Do not just
restart xntpd. You may have a bogus drift value in the kernel and
in the drift file.
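
As a rough illustration of the first three suggestions, a trimmed
ntp.conf could look like the fragment below. The server address and
drift file path are taken from the original post. The statsdir path
and the filegen lines are assumptions modeled on the stock Solaris NTP
templates, and the fragment assumes that slewalways is not in use, so
adjust all of it to the local setup.

```
# Only the real time server; the LOCAL(0) refclock lines are dropped.
server 192.168.129.1
driftfile /etc/ntp.drift

# No "disable pll" here (assumes slewalways is not configured).

# Statistics collection. The directory is an assumed location and must exist.
statsdir /var/ntp/ntpstats/
filegen peerstats file peerstats type day enable
filegen loopstats file loopstats type day enable
```

With day-type file generation, each day's peerstats and loopstats data
should end up in its own file, which makes it easier to share the
period that covers a few resets.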

Try these out, wait a day or two and check again. If you still
have problems, repost and send a link to the stats files.

-- 
blu

Rose are #FF0000, Violets are #0000FF. All my base are belong to you.
----------------------------------------------------------------------
Brian Utterback - OP/N1 RPE, Sun Microsystems, Inc.
Ph:877-259-7345, Em:brian.utterback-at-ess-you-enn-dot-kom



