[ntp:questions] Re: Unexpected ntpd behavior

Pete Buelow nospam at putzin.net
Thu Mar 10 22:01:46 UTC 2005


Richard B. Gilbert wrote:

> Pete Buelow wrote:
> 
>>Richard B. Gilbert wrote:
>>
>>  
>>
>>>Pete Buelow wrote:
>>>
>>>    
>>>
>>>>Some quick background. Trying to get ntpd running on some IA64 hardware
>>>>in a pretty simple environment. Two machines in a pair relationship, the
>>>>first machine in the pairing talks to a known good NTP server, the other
>>>>talks to its paired buddy. OS is Debian Sarge stable, ntp is 4.1.0-8.
>>>>Ntp is started with -n -c /path/to/conf -x. Conf is simple, and is
>>>>below.
>>>>
>>>>server 11.0.0.1 prefer
>>>>server 127.127.1.1
>>>>fudge 127.127.1.1 stratum 14 refid LCL
>>>> 
>>>>
>>>>      
>>>>
>>>The above two lines are in error!   The local clock should be
>>>127.127.1.0!!!!!
>>>
>>>    
>>>
>>Why? I can't find any documentation that states exactly why I should pick
>>0 over 1 when most of the reference configs, and in fact, the config on
>>the ntp docs page
>>
>>http://www.eecis.udel.edu/~mills/ntp/html/drivers/driver1.html
>>
>>seems to indicate that 1 is the value to use. Not sure what exactly the
>>difference is in picking 0 over 1 in this case.
>> 
>>  
>>
>>>>driftfile /etc/ntp.drift
>>>>pidfile /etc/ntp.pid
>>>>disable stats
>>>>authenticate no
>>>>
>>>>Problem is, if time is slow compared to 11.0.0.1 (which works just fine,
>>>>it's a timeserver for several hundred lab machines), it will catch up
>>>>quite rapidly (much faster than the 2000s/s rate), and run past. If the
>>>>time is ahead of the server, it will just continue ahead. I found a post
>>>>below which states that it should then turn around eventually, and head
>>>>the other direction, bouncing like a bungee, but I've never run the test
>>>>that long. I have no idea why this behavior is happening. And it is the
>>>>same behavior on both machines.
>>>>
>>>>A sample ntpq -p output. Clock was set 6 and a half seconds behind
>>>>11.0.0.1.
>>>>
>>>>Node2# ntpq -p
>>>>    remote           refid      st t when poll reach   delay   offset  jitter
>>>>==============================================================================
>>>>*11.0.0.1        192.168.31.253   4 u   55   64  377    0.308  6418.55   1.565
>>>> LOCAL(1)        LOCAL(1)        14 l   21   64  377    0.000    0.000   0.004
>>>>
>>>>Two notes of interest based on other posts I've read
>>>>1. Our tick rate is 1ms instead of 10ms.
>>>>2. On almost all of the test machines, the drift file is populated with
>>>>the value 500. On one it's ~450. According to another poster, that could
>>>>be the source of some issues.
>>>>
>>>>Thoughts? Ideas? I'm assuming right now that it's either a config or a
>>>>HW issue. I'm running a test now with this config and command line
>>>>options, but am adding "disable kernel" to the config file. Wondering if
>>>>that will change the behavior.
>>>>
>>>>Thanks in advance if anyone has any help to offer at all.
>>>>
>>>> 
>>>>
>>>>      
>>>>
>>>If almost all of your drift files are populated with 500, something is
>>>very wrong!!   500 is the limit for correctable frequency errors!  If
>>>your clock frequencies are all in error by 500ppm or more, I would
>>>suspect the clock you are trying to synchronize with.   If I had  a
>>>hundred machines synchronized with a known good clock, I would expect
>>>ninety percent or more of them to have drift values  in the range from
>>>-200 to +200.   Checking the machines running ntp in my home I find:
>>>two Sun Ultra 10 workstations running Solaris 8 and Solaris 9 have
>>>6.400 and  -3.172 respectively.   A DEC Alphastation 200
>>>running VMS V7.2-1 has  35.488 while a Compaq Deskpro EN running RedHat
>>>has -4.908.   A very small sample, but indicative of what is "normal".
>>>
>>>I also note that the machine you are trying to synchronize with is at
>>>stratum 4 which is pretty near the bottom of the food chain!!  While
>>>stratum can range from 1 to 15, I'd consider serving time from any
>>>stratum higher than 3 as a little bit odd.
>>>
>>>Stratum 1 servers get their time directly from a hardware reference
>>>clock traceable to NIST or some other national standards organization.
>>>Stratum 2 servers get their time from stratum 1.   Small organizations
>>>would operate stratum 3 servers and have their leaf nodes at stratum
>>>4.    Larger organizations would operate stratum 2 or stratum 1
>>>servers with leaf nodes at stratum 2 or 3.
>>>    
>>>
>>
>>  
>>
> The 127.127.1.0 is not a valid IP address.  The 127.127 part says it's a
> reference clock.  The 1 is the driver number for the local clock.  The 0
> is the unit number.  I don't know what "unit 1" is likely to do, if
> there is such a thing.  I'm a little surprised that you aren't getting
> an error message somewhere.  Or maybe you are and just haven't noticed it?

Nope, no errors. I got the ref clock bits from the docs, but most of the conf
files I see use .1 (fedora, ntp.org, debian). There are no errors, and
switching to .0 doesn't change the behavior at all. Even with -dd, the
output is no different.
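
For what it's worth, here is how I read the 127.127.t.u pseudo-address
format described above, as a small Python sketch. The function name
parse_refclock is made up for illustration and is not part of ntpd; it
only splits the address into driver type and unit number. On that
reading, both 127.127.1.0 and 127.127.1.1 select the local clock driver
(type 1) and differ only in the unit, which would fit the LOCAL(1) line
in my ntpq output.

```python
import ipaddress


def parse_refclock(addr):
    """Split an ntpd pseudo-address 127.127.t.u into (driver_type, unit).

    Hypothetical helper for illustration only. Returns None when addr is
    an ordinary IPv4 address rather than a reference clock pseudo-address.
    """
    octets = ipaddress.IPv4Address(addr).packed  # four raw bytes
    if octets[0] != 127 or octets[1] != 127:
        return None
    return octets[2], octets[3]


if __name__ == "__main__":
    for addr in ("127.127.1.0", "127.127.1.1", "11.0.0.1"):
        print(addr, parse_refclock(addr))
```

So the third octet picks the driver and the fourth is just an instance
number for that driver.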
-- 
Pete Buelow
replace nospam with putzin if you feel the urge to reply to me directly.


