[ntp:questions] Re: Unexpected ntpd behavior

Richard B. Gilbert rgilbert88 at comcast.net
Wed Mar 9 23:23:09 UTC 2005


Pete Buelow wrote:

>Richard B. Gilbert wrote:
>
>>Pete Buelow wrote:
>>
>>>Some quick background. I'm trying to get ntpd running on some IA64 hardware in
>>>a pretty simple environment. Two machines are in a pair relationship: the
>>>first machine in the pair talks to a known good NTP server, the other
>>>talks to its paired buddy. The OS is Debian Sarge stable, ntp is 4.1.0-8. ntpd
>>>is started with -n -c /path/to/conf -x. The conf is simple, and is below.
>>>
>>>server 11.0.0.1 prefer
>>>server 127.127.1.1
>>>fudge 127.127.1.1 stratum 14 refid LCL
>>>
>>The above two lines are in error! The local clock should be
>>127.127.1.0.
>>
>Why? I can't find any documentation that states exactly why I should pick 0
>over 1, when most of the reference configs, and in fact the config on the
>NTP docs page
>
>http://www.eecis.udel.edu/~mills/ntp/html/drivers/driver1.html
>
>seem to indicate that 1 is the value to use. I'm not sure what exactly the
>difference is between picking 0 and 1 in this case.
>
>>>driftfile /etc/ntp.drift
>>>pidfile /etc/ntp.pid
>>>disable stats
>>>authenticate no
>>>
>>>The problem is, if time is slow compared to 11.0.0.1 (which works just fine;
>>>it's a timeserver for several hundred lab machines), it will catch up
>>>quite rapidly (much faster than the 500 ppm rate, i.e. 2000 s to slew out
>>>each second of offset), and run past. If the time is ahead of the server,
>>>it will just continue ahead. I found a post stating that it should then
>>>turn around eventually and head the other direction, bouncing like a
>>>bungee, but I've never run the test that long. I have no idea why this
>>>behavior is happening, and it is the same on both machines.
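For scale, here is a rough sketch of the slew arithmetic behind the "2000 s/s" figure, assuming ntpd's 500 ppm maximum slew rate and no step correction at all. `slew_time_seconds` is a hypothetical helper name used only for illustration:

```python
# Rough estimate of how long a pure slew needs to remove an offset,
# assuming the 500 ppm (0.0005 s per second) maximum slew rate and no clock step.
MAX_SLEW_PPM = 500.0


def slew_time_seconds(offset_s: float, max_ppm: float = MAX_SLEW_PPM) -> float:
    """Return the minimum wall-clock time, in seconds, to slew away |offset_s|."""
    rate = max_ppm / 1e6  # seconds of correction per second of wall time
    return abs(offset_s) / rate


if __name__ == "__main__":
    # The 6.5 s offset used in the test described in this thread.
    t = slew_time_seconds(6.5)
    print(f"{t:.0f} s (about {t / 3600:.1f} h)")
```

At that rate, a 6.5 s offset takes roughly 13000 s, a bit over three and a half hours, to slew out.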
>>>
>>>A sample ntpq -p output. Clock was set 6 and a half seconds behind
>>>11.0.0.1.
>>>
>>>Node2# ntpq -p
>>>     remote           refid      st t when poll reach   delay   offset  jitter
>>>==============================================================================
>>>*11.0.0.1        192.168.31.253   4 u   55   64  377    0.308  6418.55   1.565
>>> LOCAL(1)        LOCAL(1)        14 l   21   64  377    0.000    0.000   0.004
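A minimal sketch for reading such a peer row, assuming the column layout shown above (tally code glued to the remote name, delay/offset/jitter in milliseconds, reach printed in octal). `parse_peer_line` is a hypothetical helper, not part of ntpq:

```python
TALLY_CODES = "x.-+#*o"  # non-blank tally codes ntpq may prefix to the remote name


def parse_peer_line(line: str) -> dict:
    """Parse one peer row of `ntpq -p` output (layout as in the sample above)."""
    tally = line[0] if line and line[0] in TALLY_CODES else " "
    rest = line[1:] if tally != " " else line
    remote, refid, st, t, when, poll, reach, delay, offset, jitter = rest.split()
    reach_bits = int(reach, 8)  # reach is an 8-bit shift register shown in octal
    return {
        "tally": tally,
        "remote": remote,
        "refid": refid,
        "stratum": int(st),
        "type": t,
        "when": when,
        "poll_s": int(poll),
        "reach_ok": bin(reach_bits).count("1"),  # successful polls among the last 8
        "delay_ms": float(delay),
        "offset_s": float(offset) / 1000.0,  # ntpq prints offset in milliseconds
        "jitter_ms": float(jitter),
    }


if __name__ == "__main__":
    row = "*11.0.0.1        192.168.31.253   4 u   55   64  377    0.308  6418.55   1.565"
    peer = parse_peer_line(row)
    print(peer["remote"], peer["stratum"], peer["reach_ok"], peer["offset_s"])
```

For the sample row this gives an offset of about 6.42 s, in line with the six and a half seconds the clock was set back.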
>>>
>>>Two notes of interest, based on other posts I've read:
>>>1. Our tick rate is 1 ms instead of 10 ms.
>>>2. On almost all of the test machines, the drift file is populated with
>>>the value 500. On one it's ~450. According to another poster, that could
>>>be the source of some issues.
>>>
>>>Thoughts? Ideas? I'm assuming right now that it's either a config or a HW
>>>issue. I'm running a test now with this config and these command-line
>>>options, but with "disable kernel" added to the config file. I'm wondering
>>>whether that will change the behavior.
>>>
>>>Thanks in advance if anyone has any help to offer at all.
>>>
>>If almost all of your drift files are populated with 500, something is
>>very wrong! 500 ppm is the limit for correctable frequency errors. If
>>your clock frequencies are all in error by 500 ppm or more, I would
>>suspect the clock you are trying to synchronize with. If I had a
>>hundred machines synchronized with a known good clock, I would expect
>>ninety percent or more of them to have drift values in the range from
>>-200 to +200. Checking the machines running ntp in my home, I find that
>>two Sun Ultra 10 workstations running Solaris 8 and Solaris 9 have
>>6.400 and -3.172 respectively. A DEC AlphaStation 200
>>running VMS V7.2-1 has 35.488, while a Compaq Deskpro EN running Red Hat
>>has -4.908. A very small sample, but indicative of what is "normal".
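The drift check described above can be sketched as follows, assuming the first token in the drift file is the frequency correction in ppm. The ±500 ppm value is the correction limit; the ±200 ppm band is only the rule of thumb from the paragraph above, not an ntpd constant. `read_drift` and `classify_drift` are hypothetical helpers:

```python
FREQ_LIMIT_PPM = 500.0  # ntpd cannot correct frequency errors beyond this
TYPICAL_PPM = 200.0     # rule-of-thumb "normal" band from this thread (assumption)


def read_drift(text: str) -> float:
    """Parse drift-file contents, assuming the first token is the frequency in ppm."""
    return float(text.split()[0])


def classify_drift(ppm: float) -> str:
    """Label a drift value relative to the limit and the rule-of-thumb band."""
    if abs(ppm) >= FREQ_LIMIT_PPM:
        return "at limit"  # frequency pinned at the bound: suspect a problem
    if abs(ppm) > TYPICAL_PPM:
        return "unusual"
    return "normal"


if __name__ == "__main__":
    samples = {
        "most test nodes": "500\n",
        "one test node": "450\n",
        "Ultra 10 (Solaris 8)": "6.400\n",
        "AlphaStation 200": "35.488\n",
        "Deskpro EN": "-4.908\n",
    }
    for name, text in samples.items():
        ppm = read_drift(text)
        print(f"{name:22s} {ppm:9.3f} ppm  {classify_drift(ppm)}")
```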
>>
>>I also note that the machine you are trying to synchronize with is at
>>stratum 4, which is pretty near the bottom of the food chain! While
>>stratum can range from 1 to 15, I'd consider serving time from any
>>stratum higher than 3 a little bit odd.
>>
>>Stratum 1 servers get their time directly from a hardware reference
>>clock traceable to NIST or some other national standards organization.
>>Stratum 2 servers get their time from stratum 1. Small organizations
>>would operate stratum 3 servers and have their leaf nodes at stratum
>>4. Larger organizations would operate stratum 2 or stratum 1
>>servers with leaf nodes at stratum 2 or 3.
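To make the stratum arithmetic concrete, here is a simplified sketch, assuming each client runs one stratum above (numerically) the server it syncs to and that stratum 16 marks an unsynchronized clock, as in NTPv4. `client_stratum` is a hypothetical helper, and real ntpd applies additional rules this sketch ignores:

```python
UNSYNC_STRATUM = 16  # NTPv4 convention: stratum 16 means "unsynchronized"


def client_stratum(server_stratum: int) -> int:
    """Simplified: a client synchronized to a server sits one stratum below it."""
    return min(server_stratum + 1, UNSYNC_STRATUM)


if __name__ == "__main__":
    # Chain from this thread: 11.0.0.1 is at stratum 4; one pair member syncs
    # to it and the other pair member syncs to that first member.
    s = 4
    for name in ("11.0.0.1", "first pair member", "second pair member"):
        print(f"{name:20s} stratum {s}")
        s = client_stratum(s)
```

Under these assumptions the pair would end up at strata 5 and 6.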
>
127.127.1.0 is not a real IP address. The 127.127 prefix says it's a
reference clock. The 1 is the driver number for the local clock. The 0
is the unit number. I don't know what unit 1 is likely to do, if
there is such a thing. I'm a little surprised that you aren't getting
an error message somewhere. Or maybe you are and just haven't noticed it?
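The 127.127.t.u layout described above can be sketched as a small decoder, assuming only that convention (third octet = driver type, fourth octet = unit). `refclock_id` is a hypothetical helper, not an ntpd API. Under this reading, 127.127.1.0 and 127.127.1.1 both name the type 1 (local clock) driver and differ only in unit number:

```python
import ipaddress
from typing import Optional, Tuple


def refclock_id(addr: str) -> Optional[Tuple[int, int]]:
    """Return (driver type, unit) for a 127.127.t.u refclock pseudo-address,
    or None if addr is an ordinary host address."""
    octets = ipaddress.IPv4Address(addr).packed  # 4 bytes, indexable as ints
    if octets[0] != 127 or octets[1] != 127:
        return None
    return octets[2], octets[3]


if __name__ == "__main__":
    for a in ("127.127.1.0", "127.127.1.1", "11.0.0.1"):
        print(a, "->", refclock_id(a))
```

This only decodes the address; it does not check which unit numbers a given driver actually accepts.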


