[ntp:questions] Re: Unexpected ntpd behavior

David L. Mills mills at udel.edu
Fri Mar 11 02:38:13 UTC 2005


Guys,

The {0-3} in the last digit defines one of four possible instances of 
the same driver. We often use multiple radios and the same driver, but 
with different last digits. There are a few drivers, but only a few, 
that support only one instance.
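
As a rough illustration of the addressing scheme described above, here is a minimal Python sketch. It is not taken from the ntpd source, and the function name parse_refclock is hypothetical. It assumes only that a reference clock is written as 127.127.t.u, where t is the driver type and u is the unit (instance) number.

```python
import ipaddress


def parse_refclock(address):
    """Split a pseudo-address of the form 127.127.t.u into (driver_type, unit).

    Returns None when the address does not start with 127.127, i.e. when it
    is an ordinary host address rather than a reference clock.
    """
    octets = ipaddress.IPv4Address(address).packed  # four raw bytes
    if octets[0] != 127 or octets[1] != 127:
        return None
    # Third octet: driver type. Fourth octet: unit (instance) number.
    return octets[2], octets[3]


for addr in ("127.127.1.0", "127.127.1.1", "11.0.0.1"):
    print(addr, parse_refclock(addr))
```

Under this reading, 127.127.1.0 and 127.127.1.1 both name the same driver (type 1, the local clock) and differ only in the unit number, while 11.0.0.1 is an ordinary server address.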

Dave

Richard B. Gilbert wrote:

> Pete Buelow wrote:
> 
>> Richard B. Gilbert wrote:
>>
>>  
>>
>>> Pete Buelow wrote:
>>>
>>>   
>>>
>>>> Some quick background. I'm trying to get ntpd running on some IA64
>>>> hardware in a pretty simple environment: two machines in a pair
>>>> relationship, where the first machine in the pairing talks to a known
>>>> good NTP server and the other talks to its paired buddy. The OS is
>>>> Debian Sarge stable, and ntp is 4.1.0-8. ntpd is started with
>>>> -n -c /path/to/conf -x. The conf is simple, and is below.
>>>>
>>>> server 11.0.0.1 prefer
>>>> server 127.127.1.1
>>>> fudge 127.127.1.1 stratum 14 refid LCL
>>>>
>>>>
>>>>     
>>>
>>> The above two lines are in error!   The local clock should be
>>> 127.127.1.0!!!!!
>>>
>>>   
>>
>> Why? I can't find any documentation that states exactly why I should
>> pick 0 over 1 when most of the reference configs, and in fact the config
>> on the ntp docs page
>>
>> http://www.eecis.udel.edu/~mills/ntp/html/drivers/driver1.html
>>
>> seem to indicate that 1 is the value to use. I'm not sure what exactly
>> the difference is in picking 0 over 1 in this case.
>>
>>  
>>
>>>> driftfile /etc/ntp.drift
>>>> pidfile /etc/ntp.pid
>>>> disable stats
>>>> authenticate no
>>>>
>>>> Problem is, if time is slow compared to 11.0.0.1 (which works just
>>>> fine; it's a timeserver for several hundred lab machines), it will
>>>> catch up quite rapidly (much faster than the 2000s/s rate) and run
>>>> past. If the time is ahead of the server, it will just continue ahead.
>>>> I found a post below which states that it should then turn around
>>>> eventually and head the other direction, bouncing like a bungee, but
>>>> I've never run the test that long. I have no idea why this behavior is
>>>> happening, and it is the same behavior on both machines.
>>>>
>>>> A sample ntpq -p output. Clock was set 6 and a half seconds behind
>>>> 11.0.0.1.
>>>>
>>>> Node2# ntpq -p
>>>>      remote           refid      st t when poll reach   delay   offset  jitter
>>>> ==============================================================================
>>>> *11.0.0.1        192.168.31.253   4 u   55   64  377    0.308  6418.55   1.565
>>>>  LOCAL(1)        LOCAL(1)        14 l   21   64  377    0.000    0.000   0.004
>>>>
>>>> Two notes of interest, based on other posts I've read:
>>>> 1. Our tick rate is 1ms instead of 10ms.
>>>> 2. On almost all of the test machines, the drift file is populated with
>>>> the value 500. On one it's ~450. According to another poster, that
>>>> could be the source of some issues.
>>>>
>>>> Thoughts? Ideas? I'm assuming right now that it's either a config or
>>>> a HW issue. I'm running a test now with this config and these command
>>>> line options, but am adding "disable kernel" to the config file,
>>>> wondering if that will change the behavior.
>>>>
>>>> Thanks in advance if anyone has any help to offer at all.
>>>>
>>>>
>>>>
>>>>     
>>>
>>> If almost all of your drift files are populated with 500, something is
>>> very wrong!!   500 is the limit for correctable frequency errors!  If
>>> your clock frequencies are all in error by 500ppm or more, I would
>>> suspect the clock you are trying to synchronize with.   If I had  a
>>> hundred machines synchronized with a known good clock, I would expect
>>> ninety percent or more of them to have drift values  in the range from
>>> -200 to + 200.   Checking the machines running ntp in my home I find:
>>> two Sun Ultra 10 workstations running Solaris 8 and Solaris 9 have
>>> 6.400 and  -3.172 respectively.   A DEC Alphastation 200
>>> running VMS V7.2-1 has  35.488 while a Compaq Deskpro EN running RedHat
>>> has -4.908.   A very small sample, but indicative of what is "normal".
>>>
>>> I also note that the machine you are trying to synchronize with is at
>>> stratum 4 which is pretty near the bottom of the food chain!!  While
>>> stratum can range from 1 to 15, I'd consider serving time from any
>>> stratum higher than 3 as a little bit odd.
>>>
>>> Stratum 1 servers get their time directly from a hardware reference
>>> clock traceable to NIST or some other national standards organization.
>>> Stratum 2 servers get their time from stratum 1. Small organizations
>>> would operate stratum 3 servers and have their leaf nodes at stratum
>>> 4. Larger organizations would operate stratum 2 or stratum 1
>>> servers with leaf nodes at stratum 2 or 3.
>>>   
>>
>>
>>  
>>
> 127.127.1.0 is not a valid IP address.  The 127.127 part says it's a
> reference clock.  The 1 is the driver number for the local clock.  The 0
> is the unit number.  I don't know what unit 1 is likely to do, if
> there is such a thing.  I'm a little surprised that you aren't getting
> an error message somewhere.  Or maybe you are and just haven't noticed it?


