[ntp:questions] Re: Unexpected ntpd behavior
Richard B. Gilbert
rgilbert88 at comcast.net
Wed Mar 9 23:23:09 UTC 2005
Pete Buelow wrote:
>Richard B. Gilbert wrote:
>>Pete Buelow wrote:
>>>Some quick background. Trying to get ntpd running on some IA64 hardware in
>>>a pretty simple environment. Two machines in a pair relationship, the
>>>first machine in the pairing talks to a known good NTP server, the other
>>>talks to its paired buddy. OS is Debian Sarge stable, ntp is 4.1.0-8. Ntp
>>>is started with -n -c /path/to/conf -x. Conf is simple, and is below.
>>>server 126.96.36.199 prefer
>>>fudge 127.127.1.1 stratum 14 refid LCL
>>The above two lines are in error! The local clock should be
>Why? I can't find any documentation that states exactly why I should pick 0
>over 1 when most of the reference configs, and in fact the config on the
>ntp docs page, seem to indicate that 1 is the value to use. I'm not sure what
>the difference is between picking 0 and 1 in this case.
>>>Problem is, if time is slow compared to 188.8.131.52 (which works just fine,
>>>it's a timeserver for several hundred lab machines), it will catch up
>>>quite rapidly (much faster than the 2000s/s rate), and run past. If the
>>>time is ahead of the server, it will just continue ahead. I found a post
>>>below which states that it should then turn around eventually, and head
>>>the other direction, bouncing like a bungee, but I've never run the test
>>>that long. I have no idea why this behavior is happening. And it is the
>>>same behavior on both machines.
>>>A sample ntpq -p output. Clock was set 6 and a half seconds behind
>>>Node2# ntpq -p
>>> remote          refid           st t when poll reach   delay   offset
>>>*184.108.40.206  192.168.31.253   4 u   55   64   377   0.308  6418.55
>>> LOCAL(1)        LOCAL(1)        14 l   21   64   377   0.000    0.000
>>>Two notes of interest based on other posts I've read
>>>1. Our tick rate is 1ms instead of 10ms.
>>>2. On almost all of the test machines, the drift file is populated with
>>>the value 500. On one it's ~450. According to another poster, that could
>>>be the source of some issues.
>>>Thoughts? Ideas? I'm assuming right now that it's either a config or a HW
>>>issue. I'm running a test now with this config and command line options,
>>>but am adding "disable kernel" to the config file. Wondering if that will
>>>change the behavior.
>>>Thanks in advance if anyone has any help to offer at all.
>>If almost all of your drift files are populated with 500, something is
>>very wrong!! 500 is the limit for correctable frequency errors! If
>>your clock frequencies are all in error by 500ppm or more, I would
>>suspect the clock you are trying to synchronize with. If I had a
>>hundred machines synchronized with a known good clock, I would expect
>>ninety percent or more of them to have drift values in the range from
>>-200 to +200. Checking the machines running ntp in my home I find:
>>two Sun Ultra 10 workstations running Solaris 8 and Solaris 9 have
>>6.400 and -3.172 respectively. A DEC Alphastation 200
>>running VMS V7.2-1 has 35.488 while a Compaq Deskpro EN running RedHat
>>has -4.908. A very small sample, but indicative of what is "normal".
>>I also note that the machine you are trying to synchronize with is at
>>stratum 4 which is pretty near the bottom of the food chain!! While
>>stratum can range from 1 to 15, I'd consider serving time from any
>>stratum higher than 3 as a little bit odd.
>>Stratum 1 servers get their time directly from a hardware reference
>>clock traceable to NIST or some other national standards organization.
>>Stratum 2 servers get their time from stratum 1. Small organizations
>>would operate stratum 3 servers and have their leaf nodes at stratum
>>4. Larger organizations would operate stratum 1 or stratum 2
>>servers with leaf nodes at stratum 2 or 3.
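To put the drift and slew figures quoted above in perspective, here is a minimal Python sketch of the arithmetic. It assumes the drift file value is a frequency error in parts per million (PPM) and that slewing with -x is capped at 500 PPM, which is the same as the 1 second per 2000 seconds rate mentioned earlier. The function names are made up for illustration and are not part of ntpd.

```python
SECONDS_PER_DAY = 86_400


def ppm_to_seconds_per_day(ppm: float) -> float:
    """Time gained or lost per day at a constant frequency error given in PPM."""
    return ppm * 1e-6 * SECONDS_PER_DAY


def slew_time_seconds(offset_s: float, slew_ppm: float = 500.0) -> float:
    """Seconds needed to slew out an offset at a fixed slew rate.

    slew_ppm defaults to 500, the assumed maximum slew rate.
    """
    return abs(offset_s) / (slew_ppm * 1e-6)


if __name__ == "__main__":
    # Drift value of 500 from the drift files described in the original post.
    print(ppm_to_seconds_per_day(500.0))
    # Offset of 6418.55 ms from the ntpq -p output, converted to seconds.
    print(slew_time_seconds(6.41855))
```

At 500 PPM a clock gains or loses about 43 seconds a day, and slewing out the 6.4 second offset shown in the ntpq output would take roughly three and a half hours. A clock that catches up much faster than that is probably not being slewed at the normal rate.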
127.127.1.0 is not a real IP address. The 127.127 part says it's a
reference clock. The 1 is the driver number for the local clock. The 0
is the unit number. I don't know what "driver 0" is likely to do, if
there is such a thing. I'm a little surprised that you aren't getting
an error message somewhere. Or maybe you are and just haven't noticed it?
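The address layout described above (127.127, then the driver number, then the unit number) can be sketched in Python as follows. This is only an illustration of the convention, not code from ntpd; the RefClock type and the parse_refclock function are hypothetical names.

```python
import ipaddress
from typing import NamedTuple, Optional


class RefClock(NamedTuple):
    driver_type: int  # third octet: which reference clock driver
    unit: int         # fourth octet: which instance of that driver


def parse_refclock(address: str) -> Optional[RefClock]:
    """Decode a 127.127.t.u pseudo-address; return None for anything else."""
    octets = ipaddress.IPv4Address(address).packed
    if octets[0] != 127 or octets[1] != 127:
        return None  # an ordinary IP address, not a reference clock
    return RefClock(driver_type=octets[2], unit=octets[3])


if __name__ == "__main__":
    for addr in ("127.127.1.0", "127.127.1.1", "192.168.31.253"):
        print(addr, parse_refclock(addr))
```

Read this way, 127.127.1.1 decodes to driver type 1, unit 1, which matches the LOCAL(1) line in the ntpq output quoted earlier.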