[ntp:questions] Time stepping regardless of "-x" slew only option
smith at cag.zko.hp.com
Sat Jun 23 22:45:44 UTC 2007
> I am using AIX 5.4 and NTP version 3.4 from what I can tell, which
> came with the operating system. ntpq, "v" option returns 3.4y.
> This is not a startup condition. xntpd has been running for a while
> before the steps.
> I realize that I have other issues causing the need for steps. I am
> investigating this problem as well. I only have one time server, so I
> have network issues or something that is causing the need for the
> steps. Underneath the one server, we have several of our processors
> acting as servers to the rest of our system. If one of our servers is
> having interrupt delays from disk activity or some other issue,
> perhaps that is causing the discrepancy. Does anyone know the best way
> to debug such a situation?
> But I still need to understand why NTP is stepping when the
> documentation I have says that it should not with the "-x" option. We
> have a distributed processing system with many processors. It is
> imperative that time does not step on any of our processors or our
> software will detect heartbeat problems. That is what is currently
> happening, so I know that real steps are occurring and not just steps
> in the ntp.log file.
If you have only one server and it is unstable, that would cause stepping
on its clients. As Richard suggested, the output of "ntpq -p" would be
helpful. In this case, the output of "ntpq -p [your server]" would also
be helpful. You can work back through the chain of servers to find out
more about where things are going south. Another helpful data point would
be "(x)ntpdc -c loopinfo" showing the frequency/drift rate. If it is
consistently large, it indicates a problem.
"-x" can often make things worse, causing large corrections that might not
otherwise be necessary. It should never be used until a system has
first been run without it for several days to stabilize on a characteristic
Under normal operation, (x)ntpd will only step if the offset is larger than
128 msec. A well-configured, well-behaved NTP network should never run into
that. However, if the system is far off the "correct" time when it starts,
"-x" can prevent it from getting to the correct time, causing repeated attempts
to step. For this reason, most OSes run ntpdate at boot to set the time
initially to within a few milliseconds. If that part of your boot is not
configured correctly, that could cause attempts to step some time after xntpd
starts running.
Another common cause is having servers configured, including your local
undisciplined clock, that do not agree on the time (or having one or more of
your servers themselves suffer from that problem).
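
To make the step threshold concrete, here is a minimal Python sketch of the
step-versus-slew decision described above. The 128 msec threshold comes from
this message; the 500 ppm maximum slew rate is an assumption (a commonly cited
limit, which your platform may not share), and the function names are
hypothetical, not anything taken from xntpd itself.

```python
STEP_THRESHOLD_S = 0.128   # step threshold quoted in this message (128 msec)
MAX_SLEW_PPM = 500.0       # ASSUMPTION: typical maximum slew rate, not AIX-specific


def correction_mode(offset_s, step_threshold_s=STEP_THRESHOLD_S):
    """Return "step" if the offset exceeds the threshold, otherwise "slew"."""
    return "step" if abs(offset_s) > step_threshold_s else "slew"


def min_slew_seconds(offset_s, max_slew_ppm=MAX_SLEW_PPM):
    """Shortest time needed to remove offset_s by slewing at the maximum rate."""
    return abs(offset_s) / (max_slew_ppm * 1e-6)


if __name__ == "__main__":
    for offset in (0.050, 0.200, 10.0):
        print(offset, correction_mode(offset), min_slew_seconds(offset))
```

Under that assumed rate, an offset right at the threshold needs several
minutes of slewing, and a 10 second error needs at least 20,000 seconds,
which is why a clock that starts far off may not converge when stepping is
suppressed.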
If, as you say, this occurs after things have been stable for some time, you
may have a problem with suddenly increased latency in one direction, which
causes xntpd to calculate an offset larger than actually exists. xntpd
adds 1/2 the RTT to the time reported by the server to determine the time
that should be applied to the client. If the network delay is longer in
one direction than in the other, the calculated offset will differ from the
real difference between the 2 clocks by 1/2 the difference between the
delays in the 2 directions. Over long distances or on heavily loaded
systems or networks, this can be significant.
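
The effect of asymmetric delay can be illustrated with a short Python sketch.
It uses the standard NTP four-timestamp offset and delay formulas, which
amount to the "add 1/2 the RTT" calculation described above; it is not taken
from the xntpd source. The simulate_exchange helper, its parameter names, and
the example delays are all hypothetical.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP estimate from the four timestamps.

    t1: client send time (client clock)   t2: server receive time (server clock)
    t3: server send time (server clock)   t4: client receive time (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset, delay_out, delay_back,
                      t1=1000.0, server_hold=0.001):
    """Hypothetical helper: build the timestamps for one exchange.

    true_offset is (server clock - client clock); delays are one-way, in seconds.
    """
    t2 = t1 + delay_out + true_offset    # request arrives, read on server clock
    t3 = t2 + server_hold                # server replies a moment later
    t4 = t3 - true_offset + delay_back   # reply arrives, read on client clock
    return ntp_offset_and_delay(t1, t2, t3, t4)


if __name__ == "__main__":
    # Clocks agree exactly; only the path symmetry changes.
    print(simulate_exchange(0.0, 0.010, 0.010))  # symmetric path
    print(simulate_exchange(0.0, 0.300, 0.010))  # congested outbound path
```

With perfectly synchronized clocks, the symmetric case yields an offset of
essentially zero, while the asymmetric case yields roughly (0.300 - 0.010) / 2
= 0.145 seconds, enough to exceed a 128 msec step threshold even though
neither clock is actually wrong.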