[ntp:questions] Re: Windows - Seven Days Later

David Woolley david at djwhome.demon.co.uk
Fri Oct 8 22:01:06 UTC 2004


In article <5Ih9d.8501$Sl2.3253 at trnddc09>,
Jerry Baker <jerry at novalid.invalid> wrote:

> I have made several postings about how NTP is having a real problem 
> remaining stable. I now have a little more than a weeks worth of data 

I strongly suspect your problem is the same as the problem mentioned
in this article:

http://www.ntbugtraq.com/default.asp?pid=36&sid=1&A2=ind0410&L=ntbugtraq&F=P&S=&P=3654

Note that this has no mention of any time synchronization software,
although it does have both the Windows NT family and SETI@Home in common
with you.  I doubt that S@H is relevant except as an almost entirely
CPU-bound, low-priority process.  That leaves Windows!

I am pretty sure (but haven't investigated in detail, and am no longer
allowed to install software on the NT family machines that I can use)
that Windows delays a lot of rescheduling until the next clock tick.
If either the ethernet device driver or ntpd suffers from this effect,
the timestamps on receipt and/or transmission of the packets will
be out by up to one clock tick, and round trip times will be liable to
jump backwards and forwards by a tick.
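
To make the arithmetic concrete, here is a minimal sketch in Python of
what a receive timestamp that is held back to the next clock tick does
to the standard NTP offset and delay calculation.  The 10 ms tick, the
path delays and all function names are illustrative assumptions, not
measured Windows values or ntpd code; times are integer microseconds.

```python
TICK_US = 10_000  # assumed tick length (10 ms); not a measured Windows value


def delay_to_next_tick(t_us, tick=TICK_US):
    """Return the first tick boundary at or after t_us (integer ceiling)."""
    return -(-t_us // tick) * tick


def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation from the four packet timestamps."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def measure(send_us, one_way_us=12_300, server_us=100, late_receive=True):
    """Simulate one exchange between two perfectly synchronized clocks.

    If late_receive is true, the client only timestamps the reply at the
    next tick boundary, modelling rescheduling deferred to the tick.
    """
    t1 = send_us
    t2 = t1 + one_way_us          # server receives
    t3 = t2 + server_us           # server transmits
    arrival = t3 + one_way_us     # reply really arrives at the client
    t4 = delay_to_next_tick(arrival) if late_receive else arrival
    return ntp_offset_delay(t1, t2, t3, t4)
```

With these assumed numbers, measure(0, late_receive=False) gives an
offset of 0 and a delay of 24600 us, while measure(0) gives an offset
of -2650 us and a delay of 29900 us.  The extra delay depends on where
in the tick the reply lands, so it varies from poll to poll by anything
up to one tick, and the apparent offset moves by half as much.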

> graphed with MRTG available. Perhaps I am expecting too much from NTP, 

You're expecting too much from Windows.

> but it looks to me like there is something critically wrong with NTP's 

You seem to have been taking a very hostile attitude to ntpd all along,
but you have been continually changing the goalposts and dribbling in
critical pieces of information.  What, for example, happened to your
original 2.6s step problem?

Looking at your MRTG graphs, you are not getting uni-directional clock
steps, but rather paired forward and backward steps, which are slightly
confused because the clock never appears stable enough for ntpd to
establish a large poll interval and long time constant, so ntpd has
partially corrected the step in one direction, before the counter step
(note, as I pointed out before, the phase errors are the cause of the
frequency changes, not the other way round).  ntpd is doing its best
with low quality time data that keeps stepping backwards and forwards
by what looks like a clock tick.

> operation on the Windows platform (SNTP clients with no stepping keep 
> better time).

How are you instrumenting the SNTP client?  In any case, most SNTP clients
use relatively long poll intervals, which will tend to low-pass filter
out the forwards and backwards steps in the time measurement.  You may be
able to make ntpd filter them out by using a longer minpoll, but I can't
guarantee that that won't result in instability or a failure of the loop
to initially capture.
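
As a rough illustration of the low-pass effect, the Python sketch below
runs a measured offset that flips by one tick through a plain
first-order exponential filter with two different gains.  This is not
ntpd's clock discipline loop; the filter, the gains, the 10 ms step and
the run length of five samples are all assumptions chosen only to show
that a longer time constant shrinks the swing.

```python
def low_pass(samples, gain):
    """First-order exponential smoothing; smaller gain = longer time constant."""
    y = samples[0]
    out = []
    for x in samples:
        y += gain * (x - y)   # move a fraction of the way towards each sample
        out.append(y)
    return out


def swing(values):
    """Peak-to-peak range of a sequence."""
    return max(values) - min(values)


# Assumed measurement: offset (in ms) alternating between 0 and one 10 ms
# tick, in runs of five samples, for 200 samples in total.
measured = ([0.0] * 5 + [10.0] * 5) * 20

fast = low_pass(measured, gain=0.5)    # responsive, short time constant
slow = low_pass(measured, gain=0.05)   # sluggish, long time constant
```

Comparing swing(fast[100:]) with swing(slow[100:]) once the start-up
transient has died away shows the slow filter passing only a small
fraction of the 10 ms steps, where the fast one follows them almost
completely.  The price, as with a longer minpoll, is a slower response
to genuine changes.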

Of course, if you really think that ntpd is broken on NT, the source is
published and you can debug it yourself.  The NT specific code ought to
be fairly easy to find.

> Take a look at http://jerbaker.dhs.org/ntp/

As I said, it looks as though there is an uncertainty of about one clock
tick in the basic measurements and ntpd is interpreting this as a very
unstable clock and trying to be responsive in tracking the time.  A
scheduling problem is completely consistent with runs in one state,
followed by short runs in another state.

Note that what you are graphing is the difference between the local
clock and the measurement of the time from the upstream server.  The
local clock itself is moving much more smoothly for any application
that interpolates between clock ticks; normal applications will, as
always, see time with a resolution of one clock tick, so will have
something like a 20ms(?) sawtooth imposed on the accurate clock, but
that's purely down to Windows.
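
The sawtooth can be sketched in a few lines of Python.  The tick length
here is an assumed parameter (10 ms by default), not a value taken from
any particular Windows version, and the function names are made up for
the illustration; times are integer microseconds.

```python
TICK_US = 10_000  # assumed tick length in microseconds


def tick_clock(true_us, tick=TICK_US):
    """Time as seen by an application that only sees whole clock ticks."""
    return (true_us // tick) * tick


def sawtooth_error(true_us, tick=TICK_US):
    """How far the tick-resolution reading lags the smoothly moving clock."""
    return true_us - tick_clock(true_us, tick)


# Sample the error every 2.5 ms over three ticks.
errors = [sawtooth_error(t) for t in range(0, 30_000, 2_500)]
# errors ramps 0, 2500, 5000, 7500 and then drops back to 0 at each tick
```

The error ramps up to just under one tick and then resets, whatever ntpd
is doing to the underlying clock.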


