[ntp:questions] What level of timesynch error is typical on Win XP?

Joseph Gwinn joegwinn at comcast.net
Thu Oct 21 12:59:53 UTC 2010


In article <i9ooo9$9ik$1 at news.eternal-september.org>,
 "David J Taylor" <david-taylor at blueyonder.co.uk.invalid> wrote:

> "Joseph Gwinn" <joegwinn at comcast.net> wrote in message 
> news:joegwinn-DA4B7B.23340420102010 at news.giganews.com...
> > In article <i9mqek$trl$1 at news.eternal-september.org>,
> > "David J Taylor" <david-taylor at blueyonder.co.uk.invalid> wrote:
> >
> >> > I have a small network of Windows XP (64 bit) running simulations,
> >> > with NTPv4 running on all the boxes and using a GPS-based timeserver
> >> > on the company network.  The ping time to the server is 2
> >> > milliseconds from my desk, but I'm seeing random time errors of
> >> > order plus/minus 5 to 10 milliseconds, based on loopstats data.
> >> >
> >> > This level of timesynch error is OK for the simulation, but still
> >> > that's a lot of error.  I get far better on big UNIX boxes.
> >> >
> >> > The question is if this level of error is reasonable, given the
> >> > setup.  I know that timekeeping under Windows is not optimum, but
> >> > cannot change the OS, so the question is if I have gotten things as
> >> > good as they can be, or should I dig deeper.  One thing that comes
> >> > to mind is to raise the priority of the NTP daemon to exceed that of
> >> > the simulation software.
> >> >
> >> > Thanks in advance,
> >> >
> >> > Joe Gwinn
> >>
> >> Joe,
> >>
> >> This is the performance I see:
> >>
> >>   http://www.satsignal.eu/mrtg/performance_ntp.php
> >>
> >> The XP systems are:
> >>
> >>   Feenix: GPS-synched
> >>   Narvik: LAN-synced to Pixie (FreeBSD with GPS source)
> >
> > These are all over the place.  Both hardware and OS seem to matter, by a
> > lot.
> 
> Hardly "all over the place"!  Feenix is well within a millisecond, and
> Narvik just within a millisecond, and programs on that OS can only read 
> the system time with ~16ms precision.

By "all over the place" I mean that while some combinations are very 
good, yielding peak offsets well under a millisecond, others yield 
peak offsets of 25 milliseconds.  In my application, only peak offsets 
matter.
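
For anyone wanting to check peak rather than average offsets, here is a 
minimal sketch of pulling the worst-case value out of loopstats data.  
It assumes the standard loopstats record layout, where the third 
whitespace-separated field is the clock offset in seconds.  The function 
name and the sample records are made up for illustration and are not 
real measurements.

```python
def peak_offset_ms(lines):
    """Return the largest absolute clock offset, in milliseconds."""
    peak = 0.0
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated records
        offset_s = float(fields[2])  # third field: clock offset in seconds
        peak = max(peak, abs(offset_s) * 1000.0)
    return peak


# Hypothetical sample records (MJD, seconds past midnight, offset, ...).
SAMPLE = [
    "55490 46793.123 0.000412000 12.345 0.000100000 0.001234 6",
    "55490 46857.123 -0.007500000 12.350 0.002000000 0.001500 6",
    "55490 46921.123 0.002250000 12.340 0.000900000 0.001300 6",
]

if __name__ == "__main__":
    print("peak offset (ms):", peak_offset_ms(SAMPLE))
```

To run it against a real file, pass an open file object instead of 
SAMPLE, since iterating over a file yields its lines.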


> > I can't add a GPS source, and I can't really control temperature.
> 
> So you need to keep the polling interval short.

We tried 16 seconds, with no variation allowed, and it didn't make much 
difference.  Currently, NTP is being allowed to choose its own polling 
period.  I don't recall what periods it chose, but I'll look.

What other periods would you suggest, and why?


> > I don't think that iburst is the issue, because the randomness persists
> > for at least a week, long after the iburst transients will have died
> > down.
> 
> I never said iburst was an issue, just that the systems will need to be on 
> for several hours before best accuracy is achieved.  It's a pity that NTP 
> doesn't have a faster initial convergence.
> 
> > My experience is the same, for average behaviour.  But for use in
> > realtime, running the daemon at high realtime priority greatly reduces
> > the tails of the probability distribution of response times and/or clock
> > offsets.
> >
> > Joe Gwinn
> 
> Yes, if the CPU loading is heavy I can quite believe that.

That's the usual cause.  My usual solution is to ensure that the NTP 
daemon has a high realtime priority that well exceeds that of the 
realtime application code.  NTP can be run at the highest realtime 
priority available without difficulty on every system I have tried this 
on.  

Another, more subtle cause is Network File System (NFS) access being 
used to read or write the local disk from afar.  This completely 
distracts the local OS kernel, at an implied priority that exceeds all 
processes and threads, including NTP running at the highest realtime 
priority.  And yet there may be no record of the activity in syslog. Nor 
is it clear that NFS activity is always counted in the I/O read and 
write statistics kept by the kernel. Diagnosis may require network 
tools, unless one can figure out where the NFS access must be coming 
from and stop it at the source.  


I should explain what I mean by the term "realtime priority".  There are 
two related but independent things going on here, a numerical priority 
and a scheduling policy.

A realtime scheduling policy is typically "winner take all", where the 
process (and/or thread) having the highest priority can use as much of 
the processor as desired, even if all other processes and threads are 
squeezed out completely.  In other words, realtime scheduling policies 
are completely unfair.  It is the human system designers' responsibility 
to ensure that there is enough computer that nothing critical is unduly 
stalled.

A non-realtime scheduling policy attempts some notion of fairness, where 
all processes and threads make progress at an average rate that is 
determined by their respective numerical priorities.  In such a scheme, 
nobody is completely squeezed out, and no direct human intervention is 
required to ensure this outcome.
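
To make the distinction concrete, here is a rough sketch of how the two 
notions (policy and numerical priority) appear in the POSIX scheduling 
interface, using Python's os module.  It assumes a Linux-like system 
where the os.sched_* calls are available, so it does not apply to 
Windows XP.  The helper names describe_policies and make_realtime are 
invented for this example, and changing the policy normally needs root 
privileges.

```python
import os


def describe_policies():
    """Return the static priority range (min, max) of each policy."""
    policies = {
        "SCHED_FIFO": os.SCHED_FIFO,    # realtime, run until blocked
        "SCHED_RR": os.SCHED_RR,        # realtime, round-robin among equals
        "SCHED_OTHER": os.SCHED_OTHER,  # default time-sharing policy
    }
    return {
        name: (os.sched_get_priority_min(p), os.sched_get_priority_max(p))
        for name, p in policies.items()
    }


def make_realtime(pid=0):
    """Try to put a process (0 = the caller) under SCHED_FIFO at the
    highest priority.  Return True on success, False if not permitted."""
    prio = os.sched_get_priority_max(os.SCHED_FIFO)
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(prio))
    except PermissionError:
        return False
    return True


if __name__ == "__main__":
    for name, (lo, hi) in describe_policies().items():
        print(name, "priority range:", lo, "to", hi)
```

Calling make_realtime() with the daemon's process ID is the step that 
would put it ahead of the application code, provided the application 
runs at a lower realtime priority or under a non-realtime policy.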


Joe Gwinn



