[ntp:questions] Bug 2341 - ntpd fails to keep up with clock drift at poll>7

Martin Burnicki martin.burnicki at meinberg.de
Wed Nov 27 17:26:33 UTC 2013


Brian Inglis wrote:
 > On 2013-11-26 08:22, Martin Burnicki wrote:
 >> Brian Inglis wrote:
[...]
 > As I said above, on Windows stable, with only network servers, and
 > normal maxpoll 10, as the poll interval increases, the FLL kicks in
 > to drive the drift within PPB, and the offset stabilizes in the low
 > us.

Yes, you said that. But on which Windows version?

Here's a short summary of possible variants, if I remember correctly:

1. Windows XP/Server 2003. System time increments in steps of about 16
ms. With ntpd 4.2.4 the system time was always interpolated between
clock ticks, and the quality of the interpolation depended on the
implementation of QueryPerformanceCounter (QPC), which could be based on
the power management timer (PMTIMER) or the TSC. If QPC was using the
TSC then interpolated times could go nuts when running on a CPU type
where the TSCs of the different CPU cores were not synchronized, or
where the TSC was clocked down due to power management (Intel SpeedStep,
AMD Cool'n'Quiet, ...). SP3 for XP usually switched QPC to use the
PMTIMER, to avoid potential TSC problems in general.

2. With Windows Vista and later (Win 7, Server 2008) the system time 
started to tick in 1 ms increments instead of 16 ms increments. The bug 
where small time corrections are ignored by the Windows kernel was not 
yet known, but experiments showed that on systems with 1 ms increments 
the time adjustment was usually smoother if the Windows system clock was 
*not* interpolated.

So starting with 4.2.6, ntpd tries to figure out whether the system
clock increments in 1 ms steps, or more coarsely. In the former case
interpolation is disabled, but in the latter case interpolation is
still used, basically the same way as in ntpd 4.2.4. These Windows
versions also "know" which CPU types have problems with their TSCs, and
use PMTIMER or HPET instead. If these Windows versions decide to use TSC
then TSC usually works reliably. So, if ntpd figures out that QPC uses
TSC then it reads TSC directly, which is faster than using the QPC API.
Some of the default behavior can be altered by using some environment
variables. This great work was done by Dave Hart.

3. So while ntpd 4.2.6 often works great on Windows XP / Server 2003, as
well as on Windows 7 / Server 2008, we at Meinberg received a number of
reports where the system time adjustment loop didn't settle, and ntpq -p
reported a large offset and jitter. We identified two cases where
other drivers messed up the system time, but there were still cases
where we were unable to find the reason. Fortunately a guy named Andrew
Dixie came up with
https://bugs.ntp.org/show_bug.cgi?id=2328
and brought the Windows bug
http://support.microsoft.com/kb/2537623
to our attention. He also provided a patch with a workaround which was 
pulled into the current development version of ntpd. This patch has 
fixed the loop settling problems on all systems I know of where the 
system time adjustment didn't converge with an earlier version of ntpd.
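
To make the decision described in item 2 a bit more concrete, here is a
rough sketch in Python. This is not ntpd's actual code (ntpd is written
in C); the function names, the 1.5 ms threshold and the counter
frequency used in the usage note are assumptions made up for
illustration. The idea is only: sample the system clock, look at the
smallest increment seen, and interpolate with a free-running counter
only if the clock ticks coarsely.

```python
# Illustrative sketch only; names and threshold are hypothetical,
# not taken from the ntpd sources. All times are integer microseconds.

def smallest_step_us(samples_us):
    """Smallest positive increment between consecutive clock readings."""
    steps = [b - a for a, b in zip(samples_us, samples_us[1:]) if b > a]
    return min(steps) if steps else None


def should_interpolate(samples_us, threshold_us=1500):
    """True if the clock ticks more coarsely than about 1 ms.

    threshold_us is an assumed cut-off between "1 ms ticks" and
    "coarse ticks" (e.g. about 16 ms on XP / Server 2003).
    """
    step = smallest_step_us(samples_us)
    if step is None:
        # No increment observed, so there is nothing to base a decision on.
        return False
    return step > threshold_us


def interpolated_time_us(tick_time_us, counter_at_tick, counter_now, counter_hz):
    """System time at the last tick plus the counter time elapsed since."""
    elapsed_us = (counter_now - counter_at_tick) * 1_000_000 // counter_hz
    return tick_time_us + elapsed_us
```

With readings that advance in 15625 us steps, should_interpolate()
returns True; with 1000 us steps it returns False. The counter passed
to interpolated_time_us() stands in for QPC or the TSC, whichever the
system provides.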

 >> So my advice would have been to use minpoll 4 maxpoll 4, if
 >> this setting wouldn't affect the workaround implemented in -dev.
 >
 > Would probably get you kicked off most upstream servers eventually!

Maybe if you are using public/pool servers, but not if you are using 
your own NTP server. Take care, I'm biased! ;-)
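
For readers not used to the notation: minpoll and maxpoll are given as
exponents of two, in seconds, so the values mentioned in this thread
work out as in the small Python sketch below. The helper names are made
up for illustration.

```python
# Poll exponents as used by minpoll/maxpoll: interval = 2**exponent seconds.
# Helper names are hypothetical and exist only for this example.

SECONDS_PER_DAY = 86400


def poll_interval_seconds(poll_exponent):
    """Polling interval in seconds for a given poll exponent."""
    return 2 ** poll_exponent


def polls_per_day(poll_exponent):
    """Number of complete polls per day sent to one server."""
    return SECONDS_PER_DAY // poll_interval_seconds(poll_exponent)


for exponent in (4, 6, 7, 10):
    print(exponent, poll_interval_seconds(exponent), polls_per_day(exponent))
```

So minpoll 4 maxpoll 4 means one poll every 16 s (5400 polls per day
per server), compared to one poll every 1024 s at the default maxpoll
10. In ntp.conf such a clamp is written on the server line, e.g.
"server <your-server> minpoll 4 maxpoll 4".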

 >>> With current stable and a ref clock with prefer or low poll, and
 >>> backup servers with low or no minpoll, backup servers are polled
 >>> at minpoll or the same rate as the ref clock, so would never see
 >>> this issue.
 >>
 >> Hm, are you really sure the polling interval for the backup server(s)
 >> depends on the polling interval of a configured refclock?
 >> I haven't noticed this, yet, but I also haven't checked this.
 >
 > After installing a refclock with minpoll & maxpoll 4, had to bump my
 > upstream minpoll to 6, or all were polled every 16s, and I figured
 > someone might notice and object!

Hm, that's strange.

I have not yet used refclocks with the Windows port of ntpd. At least on
my Linux system here this doesn't happen with ntpd 4.2.6p5. I have a
mixture of parse and SHM refclocks all clamped to minpoll 4 / maxpoll 4, 
and a backup server without minpoll / maxpoll configured, which stays at 
a 64 s polling interval.

I wouldn't expect the poll interval changes to be controlled by
OS-specific code, but I could imagine that they depend on the results of
successive polls, which may yield different offset and jitter figures
under Windows and Linux, or even with or without refclocks under Windows.

 > I noticed after a router restart causing minutes of unreachability,
 > network servers were temporarily polling every 1024s, then dropped
 > back to 64s when they again became reachable.

I think this is expected behavior.

 > However, network servers only seem to log peerstats about every 5
 > minutes, giving about 288-300 samples per day, every 288-300
 > seconds.

I've also observed that the stats files are updated less frequently by
recent versions of ntpd than by earlier versions. I haven't tracked
down the changes which cause this, though.

 >> What if you don't have a refclock, only upstream servers?
 >
 > Poll intervals increase up to maxpoll, depending on the server and
 > link quality.

Right, and that's exactly where I have seen the offset increasing, at
least under Windows. Thus my suggestion to limit the poll interval. This
also speeds up synchronization, which most users I've talked to
appreciate.

 > Appreciate the feedback and questions, and thanks very much for the
 > Windows port,
 > the work on it, and the GUI Monitor utility.

Thanks. Ntpd is a great program, and I'm happy to support both the 
project and the community whenever I can and have time to do it.


Martin
-- 
Martin Burnicki

Meinberg Funkuhren
Bad Pyrmont
Germany


