[ntp:questions] Clock jumps when refclock used

A C agcarver+ntp at acarver.net
Mon May 7 18:55:10 UTC 2012


New update; this one is even more interesting.

I reconfigured ntpd to use the two refclocks (SHM and ATOM), set SHM as
prefer, and let ATOM run as normal.  The five Internet servers that were
there before are still present (reminder: this is 4.2.7p270).

The only global configuration option I had was "disable kernel".  SHM
had a time1 fudge of 0.6 s, which put its offset at about zero, +/-50 ms.
ATOM had flag3 disabled (so no kernel PPS discipline).  Everything else
was default, and the Internet servers were configured with iburst.
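The configuration file itself isn't quoted here, so the following is a minimal sketch of what the setup above might look like, written as a small Python script that prints ntp.conf text.  The refclock addresses (127.127.28.0 for SHM, 127.127.22.0 for ATOM, i.e. unit 0 of each driver) and the server names are assumptions, not taken from my actual file.

```python
# Sketch: print ntp.conf text for the two-refclock setup described above.
# ASSUMPTIONS (not from the post): both refclocks are unit 0, and the
# Internet server names are hypothetical placeholders.

SHM = "127.127.28.0"   # shared-memory refclock (driver 28), unit 0 assumed
ATOM = "127.127.22.0"  # PPS "ATOM" refclock (driver 22), unit 0 assumed


def render_ntp_conf(shm_time1: float, servers: list) -> str:
    """Return ntp.conf text for the initial SHM + ATOM configuration."""
    lines = [
        "disable kernel",                  # the only global option used
        f"server {SHM} prefer",            # SHM marked as the prefer peer
        f"fudge {SHM} time1 {shm_time1}",  # fixed offset correction, seconds
        f"server {ATOM}",
        f"fudge {ATOM} flag3 0",           # flag3 off: no kernel PPS discipline
    ]
    # All Internet servers use iburst; everything else is left at defaults.
    lines += [f"server {name} iburst" for name in servers]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    placeholders = [f"ntp{i}.example.net" for i in range(1, 6)]  # hypothetical
    print(render_ntp_conf(0.6, placeholders), end="")
```

Only the printed text matters; the script is just a compact way to show which directives were in play.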

The end result: the system synced fine to ATOM and ran happily, and
then, in less than 24 hours, ntpd suddenly went out of control.  By the
end, the system had an offset of over 10 seconds to all servers.  There
were very frequent sys_fuzz messages and constant stepping of the clock.

I tried increasing mindist significantly (up to 5) to see if that
helped, but no luck; it would still go haywire in under a day.
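For anyone unfamiliar with mindist, here is a rough sketch of why raising it can keep two sources with different offsets from being treated as disagreeing.  It assumes a simplified model in which each source's correctness interval is offset +/- max(distance, mindist); ntpd's real selection and clustering logic is more involved, and the numbers below are made up for illustration.

```python
# Simplified model of how mindist widens correctness intervals.
# ASSUMPTION: interval = offset +/- max(distance, mindist). This is not
# ntpd's actual selection code, only an illustration of the idea.

def interval(offset: float, distance: float, mindist: float) -> tuple:
    """Correctness interval (seconds) with its half-width floored at mindist."""
    half = max(distance, mindist)
    return (offset - half, offset + half)


def overlap(a: tuple, b: tuple) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical sources: (offset, distance) in seconds.
atom = (0.0, 0.0005)   # PPS: near zero, tiny distance
shm = (0.050, 0.010)   # SHM: 50 ms off, larger distance

for mindist in (0.001, 0.05):
    ok = overlap(interval(*atom, mindist), interval(*shm, mindist))
    print(f"mindist={mindist}: intervals overlap -> {ok}")
```

With the small floor the two intervals are disjoint; with a 50 ms floor they touch, so neither would be rejected in this simplified picture.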

Now here is where it gets interesting.  On a whim I changed the SHM
time1 so that the offset it presented to ntpd was no longer centered on
zero but was single-sided at all times.  I dropped time1 from 0.6 to
0.55 so that the offsets stayed on one side of zero: they now run from 0
(actually just slightly above zero) to 100 ms instead of from -50 to
50 ms.  Because of this larger offset, I kept mindist set high to
prevent clock hopping.  I also increased minpoll on ATOM and SHM to 5,
so they now poll once every 32 seconds instead of every 16.
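The arithmetic behind those two changes is simple enough to check by hand, but here it is as a short sketch.  The sign convention (lowering time1 by 50 ms moved the observed offsets up by 50 ms) is taken from what I observed, not from ntpd's source, and the function names are made up for illustration.

```python
# Arithmetic behind the time1 change and the minpoll change.
# ASSUMPTION: sign convention taken from the observation above
# (lowering time1 by 50 ms raised the offsets by 50 ms).

def shifted_range(lo_ms: float, hi_ms: float,
                  old_time1: float, new_time1: float) -> tuple:
    """Offset range (ms) after changing the time1 fudge (seconds)."""
    shift_ms = round((old_time1 - new_time1) * 1000, 6)  # 0.05 s -> 50 ms
    return (lo_ms + shift_ms, hi_ms + shift_ms)


def poll_seconds(poll_exponent: int) -> int:
    """NTP poll intervals are powers of two: 2**poll seconds."""
    return 2 ** poll_exponent


print(shifted_range(-50, 50, 0.6, 0.55))  # (0.0, 100.0)
print(poll_seconds(4), poll_seconds(5))   # 16 32
```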

It worked.

It has been running for ten days straight without failure, huge
offsets, random stepping, or any other strange behavior.  Even the
number of sys_fuzz messages has dropped.  The only time I get sys_fuzz
messages now is when the heater or air conditioner starts to change the
room temperature (I haven't thermally isolated the machine yet).  In
those cases it sometimes hops back to SHM for a little while and then
switches back to ATOM once the clock has been adjusted (slewing only, no
steps).  When the temperature is relatively stable, I'm getting offsets
of less than 200 us from ATOM, and the system frequency holds reasonably
steady, changing by less than 0.001 PPM over several polling periods.
At their worst, the offsets reach about 5 ms during major temperature
swings.  The heat or A/C blows directly at the machine in its current
location, and I define a "major swing" as a change in ambient
temperature of +/-15F in ten minutes according to my thermometer;
overall, the room varies by +/-25F over the course of a day according to
the same thermometer.
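To put the 0.001 PPM figure in perspective: a frequency error of f PPM means the clock gains or loses f microseconds per second, so over t seconds it drifts by f * t microseconds.  A minimal sketch of that conversion (the function name is just for illustration):

```python
# Convert a frequency error in PPM into accumulated time offset.
# f PPM = f microseconds gained or lost per second of elapsed time.

def drift_us(ppm: float, seconds: float) -> float:
    """Accumulated offset in microseconds for a constant PPM error."""
    return ppm * seconds


# A 0.001 PPM change over one 32 s poll interval: about 0.032 us (32 ns).
print(drift_us(0.001, 32))
```

So a 0.001 PPM wobble is negligible next to the 200 us offsets; the multi-millisecond excursions have to come from much larger frequency changes during the temperature swings.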

For whatever reason, allowing the offset to swing on both sides of zero
eventually sent the whole thing spinning out of control with wild
oscillations, almost as if the PID loop were not quite damped enough and
would start oscillating once the right amount of initial force was
applied.  Keeping the offsets single-sided quieted everything down
considerably.

Comments appreciated.

