[ntp:questions] Re: keeping ntpd running

David L. Mills mills at udel.edu
Thu Sep 25 16:09:22 UTC 2003


Brian,

I'm glad you seized on that bone, which has been bothering me for some
time. There are two thresholds implemented in NTPv4 (ntpd) just for this
problem, minclock and minsane, both arguments to the tos command. These
have been around for a while, but are probably badly documented. The
minclock threshold is used by the clustering algorithm as it casts off
outlier servers until the total remaining is not more than this value.
At the moment, minclock defaults to three, mostly for historic reasons.
From Byzantine agreement principles, it really should be four.

The interesting threshold is minsane, which is the minimum number of
survivors necessary to declare the client synchronized. It defaults to
one in the interest of fast synchronization, but really should be
something higher like four, assuming that number of servers can always
be found. If minclock and minsane are both set to four and some greater
number of servers, like six, were available, once several samples have
been collected from each of at least four servers, the clock would be
set. As more servers are found, the best four of them would survive to
set the clock. This would be the ideal configuration from a
sanity/antiterrorist point of view, but if this were the default case
the volume of confused mail on this list would easily double.
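
To make the two thresholds concrete, here is a rough Python sketch of
how they interact. It is an illustration only, not the ntpd code: the
real clustering algorithm ranks servers by selection jitter, whereas
this sketch assumes a simpler stand-in (distance from the median
offset). The function names and the sample offsets are made up for the
example.

```python
from statistics import median


def cluster(offsets, minclock=3):
    """Cast off the worst outlier until no more than `minclock` remain.

    `offsets` maps a server name to its measured offset in seconds.
    Assumption: "worst" means farthest from the median offset, which is
    a simplification of what ntpd actually measures.
    """
    survivors = dict(offsets)
    while len(survivors) > minclock:
        mid = median(survivors.values())
        worst = max(survivors, key=lambda name: abs(survivors[name] - mid))
        del survivors[worst]
    return survivors


def is_synchronized(survivors, minsane=1):
    """The client counts as synchronized only with at least `minsane` survivors."""
    return len(survivors) >= minsane


# Six servers, two of them well off from the rest (hypothetical values).
six = {"a": 0.001, "b": -0.002, "c": 0.003, "d": 0.000, "e": 0.250, "f": -0.400}
best = cluster(six, minclock=4)
print(sorted(best), is_synchronized(best, minsane=4))

# Only three servers reachable: with minsane 4 the clock is not set.
three = {"a": 0.001, "b": -0.002, "c": 0.003}
print(is_synchronized(cluster(three, minclock=4), minsane=4))
```

With both thresholds at four, the six-server case keeps the four
servers that agree, and the three-server case never declares
synchronization, which is the behavior described above.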

So, stick a "tos minclock 4 minsane 4" in your configuration file along
with six servers and watch the fun. But, please note, no such feature is
in NTPv3 (xntpd).
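
As a minimal sketch, the relevant part of the configuration file might
look like the following. The server names are placeholders, not real
hosts, and the rest of the file is omitted.

```
# ntp.conf fragment (server names are hypothetical)
server ntp1.example.org
server ntp2.example.org
server ntp3.example.org
server ntp4.example.org
server ntp5.example.org
server ntp6.example.org

# Keep at least four survivors, and require four before setting the clock.
tos minclock 4 minsane 4
```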

Dave

Brian Utterback wrote:
> 
> Ian Diddams wrote:
> > "Richard S. Shuford" <shuford at list.stratagy.REM0VE-THlS-PART.com>
> > wrote in
> >
> >
> >>If ntpd is found to run for 20 minutes and then quit, what's
> >>going on is probably this:  If the ntpd daemon, when starting
> >>up, finds that the system's internal clock is more than 1000
> >>seconds different from the time ticks received from external
> >>sources, ntpd wants a human to figure out why and make an
> >>intentional clock change; the daemon is programmed not to
> >>simply trust the external ticks and change the system's clock
> >>on its own.  So the daemon rather quietly logs a message:
> >>
> >>    time error is way too large (set clock manually)
> >>
> >>and then it gives up.  Without updating the clock.  If left
> >>to its own devices, it will never succeed, and the system's
> >>time setting will erode until you have the Temporal Dust Bowl.
> >
> >
> >
> > Interesting!
> >
> > So you're saying that a system might run for several days apparently
> > quite happily, but in all that time the clock is actually getting
> > further away from centralised time rather than closer to it until such
> > time as it is > 1000 seconds out, at which time it fails/dies/stops/gives
> > up/stops running?
> 
> I don't think this is quite what Richard is saying. If you have a server
> that is giving bad time, or the current time on the system is greater than
> 1000 seconds off of the correct time, if you are using xntpd and did not
> run ntpdate prior to starting it, it may exit shortly after starting,
> giving the message above. After that, the system clock is free running,
> since xntpd is not running.
> 
> This scenario is less likely using ntpd if the -g option is used as well.
> In this case, ntpd will itself do the job of ntpdate.
> 
> A big problem at start up of xntpd and ntpd is that the mitigation algorithms
> are initially crippled in the interest of a fast start, and may not detect
> a falseticker and will happily sync to it if it is the first server that becomes
> available. Suppose you have a system that is 500 seconds off, and there are four
> servers available, one of which is 1001 seconds off in the same direction. If the
> timing just happens to be right and the falseticker is the first to reach usability
> after 5 polls, the system will step the clock to match the falseticker and will then
> reset. After 5 more polls, the truechimers will vote the falseticker off the island,
> and the system will now want to reset to the correct time, but since this is not the
> first sync, the 1000 second limit will come into play and ntpd (or xntpd) will exit.
> 
> --
> blu
> 
> Lesson from the blackout of 2003:
> The power grid is THE most critical infrastructure, upon which all
> others depend, and nobody really knows how it works.
> --------------------------------------------------------------------------------
> Brian Utterback - Solaris Sustaining (NFS/Naming) - Sun Microsystems Inc.,
> Ph/VM: 781-442-1343, Em:brian.utterback-at-ess-you-enn-dot-kom


