[ntp:questions] Re: tinker step 0 (always slew) and kernel time discipline
Richard B. Gilbert
rgilbert88 at comcast.net
Fri Sep 22 12:28:05 UTC 2006
Joseph Harvell wrote:
> Richard B. Gilbert wrote:
>>How about designing your NTP subnet in such a way as to prevent these failures in the first place?
>>Since you say, elsewhere, that you are more concerned that time be strictly monotonically increasing than that it be accurate, perhaps you don't need NTP at all; set your local clock from your wrist watch once a week while the application is not running.
>>Your original problem, IIRC, resulted from an extremely poor design of your NTP subnet; two servers each serving its unsynchronized local clock and drifting apart.
>>If you really do need NTP the easiest configuration is for your client to use from four to seven servers. Those servers should be stratum 2 internet servers (rules of engagement prohibit use of public stratum 1 servers unless you are serving 100 or more clients). This requires that you study the list of public stratum two servers at http://ntp.isc.org/bin/view/Servers/StratumTwoTimeServers
>> to find four to seven servers within, say, 300 miles of your site and add these servers to your ntp.conf file. It also requires a connection to the internet that allows port 123 in both directions. If you specify the numeric IP address of each server, you need not open any other port in the firewall. If you wish to use domain names, then you will have to open the port(s) necessary to allow DNS to work (don't know which ones offhand).
>>The simplest configuration is to make the machine running the application a stratum 1 server by installing ntpd and a GPS timing receiver as a hardware reference clock. The weakness of this configuration is that the GPS receiver becomes a single point of failure; if it dies, you rapidly lose any claim to accuracy. Since you don't insist on accuracy, perhaps this would not be a problem. Actually, ntpd would continue to discipline the clock using the last known frequency correction, so you would have several hours of "holdover" before your clock drifted significantly (assuming a controlled temperature in your data center).
>>You can increase the reliability by using four GPS timing receivers to synchronize four NTP servers and configuring your client to use those four servers.
> I really appreciate the advice. I think you are getting the wrong idea
> about my approach to handling the problem since I don't seem concerned
> about the glaring problems in my configuration. The reason for this is
> the original problem manifested in a testbed for one of our products. I
> am concurrently tracking this down internally to determine whether the
> two servers are actually synced to a stratum 1 clock (or whether they
> are part of the same synchronization subnet at all). And I plan to
> correct the problem.
> Also, I completely agree that we should configure 4+ peers for each NTP
> client to avoid this failure scenario altogether.
> But keep in mind that it may not be practical for our customers to have
> 4+ NTP servers in their synchronization subnet. And arguably, they
> deserve what they get if they fail to follow our recommendation to have
> more servers.
> Nevertheless, I am still very interested in preventing step corrections
> in these scenarios. And I think this is a legitimate concern. So I
> would really appreciate it if you could also address the questions in my
> Joe Harvell
I lack the expertise to answer your question as now put! I've never
done such a thing or needed to.
The tinker keyword is, IMHO, well named! My understanding is that it is
intended for "tinkering" rather than for production use. It lets you
experiment without having to modify the code and rebuild each time.
It's, AFAIK, unsupported; if NTP malfunctions while you are using tinker
and you report it, the reply is likely to be "then don't do that!"
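
For reference, a minimal ntp.conf sketch of the setting named in the subject line, combined with the multi-server layout recommended above. The server names and the driftfile path are placeholders, not real hosts or a required location. The comments reflect my reading of the ntpd documentation and should be checked against the version in use.

```
# Hypothetical ntp.conf fragment; adjust to your own site.

# Step threshold in seconds (the default is 0.128).
# A value of 0 means ntpd never steps the clock and always slews.
# The ntpd documentation notes that the kernel time discipline is
# disabled when the step threshold is 0 or greater than 0.5 s.
tinker step 0

# Placeholder path; the usual location varies by platform.
driftfile /var/lib/ntp/ntp.drift

# Four servers, so that one bad clock can be outvoted.
# These names are placeholders only.
server ntp1.example.net iburst
server ntp2.example.net iburst
server ntp3.example.net iburst
server ntp4.example.net iburst
```

Since ntpd slews at no more than 500 ppm, a large offset takes a long time to correct under this setting: roughly 2000 seconds of slewing for each second of error.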
IF your customers use NTP, it's THEIR responsibility to design and
operate their synchronization subnet properly. It's YOUR responsibility
to warn your customers that horrible things may happen if time is
stepped while your application is running.
Also note that ntpd is not the only method of managing computer clocks;
there is SNTP, "Open" NTP, the "daytime" protocol, rdate (Unix/Linux),
etc, etc. Some people use ntpdate in a cron job and that WILL, by
default, step the time. NTP is probably the best if you need/want
accurate time but the other means hang on for various reasons.