[ntp:questions] ntpd and database servers

unruh unruh at wormhole.physics.ubc.ca
Wed Jan 27 00:23:41 UTC 2010

On 2010-01-22, unruh <unruh at wormhole.physics.ubc.ca> wrote:
> On 2010-01-21, Brian Utterback <brian.utterback at sun.com> wrote:
>> unruh wrote:
>>> But the timestamps are not lies. They are far closer to the real time
>>> than the time served by ntpd, and it is time that ntp serves, not slew
>>> rate. ntpd wanders off into the wilderness (I have seen it go off by
>>> 20ms, when its steady state error is 2 usec) with its slewing, so that
>>> those times are far more lies than the 20usec off, but with a slew rate
>>> of 700PPM, that you might occasionally get with chrony. Remember at the
>>> fastest ntp queries once every 16sec. By that time the slew is long
>>> finished on chrony (but barely started on ntpd). I.e., I simply do not
>>> believe what you are saying here.
>> Well, clearly a step is no different from the client's POV than a fast
>> slew that completes entirely between polls. But a sufficiently long
>> slew is going to result in downstream clients observing that their
>> clocks are much too fast or much too slow and make adjustments to
>> their own drift rates. The longer the slew goes on, the longer the
> Let's say that the clock is out by 10 min. Then it is true that ntp will
> step that 10 min on startup (or quit) and get itself to the right time.
> chrony, without the initstepslew directive, will slew away that 10 min
> over the next 100 min. Thus anyone else who queries chrony during that
> time will get a bad time, out by many minutes. On the other hand one can
> also easily tell chrony to step if it is out by that amount. The more
> critical situation is if the clock is out by say .1 sec. chrony will get
> rid of that in the next second. ntpd will over the next few hours to
> days (depending on the poll) slowly alter its internal rate to finally
> drift that time offset to zero. Along the way it might actually go the
> wrong way for a while and exceed the "step threshold", step to 0, and
> then have to correct the totally wrong slew rate left over from the
> initial attempt to slew the correction.
>> clients will be adjusting their own drifts until eventually the
>> clients' drift corrections are going to match the server's fast slew.
>> Then when the server gets to where it is going, the slew will stop and
>> the clients will continue right past that point. Then the clients will
>> have to spend the same amount of time getting their drift corrections
>> back to normal, at which point they will have overshot and have to get
>> back.
> This is certainly possible if chrony is trying to correct a huge error.
> And since under the fast slew chrony is running at about 100000PPM off
> the correct rate the ntp clocks will exceed the ntp limit of 500PPM, and
> get clamped to that rate for a long time, which will help mitigate their
> trying to follow the high chrony rate. 
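
The figures quoted above (10 min slewed away over 100 min, .1 sec gone in
about a second) both follow from a fast slew of roughly 100000PPM. A
minimal sketch of that arithmetic, assuming a constant slew rate; the
function name is made up for illustration and is not taken from chrony
or ntpd:

```python
def slew_duration_seconds(offset_s: float, slew_rate_ppm: float) -> float:
    """Time needed to slew away offset_s at a constant rate of slew_rate_ppm.

    Hypothetical helper: real daemons vary the rate, this assumes it is fixed.
    """
    if slew_rate_ppm <= 0:
        raise ValueError("slew rate must be positive")
    return abs(offset_s) / (slew_rate_ppm / 1e6)


# Offsets discussed in the thread, at the ~100000PPM fast slew rate.
ten_minutes_off = slew_duration_seconds(600.0, 100_000)
tenth_second_off = slew_duration_seconds(0.1, 100_000)

print(f"10 min offset: {ten_minutes_off / 60:.0f} min of slewing")
print(f"0.1 s offset:  {tenth_second_off:.1f} s of slewing")
```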

M Lichvar has pointed out to me that this is wrong. Chrony at each time
correction calculates both the rate and the offset from the true time
(using a corrected linear regression). It adjusts the clock for the
rate. It also rapidly slews away the offset. However, it knows what the
offset is and also how long it has been rapidly slewing. It can thus
correct the system clock reading for the remaining offset, and does so
when delivering the time to the other machine via the ntp protocol. I.e.,
in the time it delivers to the other machine it has effectively stepped
away that offset. Thus at all times it delivers its best guess as to the
true time.
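
The idea can be illustrated with a toy model. This is only a sketch of
the bookkeeping described above, not chrony's actual code: the class,
its field names and the constant slew rate are all assumptions made for
the example.

```python
import math
from dataclasses import dataclass


@dataclass
class SlewingClock:
    """Toy model of a clock that is slewing away a known offset."""
    initial_offset_s: float  # system time minus true time when the slew began
    slew_rate_ppm: float     # assumed constant fast-slew rate

    def remaining_offset(self, elapsed_s: float) -> float:
        """Offset still left in the system clock after elapsed_s of slewing."""
        removed = min(abs(self.initial_offset_s),
                      elapsed_s * self.slew_rate_ppm / 1e6)
        return math.copysign(abs(self.initial_offset_s) - removed,
                             self.initial_offset_s)

    def system_time(self, true_time_s: float, elapsed_s: float) -> float:
        """What the (still slewing) system clock reads."""
        return true_time_s + self.remaining_offset(elapsed_s)

    def served_time(self, true_time_s: float, elapsed_s: float) -> float:
        """Timestamp handed to clients: system reading minus the known remainder."""
        return (self.system_time(true_time_s, elapsed_s)
                - self.remaining_offset(elapsed_s))


clock = SlewingClock(initial_offset_s=0.05, slew_rate_ppm=100_000)
for elapsed in (0.0, 0.2, 1.0):
    print(elapsed,
          clock.system_time(1000.0, elapsed),
          clock.served_time(1000.0, elapsed))
```

In the model the system clock reading is still wrong while the slew is in
progress, but the served timestamp is not, because the remaining offset is
known and subtracted before the reply goes out.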
ntpd on the other hand has no idea what the offset is at any time, just
the rate. It uses the measured offset from its servers to adjust the
rate so that eventually the offset is driven to zero. But it has no
memory and thus delivers to the remote machines only the current time,
which could be far off the true time. All you know with ntp is that if
you wait long enough, the difference between local time and true time
will go to zero. That "wait long enough" is about an hour or more for
each halving of the offset. (Thus if you want an accuracy of say 10us,
and the current offset is 10ms, that is about 10 halvings, so it will
take more than 10 hours to get down to the desired accuracy. If the rate
of the system clock changes in the meantime, it can take much longer.)
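
The 10 hour figure is just the number of halvings needed to get from 10ms
to 10us. A small sketch of that estimate, taking the "about an hour per
halving" figure above as given; the function names are hypothetical and
the one hour value is a rough lower bound, not an ntpd constant:

```python
import math


def halvings_needed(current_offset_s: float, target_offset_s: float) -> float:
    """Number of times the offset must halve to fall from current to target."""
    if current_offset_s <= 0 or target_offset_s <= 0:
        raise ValueError("offsets must be positive")
    return max(0.0, math.log2(current_offset_s / target_offset_s))


def convergence_hours(current_offset_s: float, target_offset_s: float,
                      hours_per_halving: float = 1.0) -> float:
    """Rough convergence time, assuming a fixed time per halving."""
    return halvings_needed(current_offset_s, target_offset_s) * hours_per_halving


# 10ms down to 10us, as in the example above.
print(f"halvings: {halvings_needed(10e-3, 10e-6):.2f}")
print(f"hours at 1 h per halving: {convergence_hours(10e-3, 10e-6):.1f}")
```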

As an example I have one machine running ntpd and a gps clock. Recently
the temp in the room has been undergoing about a 1 degree C oscillation
with a period of about an hour. This causes ntpd to oscillate the system
offset from "true time" by about 10usec. (Without that temp
oscillation, the offset is about 2usec, most likely due to temp
fluctuation from the fluctuating workload of the cpu.)

>> Perhaps if everybody used chrony it would adjust so quickly that it
>> wouldn't matter. And I do agree ntpd in slewalways mode would be even
>> worse or at least more susceptible, since the likelihood of a slew
>> lasting longer than a poll interval will be greater. But I do know
>> that the drift cap is needed to make the proofs in Das Buch work out.
>> Now, it may be that the edge cases that ntpd handles better than
>> chrony never happen in real life; I don't know. But I do know that
>> within the parameters of the proofs ntpd is well behaved. I don't think we
>> have the same assurances about chrony.
> I agree that chrony has not received the same attention to the theory of
> operation that ntp has. As Hassler said, maybe someone has a grad
> student who needs a masters thesis topic. 
>> This is OT, but I do think that ntpd could be greatly improved and
>> could learn a lot from the approach chrony uses. But I don't think
>> chrony is appropriate in all cases either nor is it perfect yet. I
>> have pointed out some clear flaws in ntpd that would be relatively
>> easy to fix, but was rebuffed. I do intend to address them myself
>> someday (assuming Dr. Mills will allow the changes in ntp_proto.c) but
>> haven't had the time yet.
> If you have some pointers about clear flaws in chrony, please point them
> out. At present the next release ( which includes refclock support) is
> very close, and bug reports are needed. chrony-dev at chrony.tuxfamily.org
> What is ntpd doing about the 2038 bug by the way?
>> Brian Utterback
