[ntp:questions] NTP not syncing

unruh unruh at invalid.ca
Sat Dec 14 18:55:18 UTC 2013

On 2013-12-13, Richard B. Gilbert <rgilbert88 at comcast.net> wrote:
> On 12/7/2013 7:35 PM, Magnus Danielson wrote:
>> On 12/07/2013 11:39 PM, Harlan Stenn wrote:
>>> Magnus Danielson writes:
>>>> The drift-file-accelerated lock-in isn't robust. Current behavior of
>>>> response isn't very useful for most people experiencing it.
>>> I'm not sure I'd agree with the word "most".  It's certainly worked very
>>> well on hundreds of machines where I've run it, and the feedback I've
>>> had from people when I've told them about iburst and drift files has
>>> been positive except when they've had Linux kernels that calculate a
>>> different clock frequency on a reboot.
>> Experiencing the problem that is. When it works, it's a lovely tool.
>> Sorry if the wording was unclear in that aspect.
>>> There are at least 2 other issues here.
>>> One goes to "robust", and yes, we can do better with that.  It's not yet
>>> clear to me that in the wider perspective this effort will be worthwhile.
>> Well, you can either choose a rather simple back-out method or if you
>> think it is worthwhile a more elaborate method. Getting cyclic re-set of
>> time is a little too coarse a method. I think it is better to back-out
>> and one way or another recover phase and frequency.
>>> The other goes to the amount of time it takes to adequately determine
>>> the offset and drift.
>>> With a good driftfile and iburst, ntpd will sync to a handful of PPM in
>>> about 11 seconds' time.
>>> We've been working on a project to produce sufficiently accurate offset
>>> and drift measurements at startup time, and the main problem here is
>>> that it can take minutes to figure this out well, and there is a
>>> significant need to get the time in the right ballpark at startup in
>>> less than a minute.  These goals are mutually incompatible.  The intent
>>> is to find a way to "get there" as well as possible, as quickly as
>>> possible.
>> Getting the time in the right ball-park is by itself not all that hard.
>> However, frequency takes time to learn and getting phase errors down
>> quickly becomes an issue. NTP has as far as I have seen reduced loop
>> bandwidth and at the same time reduced the capture range, and whenever
>> you reduce the capture range you need to have heuristics to make sure
>> you back-out if things get upset. Recovery of old state is good, but one
>> needs to make sure that you don't lose that robustness.
>> As for method of locking in quickly, that can be debated at length.
>> Cheers,
>> Magnus
> It has been debated!  It will probably be debated for the next thirty
> or forty years.  There is something about the topic that seems to
> encourage debate! ;-)

Not really. It has been pointed out that it is a problem for ntpd. And
Mills has stated that it is an uninteresting problem. That is not a
debate. It is one of the weaknesses of ntpd as used by many people. It
is not a particularly interesting weakness, I agree, if ntpd is run for
many months or years at a time. It is interesting if it is run only for
hours at a time, however. There has been a concession: if there
is no drift file, ntpd now has code that tries to get a reasonable
estimate of the drift quickly instead of waiting for the feedback loop
to converge from 0 to a reasonable estimate. ntpd, however, still has
nothing that actually checks the drift file to make sure it is sane.

ntpd could run the same startup code that it does if there is no drift
file and, after it gets its initial estimate, compare the result to
the drift file rate. If it is close, use the drift file. If it is not
close (e.g. >3 PPM, say, or whatever the estimated error in the rapid
initial determination is), then use the estimate instead. The code is all
there. It is just the glue that is needed.
(This is independent of the question as to whether or not the simple
feedback circuit of ntpd is the best way of conditioning the local
clock, which has to do with how fast ntpd responds to changes in the
local clock rate caused by temperature variations for example.)
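
A rough sketch of the drift-file sanity check described above, in Python
rather than ntpd's own C. This is not ntpd code: the function name, the
argument names and the 3 PPM default are all made up for illustration, and
the quick startup estimate is assumed to come from the existing
no-drift-file startup path.

```python
def choose_initial_frequency(drift_file_ppm, estimated_ppm, tolerance_ppm=3.0):
    """Pick the frequency (in PPM) to start the clock discipline with.

    drift_file_ppm: value read from the drift file, or None if there is none.
    estimated_ppm:  quick startup estimate (hypothetical input; assumed to be
                    produced by the same code used when no drift file exists).
    tolerance_ppm:  assumed error bound of the quick estimate.

    Returns a (frequency_ppm, source) tuple.
    """
    # No drift file: nothing to check, fall back to the quick estimate.
    if drift_file_ppm is None:
        return estimated_ppm, "estimate"

    # Drift file agrees with the quick estimate: trust the drift file,
    # since it is normally the more precise of the two values.
    if abs(drift_file_ppm - estimated_ppm) <= tolerance_ppm:
        return drift_file_ppm, "driftfile"

    # Drift file disagrees: treat it as stale or wrong, use the estimate.
    return estimated_ppm, "estimate"
```

For example, a stored value of 12.5 PPM with a startup estimate of 13.0 PPM
keeps the drift file value, while a startup estimate of 40.0 PPM would
override it.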

