[ntp:questions] Re: NTP stepping issue

David L. Mills mills at udel.edu
Mon Oct 25 16:51:45 UTC 2004


We are talking right past one another. First, roundtrip delays can 
indeed exceed 128 ms and even much more, as with the Mars Internet 
simulations reported on the web. That has nothing to do with jitter. 
Jitter can exceed 128 ms on occasion, but the clock state machine 
mitigates that. In addition, the huff-n'-puff scheme is a crude but 
effective remedy for asymmetric delays in at least one common case.
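
For readers unfamiliar with the two mechanisms mentioned above, here is a rough Python sketch. It is not ntpd's code. The hypothetical `SpikeGate` class illustrates a clock state machine that slews small offsets, treats an isolated offset above the 128 ms step threshold as a spike to be ignored, and steps only when the large offset persists. The 900 s `STEPOUT` value is an assumption for the example. The hypothetical `HuffPuff` class illustrates the huff-n'-puff idea. It remembers the minimum roundtrip delay over a recent window and assumes any excess delay sits on only one leg of the path. It then moves the offset by half of that excess, against the offset's sign.

```python
from collections import deque

STEP_THRESHOLD = 0.128  # seconds: the 128 ms step threshold
STEPOUT = 900.0         # seconds: assumed persistence interval before stepping


class SpikeGate:
    """Hypothetical clock state machine: slew, ignore a spike, or step."""

    def __init__(self) -> None:
        self.spike_start = None  # true time the current large offset first appeared

    def action(self, offset: float, now: float) -> str:
        if abs(offset) <= STEP_THRESHOLD:
            self.spike_start = None      # small offset: discipline by slewing
            return "slew"
        if self.spike_start is None:
            self.spike_start = now       # first large offset: treat as a spike
            return "ignore"
        if now - self.spike_start < STEPOUT:
            return "ignore"              # still inside the stepout interval
        self.spike_start = None          # large offset persisted: step
        return "step"


class HuffPuff:
    """Hypothetical huff-n'-puff filter over the last `window` delay samples."""

    def __init__(self, window: int) -> None:
        self.delays = deque(maxlen=window)

    def correct(self, offset: float, delay: float) -> float:
        self.delays.append(delay)
        # Excess over the recent minimum delay, assumed to lie on one leg only,
        # biases the measured offset by half its value.
        excess = (delay - min(self.delays)) / 2
        # Crude sign rule: move the offset against its own sign.
        return offset - excess if offset > 0 else offset + excess
```

The sign rule is the crude part. It assumes the sign of the measured offset tells which direction is congested. That may hold when a single access link is saturated in one direction, but not in general.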

The simulator code IS the actual running code, including all the 
algorithms and the clock discipline algorithm in particular. Nothing is 
changed other than to simulate network jitter, oscillator wander and the 
kernel clock itself. There is no "interface" as such; the only system 
calls are to adjtime() and settimeofday(), which are indeed simulated.
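
The sketch below shows a toy stand-in for the kernel clock that makes the point about system calls concrete. It is an illustration, not the simulator's actual code. The `SimulatedClock` class, its `advance()` method and the 500 PPM slew cap are assumptions. Its `settimeofday()` method steps the clock. Its `adjtime()` method follows the adjtime convention: it replaces any outstanding adjustment and returns the unfinished remainder. Discipline code that only ever calls these two functions could run unchanged against either the real kernel or a stand-in like this.

```python
class SimulatedClock:
    """Toy kernel clock standing in for adjtime()/settimeofday() (hypothetical)."""

    MAX_SLEW = 500e-6  # assumed maximum slew rate: 500 PPM

    def __init__(self, freq_error: float = 0.0) -> None:
        self.true_time = 0.0          # simulation's reference time (s)
        self.offset = 0.0             # clock reading minus true time (s)
        self.freq_error = freq_error  # intrinsic oscillator error (s/s)
        self.pending = 0.0            # adjustment not yet slewed in (s)

    def settimeofday(self, t: float) -> None:
        """Step the clock so that it reads t right now."""
        self.offset = t - self.true_time

    def adjtime(self, delta: float) -> float:
        """Start slewing by delta; return the unfinished previous adjustment."""
        old, self.pending = self.pending, delta
        return old

    def gettime(self) -> float:
        return self.true_time + self.offset

    def advance(self, dt: float) -> None:
        """Advance simulated time by dt seconds in one jump."""
        self.true_time += dt
        self.offset += self.freq_error * dt  # oscillator drift
        limit = self.MAX_SLEW * dt
        step = max(-limit, min(limit, self.pending))
        self.offset += step                  # amortize the pending adjustment
        self.pending -= step
```

Because `advance()` can jump a whole poll interval at once, a simulated week costs only as many calls as there are poll events. That is what makes the speedup possible.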

You comment on the lack of verification between simulation and practice. 
Believe me, I don't write papers on this stuff without verifying 
simulation against practice. See the evidence in my 1998 SIGCOMM paper and 
IEEE/ACM Trans. Networking paper. The phase and frequency noise 
generators were ruthlessly checked against reality.
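
A common way to check noise generators like these against reality is to compare the Allan deviation of simulated offsets with that of measured ones. The sketch below illustrates this under stated assumptions and is not the simulator's code. It models network jitter as white Gaussian phase noise and oscillator wander as a random-walk frequency. It then computes the overlapping Allan deviation from phase (time offset) samples. The function names and the choice of distributions are hypothetical.

```python
import math
import random
from typing import List


def simulate_offsets(n: int, tau0: float, jitter: float, wander: float,
                     seed: int = 1) -> List[float]:
    """Hypothetical noise model: n time-offset samples spaced tau0 seconds.

    jitter: std. dev. of white phase noise (s), standing in for network jitter
    wander: std. dev. of each frequency step (s/s), giving a random walk
    """
    rng = random.Random(seed)
    freq = 0.0   # current oscillator frequency error (s/s)
    phase = 0.0  # accumulated time error (s)
    samples = []
    for _ in range(n):
        freq += rng.gauss(0.0, wander)
        phase += freq * tau0
        samples.append(phase + rng.gauss(0.0, jitter))
    return samples


def allan_deviation(x: List[float], tau0: float, m: int) -> float:
    """Overlapping Allan deviation at tau = m * tau0 from phase data x."""
    n = len(x)
    if n < 2 * m + 1:
        raise ValueError("need at least 2*m + 1 phase samples")
    tau = m * tau0
    total = sum((x[i + 2 * m] - 2 * x[i + m] + x[i]) ** 2
                for i in range(n - 2 * m))
    return math.sqrt(total / (2 * tau ** 2 * (n - 2 * m)))
```

For white phase noise the Allan deviation falls roughly as 1/tau. For random-walk frequency noise it rises roughly as the square root of tau. The crossing point marks where longer averaging stops helping, which is the same trade-off between short and long poll intervals raised in the quoted message below.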

The 500 PPM limit is due to practical considerations and was found to be 
wholly appropriate in the surveys reported on the web. Look at the 
histogram of nominal frequency offsets. It is indeed necessary to 
enforce some upper limit in order to conform to strict computer science 
theoretical considerations. You might not like 500 PPM. Fergawsh sakes, 
change it to fit your fancy. But not in the public distribution.
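
The helper functions below put numbers on the 500 PPM limit. The names are hypothetical and the content is plain arithmetic. They show both sides of the trade-off. An oscillator whose error lies within the limit can be fully corrected. An offset, however, can be amortized no faster than 500 microseconds per second, so even a 128 ms offset needs at least 256 s of slewing. The sketch assumes the whole 500 PPM is available for the adjustment being considered.

```python
MAX_FREQ = 500e-6  # 500 PPM expressed in seconds per second


def clamp_frequency(freq: float) -> float:
    """Clamp a frequency correction (s/s) to +/-500 PPM."""
    return max(-MAX_FREQ, min(MAX_FREQ, freq))


def residual_drift(osc_error: float, interval: float) -> float:
    """Time error (s) accumulated over `interval` seconds when the
    correction for an oscillator error `osc_error` (s/s) is clamped."""
    return (osc_error - clamp_frequency(osc_error)) * interval


def min_slew_time(offset: float) -> float:
    """Shortest time (s) to amortize `offset` seconds at the 500 PPM cap."""
    return abs(offset) / MAX_FREQ
```

For example, `min_slew_time(0.128)` is about 256 s and `min_slew_time(1.0)` about 2000 s. `residual_drift(600e-6, 86400)` shows that an oscillator 600 PPM off would still drift about 8.6 s per day with its correction pinned at the limit.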

The fact that the frequency and time parameters are not to your liking is 
wonderful. Change them and be happy. Try running ntpd with an update 
interval of 36 hours. Runs just fine and keeps the residual offsets in 
the 10-40 ms range. Of course, change the room temperature a few degrees 
and the poll interval plummets accordingly.
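
The residual offsets follow from a first-order relation. This assumes the frequency error \Delta f left over after the discipline stays constant over an update interval \tau, and it ignores phase noise. The time error accumulated between updates is then \Delta\theta = \Delta f\,\tau. Reading Mills' 10-40 ms figure back through this relation gives the implied residual frequency error at a 36-hour interval. The same relation reproduces the disconnected-clock case in the quoted message (2 PPM for one day):

```latex
\Delta\theta = \Delta f \,\tau, \qquad \tau = 36\ \mathrm{h} = 129\,600\ \mathrm{s}

\Delta f_{\min} = \frac{10\ \mathrm{ms}}{129\,600\ \mathrm{s}} \approx 0.077\ \mathrm{PPM},
\qquad
\Delta f_{\max} = \frac{40\ \mathrm{ms}}{129\,600\ \mathrm{s}} \approx 0.31\ \mathrm{PPM}

\Delta\theta = (2 \times 10^{-6})(86\,400\ \mathrm{s}) = 172.8\ \mathrm{ms} > 128\ \mathrm{ms}
```

A temperature change of a few degrees presumably shifts \Delta f by more than these fractions of a PPM. Since \Delta\theta grows in proportion to \tau, the discipline has to shorten the poll interval to keep the offset small.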

In passing, note the kernel modifications now in Solaris, Tru64, Linux 
and FreeBSD were verified in simulation using the actual code fragments, 
just as in NTP. I continue to be enthusiastic about testing in this way.


Philip Homburg wrote:
> In article <clep4d$mmo$1 at dewey.udel.edu>,
> David L. Mills <mills at udel.edu> wrote:
>>Philip Homburg wrote:
>>>In article <cldtmo$gde$1 at dewey.udel.edu>,
>>>David L. Mills <mills at udel.edu> wrote:
>>>>It is 
>>>>useful primarily at long poll intervals where errors are dominated by 
>>>>the intrinsic frequency stability (wander) of the clock oscillator. At 
>>>>shorter poll intervals the errors are dominated by phase errors due to 
>>>>network and operating system latencies. The trick is to combine them in 
>>>>an intelligent hybrid loop, as described on the NTP project page.
>>>The strange thing is that the 500 ppm / 128 ms limit keeps popping up. 
>>>I can understand strange limits in closed-source / broken operating systems.
>>>But somehow that limit is also present on open source operating systems.
>>I don't understand your comment. Are you saying the 500-PPM limit and/or 
>>128-ms step threshold are strange? 
> Yes. Round trip latencies on the Internet can easily exceed 128 ms. That
> means that due to asymmetry on overloaded links, offsets of more than
> 128 ms can occur.
> Furthermore, a clock that is disconnected from the net for more than a
> day and that is 2 ppm or more off will result in an error of more than
> 128 ms when the network connection is restored.
> In my opinion, NTP should handle those situations gracefully instead of
> stepping the clock.
> When large offsets are to be corrected quickly, slew rates of more than
> 500 ppm are required, so NTP and the kernel interface should be
> prepared to handle them.
>>>The problem is of course time. The main experiment I want to do is 
>>>black box testing of NTP implementations: create a reference clock with
>>>a known distortion, feed it to the ntp implementation that is to be tested 
>>>and then poll that implementation to compare the filtered time to
>>>the input signal.
>>Yes, the problem is time, your time and mine. That's exactly why the 
>>simulator was built. Testing things in vivo takes lots and lots of time, 
>>especially when testing for stability at long poll intervals. With the 
>>simulator, testing over a week takes a few seconds. You can even turn on 
>>debugging and file statistics, which is really useful in finding little 
>>warts like you will be looking for.
> As far as I understand, the simulator is built by linking the code of
> the simulator to an implementation of a clock discipline algorithm.
> I can see two serious problems with that approach:
> 1) The internal interfaces in my code are completely different from the
>    interfaces in NTP. I can either create a version of my software with
>    interfaces that match those in NTP or I can adapt the simulator to
>    my interfaces.
>    Both cases are undesirable. It takes time to create and maintain two sets
>    of interfaces.
>    It is not clear whether the simulated results have any value. If I change
>    the simulator, it would be necessary to verify that the results are
>    comparable to a simulation of NTP on an unchanged simulator.
>    If I change my code, I would have to verify that the simulated code
>    corresponds to the real code.
> 2) I would have to verify that the model behind the simulator is 
>    accurate enough. In a world with Windows doing weird things, with
>    Linux kernels losing interrupts, etc., modeling the kernel/hardware
>    is not trivial. 
>>>As far as I know, such a blackbox test setup does not exist.
>>Again, the black box test setup is the simulator. I make the case it is 
>>a rigorous test, since the black box code really and truly is the same 
>>code as runs in the daemon itself. 
> That may be the case for NTP, but it doesn't apply to other implementations
> of time synchronization that use the NTP packet format.
> I understand the value of a discrete event simulator when developing NTP.
> But to answer the question whether a particular installation is accurate
> and stable enough, it is in my opinion necessary to do 'in vivo' black-box
> testing.
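
Homburg's proposed black-box test could be organized along the lines below: a reference clock with a known distortion is fed to the implementation under test, which is then polled and compared. This is a hypothetical sketch. `distorted_reference`, `run_blackbox` and the `EwmaClock` stand-in are illustrative names, and the 50 ms, one-hour sinusoidal wander is an arbitrary choice. The stand-in is a simple exponential average, not any real discipline algorithm. A real harness would replace `feed` and `poll` with a reference-clock driver and client queries against the daemon being tested.

```python
import math
from typing import Callable, Tuple


def distorted_reference(t: float) -> float:
    """Known distortion: 50 ms sinusoidal wander, one-hour period (assumed)."""
    return t + 0.050 * math.sin(2 * math.pi * t / 3600.0)


def run_blackbox(feed: Callable[[float, float], None],
                 poll: Callable[[float], float],
                 duration: float, step: float) -> Tuple[float, float]:
    """Drive the implementation under test and score it against true time.

    feed(t, ref) hands it one reference sample taken at true time t;
    poll(t) asks it what time it thinks it is at true time t.
    Returns (rms_error, max_abs_error) in seconds.
    """
    errors = []
    t = 0.0
    while t < duration:
        feed(t, distorted_reference(t))
        errors.append(poll(t) - t)
        t += step
    rms = math.sqrt(sum(e * e for e in errors) / len(errors))
    return rms, max(abs(e) for e in errors)


class EwmaClock:
    """Hypothetical stand-in: exponentially averages the reference offset.

    It reads the harness's true time t as its own local clock, which a real
    implementation cannot do.
    """

    def __init__(self, alpha: float) -> None:
        self.alpha = alpha
        self.offset = 0.0

    def feed(self, t: float, ref: float) -> None:
        self.offset += self.alpha * ((ref - t) - self.offset)

    def poll(self, t: float) -> float:
        return t + self.offset
```

An implementation that passes the reference straight through scores an RMS error equal to the distortion itself: about 35 ms (50 ms divided by the square root of 2) over whole periods. `EwmaClock(alpha=0.001)` averages over roughly 1000 s and should score noticeably lower, so the harness does separate a filter that rejects the distortion from one that merely tracks it.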
