[ntp:questions] drift value very large and very unstable

Andy Helten andy.helten at dot21rts.com
Fri Mar 7 15:45:02 UTC 2008


Rob Neal wrote:
> On Wed, 5 Mar 2008, Andy Helten wrote:
>
>   
>> I think I have been giving it enough time to stabilize -- any test I
>> consider legitimate was allowed to run for at least 8 hours.  Most tests
>> ran overnight for 18-24 hours and some tests ran over weekends for
>> nearly 72 hours.  Results were always the same (very large drift).  In
>> fact, if allowed to run long enough, the drift almost always reached the
>> +/-500 max.
>>     
>  	The drift only tells part of the story. What is the offset
>  	doing? Does it cross zero and continue to diverge, or is
>  	it still headed to zero?
>   

In one run that I looked at closely (which was the subject of another
email I just sent), the offset increased to about 90ms and then started
to decrease.  I stopped the run before it had time to fully converge to
zero and the drift value was still increasing, so it's not the perfect
example.  A quick look at the logs from another run shows the offset
reaching 115ms.  That test ran for several hours and the drift
eventually reached 500ppm, at which point the offset bounced around
between 1 and 3ms (i.e. the offset was very unstable).  I guess you
would expect an unstable offset if you are banging up against the upper
limit of the drift.  Here are some lines from that stats.loop:

54525 62788.352 0.023098393 0.000 0.008166515 0.000000 6 <-BEGIN
54525 62791.352 0.023381710 0.004 0.007639732 0.001478 6
54525 62809.352 0.025086955 0.031 0.007171702 0.009616 6
54525 62825.352 0.026604190 0.056 0.006729925 0.012703 6
54525 62840.352 0.028026520 0.082 0.006315321 0.014822 6
54525 62856.352 0.029544500 0.110 0.005931771 0.017072 6
54525 62872.352 0.031063284 0.139 0.005574586 0.019098 6
54525 62889.352 0.032676090 0.172 0.005245631 0.021358 6
54525 62905.352 0.034194599 0.205 0.004936122 0.023067 5
54525 62922.354 0.035808330 0.350 0.004652435 0.055665 5
54525 62937.352 0.037232161 0.483 0.004380973 0.070196 5
54525 62955.354 0.038941107 0.650 0.004142326 0.088332 5
54525 62972.352 0.040555202 0.815 0.003916589 0.101018 4
54525 62990.352 0.042264909 1.460 0.003713166 0.246815 4
    <snip>
54525 63663.352 0.106036189 47.865 0.001520471 1.434006 4
54525 63678.352 0.107457194 49.402 0.001508397 1.447306 4
54525 63696.352 0.109161199 51.068 0.001534212 1.476368 4
54525 63712.352 0.110677398 52.756 0.001531972 1.504564 4
54525 63727.352 0.112099421 54.360 0.001518664 1.517296 4
54525 63743.352 0.113614091 56.094 0.001518165 1.545992 4
54525 63758.354 0.115036493 57.739 0.001506528 1.558793 4 <-MAX
54525 63773.352 0.113958623 59.369 0.001459845 1.567895 4
54525 63791.352 0.114164639 61.111 0.001367501 1.590703 4
54525 63808.352 0.113274943 62.840 0.001317288 1.608565 4
54525 63825.352 0.113386776 64.570 0.001232844 1.624260 4
54525 63843.352 0.113093372 66.296 0.001157876 1.637280 4
54525 63860.352 0.112205284 68.008 0.001127688 1.646820 4
54525 63875.352 0.111627571 69.605 0.001074448 1.640657 4
54525 63891.352 0.111143832 71.300 0.001019502 1.647666 4
54525 63906.352 0.111067353 72.889 0.000954040 1.640427 4
54525 63924.352 0.110773822 74.580 0.000898437 1.646740 4
54525 63942.352 0.110479720 76.265 0.000846819 1.651672 4
54525 63958.352 0.109498471 77.936 0.000864766 1.654077 4
    <snip>
54528 52785.354 0.002121099 500.000 0.000561826 0.014400 10
54528 52803.352 0.002828893 500.000 0.000582078 0.017494 10
54528 52821.352 0.003035901 500.000 0.000549381 0.016695 10
54528 52838.352 0.003150695 500.000 0.000515499 0.015727 9
54528 52854.354 0.001669291 500.000 0.000711928 0.014711 9
54528 52869.352 0.001417609 500.000 0.000671866 0.013761 9
54528 52886.354 0.001530270 500.000 0.000629734 0.012872 9
54528 52904.352 0.001413156 500.000 0.000590516 0.012041 10
54528 52921.354 0.002351464 500.000 0.000644339 0.018575 10
54528 52940.352 0.003152638 500.000 0.000665967 0.021479 10
54528 52956.352 0.003170197 500.000 0.000622986 0.020094 10
54528 52973.352 0.002956953 499.991 0.000587606 0.019083 9 <-END
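
For anyone who wants to check the numbers above rather than eyeball
them, here is a minimal sketch of a parser for these lines.  It assumes
the standard ntpd loopstats column order (MJD, seconds past midnight,
offset in seconds, frequency in ppm, jitter in seconds, wander in ppm,
time constant).  The names LoopSample, parse_loopstats_line and
offset_slope_ppm are made up for illustration, and the trailing
"<-BEGIN" style markers are simply ignored.

```python
from typing import NamedTuple, Optional


class LoopSample(NamedTuple):
    mjd: int            # Modified Julian Day
    seconds: float      # seconds past midnight (UTC)
    offset_s: float     # clock offset, seconds
    freq_ppm: float     # frequency (drift) correction, ppm
    jitter_s: float     # RMS jitter, seconds
    wander_ppm: float   # frequency wander, ppm
    time_const: int     # clock discipline time constant


def parse_loopstats_line(line: str) -> Optional[LoopSample]:
    """Parse one loopstats line; return None for "<snip>" or junk lines."""
    fields = line.split()
    if len(fields) < 7:
        return None
    try:
        # Anything after the seventh field (e.g. "<-MAX") is an annotation.
        return LoopSample(int(fields[0]), float(fields[1]), float(fields[2]),
                          float(fields[3]), float(fields[4]), float(fields[5]),
                          int(fields[6]))
    except ValueError:
        return None


def offset_slope_ppm(first: LoopSample, last: LoopSample) -> float:
    """Average rate of change of the offset between two samples, in ppm."""
    elapsed = (last.mjd - first.mjd) * 86400.0 + (last.seconds - first.seconds)
    return (last.offset_s - first.offset_s) / elapsed * 1e6


begin = parse_loopstats_line(
    "54525 62788.352 0.023098393 0.000 0.008166515 0.000000 6 <-BEGIN")
peak = parse_loopstats_line(
    "54525 63758.354 0.115036493 57.739 0.001506528 1.558793 4 <-MAX")
print(f"offset grew at about {offset_slope_ppm(begin, peak):.1f} ppm")
```

Between the BEGIN and MAX lines the offset grows by roughly 92ms in
about 970 seconds, i.e. on the order of 95ppm, while the reported
frequency correction has only reached about 58ppm by the MAX line.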


>> The tinker commands are also necessary (at least disabling the step) due
>> to some commercial software that has serious problems with backward time
>> steps.  This problem should be fixed in a future version, but that may
>> not be soon enough for us.  Even then, we may not want time to step
>> backwards.
>>     
>  	There is a reason they are options. Try setting your clock
>  	by hand an hour or so off, and starting ntpd. Watch the
>  	time it reports while it plays catch-up. Scary.
>  	Your call, but it would probably fail an audit.
>   

I understand your point and it is indeed scary to consider the
catastrophic failures enabled by preventing time steps.  My
counter-point is that no one is going to be setting time on these
systems and time should never jump.  If IRIG-B time is jumping around
wildly, then no other subsystem will work correctly if it relies on
accurate time in any way.  It would need to be fixed.  In fact, if
IRIG-B time jumps more than a certain amount, our subsystem stops using
it for synchronizing system time.  It is better for us to drift from
IRIG-B time, so long as the various boards in our system remain
synchronized with each other.
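
The jump rejection described above could be sketched roughly as
follows.  This is only an illustration, not our actual code: the class
name IrigJumpGuard, the max_jump_s threshold and the choice to latch
into holdover after the first bad reading are all assumptions.

```python
from typing import Optional, Tuple


class IrigJumpGuard:
    """Stop trusting the reference once it jumps by more than max_jump_s."""

    def __init__(self, max_jump_s: float) -> None:
        self.max_jump_s = max_jump_s      # hypothetical threshold, seconds
        self.last: Optional[Tuple[float, float]] = None  # (irig_s, local_s)
        self.holdover = False             # True once a jump has been seen

    def accept(self, irig_s: float, local_s: float) -> bool:
        """Return True if this IRIG reading may be used to steer the clock.

        irig_s is the reference time, local_s a local monotonic timestamp
        taken at the same instant.
        """
        if self.holdover:
            return False
        if self.last is not None:
            # Where the reference should be if it advanced at the local rate.
            expected = self.last[0] + (local_s - self.last[1])
            if abs(irig_s - expected) > self.max_jump_s:
                self.holdover = True      # free-run from here on
                return False
        self.last = (irig_s, local_s)
        return True
```

Once the guard trips, the board free-runs on its own oscillator, which
matches the stated preference for drifting from IRIG-B time over
following a jump.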

In reality, we must be able to assume IRIG-B time is stable.  With that
assumption, we must also be able to assume we can keep system time
within a few milliseconds of IRIG-B time.  We use IRIG-B time directly
(i.e. read it from the IRIG PMC's registers) on boards that require
highly accurate time synchronization, but not all boards have an IRIG-B
PMC.  We use ntp-disciplined system time for timestamps that aren't so
critical.  The NTP synchronization requirements are TBD, but will
probably be on the order of 1-50ms accuracy between the IRIG-B synced
NTP server and the various NTP clients.  I don't think this is
unreasonable; it has been achievable in all the testing I've done on
*other* hardware and software.

Andy




