[ntp:questions] NTP slow to start correction after a drift

Mike K Smith mks-usenet at dsl.pipex.com
Fri May 9 10:16:22 UTC 2008


Apologies for a long post, but I was unable to make it shorter.

I have been monitoring timekeeping performance in an environment which
contains 3 stratum 1 clocks and 4 Cisco routers running as stratum 2.
The stratum 1s use time which is derived originally from GPS, but fed
to the stratum 1 clocks via IRIG.

The monitoring is carried out from a single Solaris system which takes
time from all seven servers.

Normally all clocks show times within +/- 4ms, but every 7-8 days I
see an event where all 7 clocks drift out by about 10-18 ms over a
period of 2-3 hours before they are corrected.

I am interpreting this as being due to drift in the local clock on the
Solaris box which is doing the monitoring. If the time on the stratum 1
servers were drifting due to some common-mode problem with their time
reference, I would expect the stratum 2 servers to lag the stratum 1s.

I am concerned about the length of time it takes before NTP starts
correcting the local clock on the Solaris server.

I have a graph which you can see at
<http://www.flickr.com/photos/36096832@N00/2477948892/sizes/o/in/set-72157604959850048/>

The above graph shows offset against time for all seven clocks. An
hour of steady state operation is shown before the beginning of the
drift event. The system had been in steady state for some days prior
to the drift event.

The poll interval is initially 1024 seconds.

The drift event starts about an hour into the graph. The offset
increases by about 15ms in about 2 hours (roughly 2ppm), then a
correction is applied and the clock drifts back to zero offset at
about the 3.5 hour mark.
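
As a rough check on the 2ppm figure, here is a minimal Python sketch. The
function name is my own and the inputs are only the approximate values
read off the graph, so treat the result as an estimate.

```python
def drift_ppm(offset_change_s: float, elapsed_s: float) -> float:
    """Average frequency error, in parts per million, over an interval."""
    return offset_change_s / elapsed_s * 1e6


# Approximate figures from the graph: about 15 ms gained in about 2 hours.
rate = drift_ppm(0.015, 2 * 3600)
print(f"{rate:.2f} ppm")
```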

I am concerned that the drift went uncorrected for so long, and am
trying to understand the cause.

Is the clock-filter algorithm rejecting updated timestamps which are
not the lowest of the most recent eight? From my reading of the book
and the RFCs, this is what should happen, but that means that the
clock can drift significantly before a new timestamp passes through
the clock filter algorithm.
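
To make the question concrete, here is a simplified Python model of the
clock filter as I understand it. It is a sketch of my reading, not ntpd's
actual code: the class and field names are invented, dispersion is
ignored, a sample is passed on only if it is newer than the last one
used, and ties on delay go to the oldest sample simply because that is
how Python's min() behaves. It is fed with the Stratum 1 A values listed
in this post.

```python
from collections import deque
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    time_s: int      # arrival time, seconds on the graph's time base
    offset: float    # measured offset, seconds
    delay: float     # round-trip delay, seconds


class ClockFilter:
    """Simplified shift register: my reading of the algorithm, not ntpd's code."""

    def __init__(self, stages: int = 8) -> None:
        self.register = deque(maxlen=stages)   # oldest sample drops out when full
        self.last_used_time = -1

    def add(self, sample: Sample) -> Optional[Sample]:
        """Store a sample; return one only if a fresh minimum-delay sample emerges."""
        self.register.append(sample)
        # Assumption: ties on delay go to the oldest sample (min() keeps the first).
        best = min(self.register, key=lambda s: s.delay)
        if best.time_s <= self.last_used_time:
            return None                        # already used, so nothing new to pass on
        self.last_used_time = best.time_s
        return best


# (time, offset, delay) for Stratum 1 A, 00:00:00 to 02:59:12.
rows = [
    (0, -0.000052, 0.000600), (1024, 0.000394, 0.001850),
    (2048, -0.000174, 0.000630), (3072, 0.000908, 0.000580),
    (4096, 0.002661, 0.000630), (5120, 0.004790, 0.000750),
    (6144, 0.007350, 0.000600), (7168, 0.010072, 0.000610),
    (8192, 0.012666, 0.000600), (9216, 0.015004, 0.000610),
    (10240, 0.017115, 0.000600), (10752, 0.018362, 0.001970),
]

clock_filter = ClockFilter()
passed = []  # (arrival time of the triggering poll, sample passed on)
for row in rows:
    arrived = Sample(*row)
    chosen = clock_filter.add(arrived)
    if chosen is not None:
        passed.append((arrived.time_s, chosen))

for arrival_s, chosen in passed:
    print(f"at {arrival_s:5d}s: passed sample from {chosen.time_s:5d}s, "
          f"offset {chosen.offset:+.6f}")
```

In this model nothing gets through the filter between 00:51:12 and
02:59:12, which matches the length of the uncorrected drift. Whether the
real implementation behaves the same way is exactly what I am asking.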

To illustrate, here are the timestamp values for the three stratum 1
clocks over the period of the drift and the beginning of the
correction. The time base is the same as that of the graph.

Stratum 1 A
Time	Offset	Delay	Dispersion
00:00:00	-0.000052	0.000600	0.000200    * Lowest delay of the most recent 8 values.
00:17:04	0.000394	0.001850	0.000370
00:34:08	-0.000174	0.000630	0.000400
00:51:12	0.000908	0.000580	0.000890    * New lowest delay - drift begins about here
01:08:16	0.002661	0.000630	0.002180
01:25:20	0.004790	0.000750	0.003190
01:42:24	0.007350	0.000600	0.004120
01:59:28	0.010072	0.000610	0.004750
02:16:32	0.012666	0.000600	0.004910
02:33:36	0.015004	0.000610	0.004730
02:50:40	0.017115	0.000600	0.004390
02:59:12	0.018362	0.001970	0.003390
* The 0.000580 delay has now expired. There are three timestamps with
0.000600 delays in the shift register; which is chosen?
Whichever is chosen, the offset has drifted significantly since the
last timestamp was passed from the clock filter.
03:06:00	0.017913	0.000600	0.001630    * Correction has begun
03:10:16	0.017275	0.000610	0.001080
03:14:05	0.015433	0.000580	0.002120    * New lowest delay
03:16:13	0.013812	0.000630	0.002610


Stratum 1 B
Time	Offset	Delay	Dispersion
00:12:47	-0.000637	0.010160	0.000260    * Lowest delay in shift register is 0.009900
00:29:51	-0.000810	0.010330	0.000320
00:46:55	0.000029	0.010180	0.000690
01:03:59	0.001683	0.010240	0.002000    Drift begins about here
01:21:03	0.003762	0.010220	0.003050
01:38:07	0.006200	0.010220	0.003940
01:55:11	0.008894	0.010130	0.004610    * New lowest delay
02:12:15	0.011507	0.010030	0.004880    * New lowest delay
02:29:19	0.013935	0.010190	0.004810
02:46:23	0.016025	0.010150	0.004430
02:54:55	0.016739	0.010210	0.002870
03:03:27	0.017224	0.010160	0.001850
03:07:43	0.016871	0.010380	0.000870    * Correction has begun
03:11:59	0.016221	0.010100	0.000850    * New lowest delay
03:14:07	0.014934	0.010240	0.001620
03:16:15	0.013274	0.010150	0.002430


Stratum 1 C (Selected as Sync Server during the whole of this time)
Time	Offset	Delay	Dispersion
00:01:52	-0.000076	0.009250	0.000200    * Lowest delay in shift register is 0.009090
00:18:56	-0.000287	0.009230	0.000310
00:36:00	-0.000091	0.009160	0.000150
00:53:04	0.001073	0.009310	0.001190
* Delay of 0.009090 expires, new lowest delay is 0.009160
  Drift begins about here
01:10:08	0.002899	0.009410	0.002400
01:27:12	0.005351	0.009630	0.003630
01:44:16	0.007630	0.009220	0.004070
02:01:20	0.010348	0.009250	0.004700
02:18:24	0.012981	0.009250	0.004910
02:35:28	0.015285	0.009200	0.004700
02:52:31	0.017373	0.009250	0.004360
* Delay of 0.009160 expires, new lowest delay is 0.009200
03:01:03	0.017929	0.009230	0.002690
03:06:32	0.018002	0.009190	0.001340    * New lowest delay
03:10:48	0.017277	0.009290	0.000990    * Correction has begun
03:13:14	0.016549	0.009190	0.001100
03:15:22	0.014858	0.009280	0.002150

Why is the polling interval maintained at 1024s for so long in the
presence of the drift?
Apart from reducing the maximum polling interval, what else could I do
to hasten the response to this kind of clock drift?

The offsets from the set of clocks normally remain within +/- 4ms,
which is sufficient for our needs, but a drift out beyond 15 ms is a
cause for concern. We are hoping to be able to maintain time to within
+/- 5ms of UTC on our NTP clients.

The drift rate seen here is about 2ppm. If the drift rate were about
6ppm and we saw the same slow response to the drift, the clock could
drift out by 50ms before the correction begins. This would definitely
be regarded as poor timekeeping, and would cause alarms to be raised.
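
The 50ms figure comes from simple arithmetic, sketched below in Python
under the assumption of a constant drift rate that goes uncorrected for
the whole 2-3 hour period. The function names are illustrative only.

```python
def accumulated_offset_ms(drift_ppm: float, uncorrected_s: float) -> float:
    """Offset in milliseconds built up by a constant frequency error."""
    return drift_ppm * 1e-6 * uncorrected_s * 1e3


def seconds_to_reach(limit_ms: float, drift_ppm: float) -> float:
    """Time in seconds for a constant drift to accumulate limit_ms of offset."""
    return limit_ms * 1e-3 / (drift_ppm * 1e-6)


# Hypothetical 6 ppm drift left uncorrected for 2 and for 3 hours.
projected = {hours: accumulated_offset_ms(6.0, hours * 3600) for hours in (2, 3)}
time_to_50ms = seconds_to_reach(50.0, 6.0)

for hours, offset_ms in projected.items():
    print(f"{hours} h at 6 ppm: {offset_ms:.1f} ms")
print(f"50 ms reached after {time_to_50ms / 3600:.2f} h")
```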

I would be grateful for any comments or advice.

Regards,

Mike





