[ntp:questions] Is this normal behavior?

Richard B. Gilbert rgilbert88 at comcast.net
Mon Dec 11 05:39:55 UTC 2006


Zembower, Kevin wrote:

> I'm not very knowledgeable about NTP, but I'm suspicious of its
> implementation on one of my servers. I have one server running the
> Debian sarge package of NTP:
> cn2:/var/log# ntpq
> ntpq> version
> ntpq 4.2.0a at 1:4.2.0a+stable-2-r Fri Aug 26 10:30:19 UTC 2005 (1)
> ntpq>
> 
> I've tried to set this server up as a timeserver for my network, using
> tock.usno.mil, a time server at my institution (Johns Hopkins
> University), and the pool timeservers:
> ntpq> peers
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> +jhname.hcf.jhu. 128.4.1.1        2 u   60   64  177    3.313  835.818 545.232
> *ntp1.usno.navy. .USNO.           1 u   60   64  177    8.567  827.174 551.616
> +trane.wu-wien.a 195.13.1.153     3 u   57   64  177  125.292  841.188 548.251
> +221-15-178-69.g 140.142.16.34    2 u   50   64  177  107.300  1212.00 395.490
> +tock.jrc.us     207.168.62.76    2 u   58   64  177   16.942  1020.49 432.251
>  LOCAL(0)        LOCAL(0)        13 l   55   64  177    0.000    0.000   0.002
> ntpq>

If you are going to use pool servers, you will be much better off using 
the "US" sub-pool (us.pool.ntp.org).  The round-trip delays to Europe 
are far too large for really good timekeeping.
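
As a sketch, the relevant lines in /etc/ntp.conf would look something 
like this (the 0/1/2 hostnames are just DNS aliases that hand out 
different servers from the US zone):

     server 0.us.pool.ntp.org
     server 1.us.pool.ntp.org
     server 2.us.pool.ntp.org

Restart ntpd afterwards, via whatever init script your Debian package 
installed, and check "ntpq -p" again.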

> 
> I notice the problem here, and also if I run 'watch ntpq -p'. Seldom is
> my reachability 377, and it frequently and inexplicably drops to 1 as
> I'm watching it. Ten minutes ago reachability was 177 as shown above.
> Watching it in the last minute, it dropped from 377 to 1 on all sources.
> Now it's:
> Every 2s: ntpq -p                                   Fri Dec  8 10:35:18 2006
> 
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
>  jhname.hcf.jhu. 128.4.1.1        2 u   53   64    3    2.707  660.049 176.587
>  ntp1.usno.navy. .USNO.           1 u   52   64    3    8.904  663.193 181.873
>  trane.wu-wien.a 195.13.1.153     3 u   49   64    3  125.979  681.070 186.182
>  221-15-178-69.g 140.142.16.34    2 u   51   64    3  104.207  485.281 183.834
>  tock.jrc.us     207.168.62.76    2 u   50   64    3   16.912  668.565 186.520
>  LOCAL(0)        LOCAL(0)        13 l   49   64    3    0.000    0.000   0.002
> 
Next, you do seem to have a reachability problem.  This is NOT normal 
behavior.  Since all the configured servers, including the local clock, 
exhibit the problem, it is clearly not a problem with the servers.  I'm 
inclined to suspect a bug in the version you are running, or a problem 
with your O/S.  The fact that the local clock is affected suggests that 
the problem is internal to your system.
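
For anyone reading along: the "reach" column is an 8-bit shift register 
shown in octal.  Each poll shifts the register left and sets the low 
bit if a reply arrived, so for example:

     377 octal = 11111111 binary -> the last eight polls all answered
     177 octal = 01111111 binary -> seven answered since the register
                                    was last cleared
       3 octal = 00000011 binary -> only two polls since a clear
       1 octal = 00000001 binary -> only one poll since a clear

Reach falling from 377 to 1 on every source at once, including the 
network-less LOCAL(0), means the registers are being cleared inside 
ntpd rather than replies being lost on the wire.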

> Is this normal behavior for NTP, to frequently lose the ability to reach
> a timeserver? If not, how can I troubleshoot it further?
> 
> Here are the syslog entries pertaining to ntp for just one hour this
> morning:
> Dec  8 09:03:06 cn2 ntpd[16955]: time reset +3.120367 s
> Dec  8 09:07:21 cn2 ntpd[16955]: synchronized to LOCAL(0), stratum 13
> Dec  8 09:08:25 cn2 ntpd[16955]: synchronized to 67.128.71.75, stratum 2
> Dec  8 09:09:31 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
> Dec  8 09:23:33 cn2 ntpd[16955]: time reset +3.503628 s
> Dec  8 09:27:51 cn2 ntpd[16955]: synchronized to LOCAL(0), stratum 13
> Dec  8 09:28:56 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
> Dec  8 09:36:18 cn2 ntpd[16955]: synchronized to 128.220.2.7, stratum 2
> Dec  8 09:36:25 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
> Dec  8 09:40:40 cn2 ntpd[16955]: synchronized to 128.220.2.7, stratum 2
> Dec  8 09:41:45 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
> Dec  8 09:41:47 cn2 ntpd[16955]: synchronized to 69.178.15.221, stratum 2
> Dec  8 09:43:46 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
> Dec  8 09:49:21 cn2 ntpd[16955]: synchronized to 67.128.71.75, stratum 2
> Dec  8 09:49:21 cn2 ntpd[16955]: time reset +4.512453 s
> Dec  8 09:53:41 cn2 ntpd[16955]: synchronized to LOCAL(0), stratum 13
> Dec  8 09:54:39 cn2 ntpd[16955]: synchronized to 128.220.2.7, stratum 2
> Dec  8 09:54:44 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
> Dec  8 10:01:11 cn2 ntpd[16955]: synchronized to 67.128.71.75, stratum 2
> Dec  8 10:02:14 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
> Dec  8 10:09:40 cn2 ntpd[16955]: synchronized to 67.128.71.75, stratum 2
> Dec  8 10:10:51 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
> Dec  8 10:10:51 cn2 ntpd[16955]: time reset +3.950081 s
> Dec  8 10:15:08 cn2 ntpd[16955]: synchronized to LOCAL(0), stratum 13
> Dec  8 10:16:14 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
> 
> These time resets seem rather large to me. Is this normal, too?

The large resets are not normal either.  ntpd only steps the clock when 
the offset exceeds 128 ms, so resets of three to four seconds every 
twenty minutes mean your clock is drifting far faster than ntpd can 
discipline it.  These are usually observed when the system in question 
is losing clock interrupts.  Both Windows and Linux have been known to 
exhibit this problem.  In the case of Linux there is a kernel parameter 
called "HZ" which, if set to 250 or 1000, increases the probability 
that interrupts can be masked or disabled for two consecutive clock 
"ticks".  The fix is to set HZ to 100.
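
You can check what your kernel was built with; this is a sketch, 
assuming a 2.6-series kernel with a Debian-style config file installed 
under /boot:

     # Show the timer frequency the running kernel was compiled with
     grep CONFIG_HZ /boot/config-$(uname -r)

HZ is a compile-time option, so changing it means rebuilding the kernel 
with CONFIG_HZ=100 ("Processor type and features -> Timer frequency" in 
make menuconfig) and booting the new kernel.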
> 
> Here's the output of the peer.awk script:
> cn2:/var/log/ntpstats# gawk -f /home/kevinz/ntp-4.2.2p4/scripts/stats/peer.awk peerstats
>        ident     cnt     mean     rms      max     delay     dist     disp
> ==========================================================================
> 67.128.71.75     715 2364.263  996.393 2364.263   16.751  947.994   96.315
> 127.127.1.0      844    0.000    0.000    0.000    0.000    0.990    0.957
> 192.5.41.41      728 2321.937 1023.300 2964.896    7.907  944.371   94.581
> 137.208.3.51     675 2500.134  995.134 3013.699  124.989  503.205   44.951
> 128.220.2.7      721 2405.904  982.298 2712.599    2.564  940.869   95.476
> 69.178.15.221    722 2395.872 1002.130 2911.691  104.730  997.703   95.423

These numbers are NOT good.

The first thing to do is to fix HZ if you are running Linux.  If you are 
not running Linux, please tell us what you ARE running.

If fixing HZ does not improve performance, repost with more details as 
to hardware, O/S, network, etc.
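
Output along these lines would help; just a suggestion of what to 
collect:

     uname -a        # kernel version and architecture
     ntpq -c rv      # ntpd's system variables: stratum, offset, jitter
     ntpq -p         # the peer table again, after the HZ change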



