[ntp:questions] Linux NTP Kernel unsync flag remains long after NTP&Kernel have PLL sync

Darryl Miles darryl-mailinglists at netbauds.net
Tue Aug 26 22:24:37 UTC 2008


Thanks for your replies.


Unruh wrote:
 > A far better idea is to monitor the offset from the ntp servers to let you
 > know if there is a clock problem.

I'd appreciate a tool for that: something like
"/usr/sbin/ntpdc -check 0.0000:0000:0000 -print" that takes various
parameters for your acceptable accuracy and returns a zero/non-zero exit
status.  It might also dump data the way "adjtimex -print" does and flag
items of concern to the administrator.

The params 0.0000:0000:0000 would be some description of acceptable
accuracy in terms of offset/error/whatever makes sense to people who
grok ntp.  I wouldn't know what to put!
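
To make the idea concrete, here is a minimal Python sketch of such a check.
It is only an illustration under stated assumptions: it does not add any
option to ntpdc, it works on text in the layout that the ntpdc "kerninfo"
query prints, and the names parse_kerninfo and check_bounds as well as the
threshold values are hypothetical.

```python
# Sample text in the layout printed by the ntpdc "kerninfo" query.
SAMPLE = """\
pll offset:           4.7e-05 s
pll frequency:        -62.146 ppm
maximum error:        16.384 s
estimated error:      16.384 s
status:               0041  pll unsync
"""


def parse_kerninfo(text):
    """Parse "name: value unit" lines into a dict.

    Numeric fields become floats; the status line is kept as a list of
    the flag words that follow the hex value.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        parts = value.split()
        if not sep or not parts:
            continue  # not a "name: value" line
        key = key.strip()
        if key == "status":
            info["status_words"] = parts[1:]
        else:
            try:
                info[key] = float(parts[0])
            except ValueError:
                pass  # ignore fields that are not numeric
    return info


def check_bounds(info, max_offset, max_error, require_sync=True):
    """Return a list of problems; an empty list means all bounds hold.

    max_offset and max_error are in seconds and are chosen by the
    administrator (hypothetical parameters, not ntpdc options).
    A missing field is treated as a violation.
    """
    problems = []
    inf = float("inf")
    if abs(info.get("pll offset", inf)) > max_offset:
        problems.append("pll offset exceeds %g s" % max_offset)
    if info.get("maximum error", inf) > max_error:
        problems.append("maximum error exceeds %g s" % max_error)
    if require_sync and "unsync" in info.get("status_words", ["unsync"]):
        problems.append("kernel reports unsync")
    return problems
```

A wrapper script could feed it the output of "ntpdc -c kerninfo" and exit
with status 1 when check_bounds() returns a non-empty list, 0 otherwise.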


 > Leave it unsynced. It serves no useful purpose AFAIK. hwclock is a much better
 > idea to use to set the rtc, and does a much better job of it ( including
 > determining the drift of the rtc and compensating for it. )

As a personal observation, I'm not sure I agree that hwclock and its drift
file are even good for managing the hardware RTC.  While the machine is
switched on we have NTP.  While the machine is switched off, the
internal/component temperature is vastly different, so any drift estimate
maintained over time whilst powered up might not be in the right
ballpark.  You do maintain different drift data whilst powered up and
powered down, don't you?



David Woolley wrote:
 > Being unsynced indicates a problem.  The end stop estimated errors also
 > indicate a problem.  If you don't want the 11 minute mode, build a
 > kernel without it.

Ah ha, now I see.  Yes, the maximum error / estimated error of my
systems does appear to be at a 16-bit unsigned integer end stop:

 >>> ntpdc> kerninfo
 >>> pll offset:           4.7e-05 s
 >>> pll frequency:        -62.146 ppm
 >>> maximum error:        16.384 s
 >>> estimated error:      16.384 s
 >>> status:               0041  pll unsync
 >>> pll time constant:    2
 >>> precision:            1e-06 s
 >>> frequency tolerance:  512 ppm
 >>> ntpdc>
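
For reference, the hex status word in that output can be decoded bit by
bit.  The sketch below is illustrative only: the bit values are the STA_*
flags from the kernel's timex.h, and the function name decode_status is
hypothetical.

```python
# STA_* status bits as defined in the kernel's timex.h.
STA_FLAGS = {
    0x0001: "STA_PLL",
    0x0002: "STA_PPSFREQ",
    0x0004: "STA_PPSTIME",
    0x0008: "STA_FLL",
    0x0010: "STA_INS",
    0x0020: "STA_DEL",
    0x0040: "STA_UNSYNC",
    0x0080: "STA_FREQHOLD",
    0x0100: "STA_PPSSIGNAL",
    0x0200: "STA_PPSJITTER",
    0x0400: "STA_PPSWANDER",
    0x0800: "STA_PPSERROR",
    0x1000: "STA_CLOCKERR",
}


def decode_status(hex_word):
    """Return the names of the flags set in a hex status word."""
    value = int(hex_word, 16)
    return [name for bit, name in sorted(STA_FLAGS.items()) if value & bit]


print(decode_status("0041"))  # ['STA_PLL', 'STA_UNSYNC']
```

So 0041 is the PLL-enabled bit plus the unsynchronized bit, which matches
the "pll unsync" words ntpdc prints next to it.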


The above data is for a running system that (as far as I can tell) has
good reachability to a diverse set of servers and is in step.

ntpdc> peers
      remote           local      st poll reach  delay   offset    disp
=======================================================================
=80.85.129.25    xxx.yy.0.137     3 1024   77 0.01630  0.000452 0.28458
*82.133.58.132   xxx.yy.0.137     2 1024  377 0.02780 -0.001716 0.13663
=127.127.1.0     127.0.0.1       10   64  377 0.00000  0.000000 0.03059
-xxx.yy.0.191    xxx.yy.0.137     2 1024  376 0.00435  0.001452 0.16240
=86.59.99.138    xxx.yy.0.137     3 1024  377 0.03571  0.001900 0.12178
=83.170.75.28    xxx.yy.0.137     3 1024  377 0.00484 -0.000715 0.13660
+xxx.yy.0.240    xxx.yy.0.137     3 1024  357 0.00031  0.006866 0.14854
^xxx.yy.0.255    xxx.yy.0.137    16   64    0 0.00000  0.000000 4.00000 (BROADCAST addr)
+zz.xxx.83.153   xxx.yy.0.137    16 1024    0 0.00000  0.000000 3.99217
=84.54.128.8     xxx.yy.0.137     2 1024  377 0.06641  0.002147 0.12181
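
As a reading aid for the "reach" column: it is an 8-bit shift register
printed in octal, one bit per recent poll, so 377 means the last eight
polls were all answered.  A small sketch of decoding it (the function name
polls_answered is hypothetical):

```python
def polls_answered(reach_octal):
    """Count how many of the last eight polls got a reply.

    reach_octal is the value from the "reach" column, e.g. "377".
    """
    register = int(reach_octal, 8) & 0xFF  # 8-bit reachability register
    return bin(register).count("1")


for reach in ("377", "357", "77", "0"):
    print(reach, polls_answered(reach))
```

For the table above this gives 8 for 377, 7 for 357 and 376, 6 for 77, and
0 for the two peers showing a reach of 0.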

ntpdc> sysinfo
system peer:          unused.foobar.com
system peer mode:     sym_passive
leap indicator:       00
stratum:              3
precision:            -20
root distance:        0.05646 s
root dispersion:      0.05922 s
reference ID:         [xx.yy.0.191]
reference time:       cc5effc0.8730a3ef  Tue, Aug 26 2008 23:18:40.528
system flags:         auth monitor ntp kernel stats
jitter:               0.000320 s
stability:            0.000 ppm
broadcastdelay:       0.003998 s
authdelay:            0.000003 s

# uname -a
Linux host1.foobar.com 2.6.18-53.1.21.el5xen #1 SMP Tue May 20 10:03:27 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux


So why might the kernel maximum/estimated error be stuck at the end stop?





David L. Mills wrote:
> When the client is first started until setting the clock, this statistic 
> will be large (~16 s), as it is in your example. Once the clock is set 
> and after that this statistic is set to the synchronization distance 
> determined by the daemon.

Right, so that is what is meant to happen, but it is not taking place for me.


> If the daemon crashes or loses all sources, the kernel will increase the 
> distance as required by the specification. Application programs can 
> establish their own bound (~1 s) above which they consider the clock 
> unsynchronized. The problem with managing the bit is that the kernel 
> doesn't know your particular bound.

Which agrees with my suggestion for some params to "ntpdc" to allow
configuration of my bounds with an accuracy check.
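
A rough worked example of what such a bound means in practice, assuming
(as I understand the kernel model) that once updates from the daemon stop,
the maximum error grows by the frequency tolerance each second.  The 512
ppm default is the "frequency tolerance" value from my kerninfo output;
the 0.1 s starting error is made up, and the function name is
hypothetical.

```python
def seconds_until_bound(current_error_s, bound_s, tolerance_ppm=512.0):
    """Estimate how long until maximum error reaches an application bound.

    Assumes the error grows linearly at tolerance_ppm microseconds per
    second once the daemon stops updating the kernel.
    """
    if current_error_s >= bound_s:
        return 0.0  # already at or past the bound
    growth_per_second = tolerance_ppm * 1e-6
    return (bound_s - current_error_s) / growth_per_second


# Starting from a made-up 0.1 s error with a 1 s application bound:
print(seconds_until_bound(0.1, 1.0) / 60.0, "minutes")
```

Under those assumptions a 1 s bound would be crossed after roughly half an
hour without updates, while a system already sitting at 16.384 s, like
mine, fails any such bound immediately.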


Darryl


