[ntp:questions] Re: NTP clients not syncing up to servers?

Tom Smith smith at cag.zko.hp.com
Sun Oct 9 04:11:45 UTC 2005


Your clients' times are too far off from their servers' times to
synchronize. Before you start ntpd, you should first set the time
using (usually) "ntpdate -b [server] [server] ...". Start
with your lowest-stratum servers (presumably the gateways),
set the time, start ntpd, and then wait for them to be
synchronized ("ntpq -p" will show an asterisk to the
left of one of its servers). As soon as a server is synchronized,
it can be used by clients as a server to set _their_ time
and to supply time to _their_ ntpd.
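
As a rough illustration of the "wait for the asterisk" step, the
sketch here scans the text printed by "ntpq -p" for a peer line
whose tally code is an asterisk. The function name and the sample
output are made up for illustration; only the asterisk convention
comes from the paragraph above.

```python
def synchronized_peer(ntpq_output):
    """Return the remote name marked with '*' in "ntpq -p" output, or None."""
    for line in ntpq_output.splitlines():
        # The tally code is the first character of each peer line;
        # '*' marks the server ntpd has selected for synchronization.
        if line.startswith("*"):
            fields = line[1:].split()
            if fields:
                return fields[0]
    return None


# Hypothetical sample output, not captured from a real system.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*gateway1        10.16.4.1        3 u   40   64  377    0.136    0.412   0.355
 gateway2        10.16.4.1        4 u   35   64  377    0.123    1.891   0.606
"""

print(synchronized_peer(SAMPLE))
```

A result of None means no server has been selected yet, so
clients of that machine should not be started against it.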

A server's offset must be less than or equal to 128 msec
before it will be selected as a potential synchronization
source. Normally, ntpd will, upon finding all offsets greater
than 128 msec, attempt to step the clock to be within 128 msec
of something. However, if anyone has added the "-x" option to
the ntpd command line, that won't happen. Instead, the time
on that client will be ssssllllooowwwlllyyy adjusted to try
to close the gap. You and your customers will not be willing
to wait that long, if that time ever comes at all.
Remove "-x" if it is present.
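
To put a number on "slowly", here is a back-of-the-envelope
sketch. It uses the 128 msec threshold described above and
assumes a maximum slew rate of 500 parts per million (0.5 msec
of correction per second). That rate is the usual limit for ntpd
but is not stated in this message, so treat it as an assumption.
The function names are invented for illustration.

```python
STEP_THRESHOLD_S = 0.128   # 128 msec, from the discussion above
MAX_SLEW_RATE = 500e-6     # assumed: 500 ppm, i.e. 0.5 msec per second


def correction(offset_s, slew_only=False):
    """Say whether an offset would be slewed or stepped.

    slew_only=True models ntpd started with "-x" as described above.
    """
    if abs(offset_s) <= STEP_THRESHOLD_S or slew_only:
        return "slew"
    return "step"


def slew_days(offset_s):
    """Lower bound, in days, on the time needed to slew away an offset."""
    return abs(offset_s) / MAX_SLEW_RATE / 86400.0


# Offset reported by storage-node2 for gateway 1: -207752 msec.
offset = -207.752
print(correction(offset))                   # step
print(correction(offset, slew_only=True))   # slew
print(round(slew_days(offset), 1))          # 4.8
```

Under that assumption, the storage node in the original report
would need the better part of five days to close a gap of about
208 seconds by slewing alone.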

If ntpd finds an offset of more than 1000 seconds, it will
terminate itself unless "-g" is present on the ntpd command
line. In that case, it will make one such adjustment and will
terminate itself if a second such adjustment is required.
Make sure you include "-g" (as you apparently have), at least
until all of the systems have logged many days of uninterrupted
NTP operation and you have ensured that ntpdate is always
run at boot time before ntpd is started.
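
The 1000-second rule can be sketched the same way. This is only
a model of the behavior described above, not ntpd's actual code;
the function name and the "big_jump_already_made" flag are
invented for illustration.

```python
PANIC_THRESHOLD_S = 1000.0  # from the paragraph above


def survives_offset(offset_s, started_with_g, big_jump_already_made=False):
    """Return True if ntpd would keep running when it sees this offset."""
    if abs(offset_s) <= PANIC_THRESHOLD_S:
        return True          # within the limit: no reason to exit
    if not started_with_g:
        return False         # too far off and no "-g": ntpd terminates
    # "-g" permits exactly one oversized adjustment.
    return not big_jump_already_made


print(survives_offset(3600.0, started_with_g=False))   # False
print(survives_offset(3600.0, started_with_g=True))    # True
print(survives_offset(3600.0, started_with_g=True,
                      big_jump_already_made=True))     # False
```

The offsets in the original report (about 1.5 and 208 seconds)
are under this limit, so "-g" is a safety net there rather than
the fix.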

Also make sure that each ntpd instance has at least 4
reliable, consistent, and _working_ lower-stratum servers
configured before you even start ntpd for the first time.
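
A quick sanity check for the "at least 4 servers" advice might
look like the following. It only counts "server" lines in the
text of an ntp.conf; it cannot tell whether those servers are
reliable or even reachable. The host names in the example are
placeholders, not real servers.

```python
MIN_SERVERS = 4  # from the advice above


def configured_servers(conf_text):
    """Return the addresses named on "server" lines of an ntp.conf."""
    servers = []
    for line in conf_text.splitlines():
        # Drop comments and surrounding whitespace before parsing.
        words = line.split("#", 1)[0].split()
        if len(words) >= 2 and words[0] == "server":
            servers.append(words[1])
    return servers


# Placeholder configuration; the server names are hypothetical.
CONF = """\
driftfile /var/lib/ntp/ntp.drift
server ntp1.example.com
server ntp2.example.com
# server ntp3.example.com
"""

found = configured_servers(CONF)
if len(found) < MIN_SERVERS:
    print("only %d server(s) configured: %s" % (len(found), ", ".join(found)))
```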

In any case, you should make no judgments about whether
ntpd is working properly or not until it has been running
for several hours, sometimes 2 days or more on a previously
uninitialized system.

-Tom

Ted Beatie wrote:
> I realize that subject is a bit vague.  Unfortunately, the more I dive
> into the documentation for NTP, the more I'm convinced that it's magic.
> This message will be long, but I'm not sure what information is or is
> not relevant.
> 
> What we're trying to do;
> 
>   We deploy systems at customer facilities that have both "gateway" and
>   "storage-node" machines; the gateways connect to the rest of customer
>   site and the storage-nodes, and the storage-nodes connect only to one
>   another and the gateways.  We'd like the gateways to sync to either a
>   customer-supplied NTP server or external NTP servers, and the other
>   gateways, and the storage-nodes to sync to the gateways (and trust
>   them completely).
> 
> The setup;
> 
>   The gateways have 2 or 3 interfaces, one of which goes to the internal
>   LAN, and the other one or two go to private back-end switches.  The
>   ntp.conf on the gateways looks like this;
> 
>     driftfile /var/lib/ntp/ntp.drift
>     statsdir /var/log/ntpstats/
>     statistics loopstats peerstats clockstats
>     filegen loopstats file loopstats type day enable
>     filegen peerstats file peerstats type day enable
>     filegen clockstats file clockstats type day enable
> 
>     server <one or more servers, external or internal>
>     server <one or more other gateways, using the back-end addresses>
> 
>   The storage-nodes have 2 interfaces, each of which goes to back-end
>   switches.  The ntp.conf on the storage-nodes looks like this;
> 
>     driftfile /var/lib/ntp/ntp.drift
>     statsdir /var/log/ntpstats/
>     statistics loopstats peerstats clockstats
>     filegen loopstats file loopstats type day enable
>     filegen peerstats file peerstats type day enable
>     filegen clockstats file clockstats type day enable
> 
>     server <two or more gateways, using the back-end addresses>
> 
>   furthermore, /etc/init.d/ntp has been modified on the storage-nodes to
>   include the -g flag.
> 
> The problem;
> 
>   It doesn't seem to work reliably;
> 
>     gateway1:~# date;for i in 2 51 52 53 54; do ssh -1 10.123.123.$i date;done
>     Fri Oct  7 11:10:36 EDT 2005
>     Fri Oct  7 11:10:35 EDT 2005
>     Fri Oct  7 11:18:27 EDT 2005
>     Fri Oct  7 11:14:05 EDT 2005
>     Fri Oct  7 11:08:26 EDT 2005
>     Fri Oct  7 11:22:15 EDT 2005
> 
>   I'm unsure what's going on, or how to diagnose.  It looks like
>   everything is communicating properly;
> 
>     On the gateway (time 11:10 above);
> 
>  gateway1:~# ntpq -c pe localhost
>       remote           refid      st t when poll reach   delay   offset  jitter
>  ==============================================================================
>  <internal NTP server> <>          2 u   38   64  377    0.278  -1565.0   4.599
>  <other gateway>       <>          4 u  928 1024  377    0.114  -1135.6   2.005
>  <other gateway>       <>         16 u  805 1024    0    0.000    0.000 4000.00
> 
>     On one of the storage-nodes (time 11:14 above);
> 
>  storage-node2:~# ntpq -c pe localhost
>       remote           refid      st t when poll reach   delay   offset  jitter
>  ==============================================================================
>  <gateway 1>           <>          3 u   40   64  377    0.136  -207752   3.355
>  <gateway 2>           <>          4 u   35   64  377    0.123  -208891   4.606
> 
>   Looking at the debugging techniques, and seeing that the tally code is
>   a space, and delving deeper, I see;
> 
>     gateway1:~# ntpq -c as localhost
>     ind assID status  conf reach auth condition  last_event cnt
>     ===========================================================
>     1 47900  9014   yes   yes  none    reject   reachable  1
>     2 47901  9014   yes   yes  none    reject   reachable  1
>     3 47902  8000   yes   yes  none    reject
> 
>     storage-node2:~# ntpq -c as localhost
>     ind assID status  conf reach auth condition  last_event cnt
>     ===========================================================
>     1 16076  9064   yes   yes  none    reject   reachable  6
>     2 16077  9064   yes   yes  none    reject   reachable  6
> 
>   So obviously the responses are getting rejected, but I'm not clear
>   why.  Looking at what should theoretically be the upstream internal
>   NTP server from the gateway;
> 
> gateway1:~# ntpq -c 'rv 47900' localhost
> status=9014 reach, conf, 1 event, event_reach,
> srcadr=sfprinters.us.babcockbrown.com, srcport=123, dstadr=10.16.4.150,
> dstport=123, leap=00, stratum=2, precision=-7, rootdelay=0.000,
> rootdispersion=9733.109, refid=bigbird.babcockbrown.com, reach=377,
> unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, flash=00 ok, keyid=0,
> offset=-1562.833, delay=0.230, dispersion=11.665, jitter=7.127,
> reftime=c6f0cea2.4526e978  Fri, Oct  7 2005  6:38:26.270,
> org=c6f11314.3e76c8b4  Fri, Oct  7 2005 11:30:28.244,
> rec=c6f11315.d120d130  Fri, Oct  7 2005 11:30:29.816,
> xmt=c6f11315.d10cc35c  Fri, Oct  7 2005 11:30:29.816,
> filtdelay=     0.31    0.31    0.36    0.30    0.23    0.39    0.27    0.37,
> filtoffset= -1572.7 -1562.2 -1567.0 -1572.8 -1562.8 -1567.7 -1573.4 -1563.1,
> filtdisp=      7.83    8.82    9.78   10.74   11.68   12.64   13.60   14.58
> 
>   And looking at one of the gateways from the storage-node;
> 
> storage-node2:~# ntpq -c 'rv 16076' localhost
> status=9064 reach, conf, 6 events, event_reach,
> srcadr=10.123.123.1, srcport=123, dstadr=10.123.123.52, dstport=123,
> leap=00, stratum=3, precision=-16, rootdelay=0.244,
> rootdispersion=9734.558, refid=10.16.4.1, reach=377, unreach=0, hmode=3,
> pmode=4, hpoll=6, ppoll=6, flash=00 ok, keyid=0, offset=-207768.629,
> delay=0.118, dispersion=3.779, jitter=3.407,
> reftime=c6e82e68.10c63f14  Fri, Sep 30 2005 17:36:40.065,
> org=c6f11375.feae6c8f  Fri, Oct  7 2005 11:32:05.994,
> rec=c6f11445.c40d2c38  Fri, Oct  7 2005 11:35:33.765,
> xmt=c6f11445.c3fec13b  Fri, Oct  7 2005 11:35:33.765,
> filtdelay=     0.19    0.14    0.12    0.14    0.14    0.15    0.14    0.14,
> filtoffset= -207770 -207769 -207768 -207767 -207766 -207765 -207763 -207762,
> filtdisp=      0.03    1.01    1.97    2.91    3.90    4.88    5.85    6.83
> 
>   I've also seen a flash value of 80.  Now it appears to be 00.
> 
> I'm at a loss here.  What we want is for the gateways to get their times
> from either upstream external NTP sources, or internal sources, or to
> just accept their own time, and the storage-nodes should get their times from
> the gateways and believe them no matter what the skew.
> 
> How can I go about figuring out where to go from here?  Thanks in
> advance for any help..
> 
> 
> --
> Ted Beatie                         Permabit, Inc.             ted at permabit.com
> Sr. Systems Engineer       One Kendall Sq, Cambridge, MA       +1-617-995-9317



