[ntp:questions] NTP clients not syncing up to servers?

Ted Beatie ted at permabit.com
Tue Oct 11 21:07:27 UTC 2005


> >    server <one or more servers, external or internal>
> >    server <one or more other gateways, using the back-end addresses>
> >
> 
> Add iburst to the end of each server line. This speeds up synchronization.

To all of the server lines, or just the internal-to-our-system servers?

> >    server <two or more gateways, using the back-end addresses>
> >
> Three servers are an absolute minimum because 2 means it has no way of 
> knowing which is providing better information. Let's leave aside the 
> question of the meaning of the word "better", it's a very complicated 
> subject.

As I mentioned to Tom, what if we can't guarantee that?  As near as I
can tell, more servers are better, but the only hard requirement is
one.  In some cases we're lucky to get even one, so we either need to
believe that one or set the time manually.
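
To make the one-versus-two-versus-three argument concrete, here is a
deliberately simplified sketch.  It is not ntpd's actual selection
algorithm; it only looks for a strict majority of sources whose offsets
agree within a tolerance.  The function name and the 50 ms tolerance
are assumptions made up for illustration; the offsets are in the same
range as the ones in the ntpq and ntptrace output further down.

```python
def agreeing_majority(offsets_ms, tolerance_ms):
    """Return the largest group of mutually agreeing offsets if that
    group is a strict majority of all sources, otherwise None.

    Simplified illustration only; ntpd's real algorithm works on
    correctness intervals, not on a fixed tolerance.
    """
    ordered = sorted(offsets_ms)
    best = []
    start = 0
    for end in range(len(ordered)):
        # Shrink the window until its spread fits within the tolerance.
        while ordered[end] - ordered[start] > tolerance_ms:
            start += 1
        if end - start + 1 > len(best):
            best = ordered[start:end + 1]
    if 2 * len(best) > len(ordered):
        return best
    return None


# One source: nothing to compare against, so it is simply believed.
print(agreeing_majority([-2558.0], 50))
# Two sources that disagree: no majority, no way to pick one.
print(agreeing_majority([-2558.0, -1849.0], 50))
# Three sources, two of which agree: the odd one out is outvoted.
print(agreeing_majority([-2558.0, -1849.0, -2571.0], 50))
```

Under these assumptions a single source always "wins", which is the
case I care about, while two disagreeing sources leave nothing to
choose from.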

> Based on the above the internal NTP server has a stratum of 2 and will 
> almost always be used over a stratum of 4. Is that internal NTP server 
> getting its data from a stratum 1 server and is it internal or external?

It is internal, and looks like it gets its time from other internal machines:

portal-01:~# ntptrace -n
127.0.0.1: stratum 3, offset 0.000006, synch distance 15.20248
10.16.4.1: stratum 2, offset -2.558634, synch distance 1.00000
10.16.4.100: stratum 2, offset -2.571121, synch distance 1.00000
10.16.100.2: stratum 2, offset -2.520537, synch distance 0.04373
132.163.4.101:  *Timeout*

> By obfuscating the addresses it's hard to know if you've also removed 
> the Tally Codes which indicates what gateway1 thinks of the servers. 
> Since you are using the private address space for this it really doesn't 
> matter if they're seen. If you don't want to show the names, just add a 
> -n and it won't translate the IP addresses.

As I mentioned in the post, the tally codes were spaces.

portal-01:~# ntpq -nc pe localhost
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 10.16.4.1       10.16.4.100      2 u   40   64  377    0.280  -2558.0   4.447
 10.123.123.2    10.123.123.1     4 u  810 1024  377    0.172  -1849.0   2.014
 10.123.123.3    0.0.0.0         16 u  679 1024    0    0.000    0.000 4000.00
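
For anyone reading the billboard above mechanically: the tally code is
the first character of each row, and the reach column is an 8-bit shift
register printed in octal, so 377 means the last eight polls were all
answered and 0 means none were.  A minimal sketch of pulling those out
of a row follows; the function name and the returned keys are my own
invention, and it assumes the column layout shown above.

```python
def parse_peer_line(line):
    """Split one data row of `ntpq -p` output into a few useful fields.

    Assumes the layout shown above: tally code in column 0, then
    remote, refid, st, t, when, poll, reach, delay, offset, jitter.
    """
    tally = line[0]                  # ' ' here; '*', '+', '-' etc. otherwise
    fields = line[1:].split()
    remote, refid, st, t, when, poll, reach = fields[:7]
    reach_bits = int(reach, 8)       # reach is displayed in octal
    return {
        "tally": tally,
        "remote": remote,
        "stratum": int(st),
        "poll": int(poll),
        # Most recent poll is the rightmost bit.
        "reach_history": format(reach_bits, "08b"),
        "replies_in_last_8": bin(reach_bits).count("1"),
    }


row = " 10.16.4.1       10.16.4.100      2 u   40   64  377    0.280  -2558.0   4.447"
print(parse_peer_line(row))
```

So the first two peers are answering every poll and still carry a
blank tally code, while the third has never answered at all.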

> This only has two servers and you need at least 3. As it is gateway1 and 
> gateway2 are at two different stratum levels. However you need to fix 
> the problem first on the gateways.

Despite the spec, that seems to be a consistent interpretation.  If
everything internal is fully meshed, and there is only one external time
source, will everything sync up to that external source, no matter the skew?

> >  Looking at the debugging techniques, and seeing that the tally code is
> >  a space, and delving deeper, I see;
> >
> >    gateway1:~# ntpq -c as localhost
> >    ind assID status  conf reach auth condition  last_event cnt
> >    ===========================================================
> >    1 47900  9014   yes   yes  none    reject   reachable  1
> >    2 47901  9014   yes   yes  none    reject   reachable  1
> >    3 47902  8000   yes   yes  none    reject
> >
> >    storage-node2:~# ntpq -c as localhost
> >    ind assID status  conf reach auth condition  last_event cnt
> >    ===========================================================
> >    1 16076  9064   yes   yes  none    reject   reachable  6
> >    2 16077  9064   yes   yes  none    reject   reachable  6
> >
> 
> Usually you will see these kinds of results when the server you are 
> looking at has just started. You really need to give it time to synchronize.

Not in this case:

portal-01:~# ps aux|grep ntp;for i in 2 51 52 53 54; do ssh -1 10.123.123.$i ps aux; done | grep ntp
root   11283  0.0  0.1  2328 2320 ?        SL   Sep30   0:05 /usr/sbin/ntpd
root   17856  0.0  0.1  2328 2320 ?        SL   Sep30   0:04 /usr/sbin/ntpd
root     382  0.0  0.1  2328 2320 ?        SL   Jun13   0:04 /usr/sbin/ntpd -g
root     382  0.0  0.1  2328 2320 ?        SL   Jun13   0:04 /usr/sbin/ntpd -g
root     383  0.0  0.1  2328 2320 ?        SL   Jun13   0:04 /usr/sbin/ntpd -g
root     389  0.0  0.1  2328 2320 ?        SL   Jun13   0:05 /usr/sbin/ntpd -g

(The Sep30 processes are on the two gateways; the Jun13 processes are on
the servers.  I had recently stopped ntpd by hand, resynced the clocks,
and restarted ntpd on the gateways.)
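
As a cross-check on the `ntpq -c as` output quoted earlier, the hex
status word can be decoded by hand.  As I understand the ntpq
documentation, the peer status word is 16 bits: five flag bits, a
3-bit selection code, a 4-bit event count and a 4-bit event code.  The
sketch below assumes that layout; the function name and dictionary
keys are made up, and the selection names differ slightly between ntpq
versions.

```python
# Selection codes 0..7; exact spellings vary between ntpq versions.
SELECT = ["reject", "falsetick", "excess", "outlier",
          "candidate", "backup", "sys.peer", "pps.peer"]


def decode_peer_status(word_hex):
    """Decode a peer status word as printed by `ntpq -c as`.

    Assumed layout (high to low): 5 flag bits, 3-bit select,
    4-bit event count, 4-bit event code.
    """
    word = int(word_hex, 16)
    flags = word >> 11               # top five bits
    return {
        "conf": bool(flags & 0x10),          # configured association
        "auth_enabled": bool(flags & 0x08),
        "auth_ok": bool(flags & 0x04),
        "reach": bool(flags & 0x02),         # peer is reachable
        "select": SELECT[(word >> 8) & 0x7],
        "event_count": (word >> 4) & 0xF,
        "event_code": word & 0xF,            # 4 is shown as "reachable" above
    }


for status in ("9014", "9064", "8000"):
    print(status, decode_peer_status(status))
```

Decoded this way, 9014 and 9064 are configured, reachable and still
rejected, with one and six events respectively, and 8000 is configured
but has never been reached, which matches the columns ntpq printed.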

> This appears to indicate it received just one packet which is not enough 
> to synchronize anything. How long did you wait for the server after it 
> was started to interrogate this server? You need to wait at least 15-20 
> minutes when you don't use iburst.

How long would it take with iburst set?  How can we deal with the fact
that the gateways and servers all generally come up at the same time?

	    --ted

--
Ted Beatie                         Permabit, Inc.             ted at permabit.com
Sr. Systems Engineer       One Kendall Sq, Cambridge, MA       +1-617-995-9317



