[ntp:questions] NTP clients not syncing up to servers?

Danny Mayer mayer at ntp.isc.org
Tue Oct 11 21:24:41 UTC 2005


Ted Beatie wrote:
>>>   server <one or more servers, external or internal>
>>>   server <one or more other gateways, using the back-end addresses>
>>>
>>Add iburst to the end of each server line. This speeds up synchronization.
> 
> 
> To all of the server lines, or just the internal-to-our-system servers?
> 

All.

> 
>>>   server <two or more gateways, using the back-end addresses>
>>>
>>Three servers are an absolute minimum because with only 2 it has no way of 
>>knowing which is providing better information. Let's leave aside the 
>>question of what "better" means; it's a very complicated subject.
> 
> 
> As I mentioned to Tom, what if we can't guarantee that?  As near as I
> can tell, whereas more is better, the only actual requirement is for one
> server.  In some cases, we're lucky if we get even one, so we either
> need to believe that one, or we need to set the time manually.
> 

Should I assume that you have no control over these systems? Whose 
requirement is that?

> 
>>Based on the above the internal NTP server has a stratum of 2 and will 
>>almost always be used over a stratum of 4. Is that internal NTP server 
>>getting its data from a stratum 1 server and is it internal or external?
> 
> 
> It is internal, and looks like it gets its time from other internal machines;
> 
> portal-01:~# ntptrace -n
> 127.0.0.1: stratum 3, offset 0.000006, synch distance 15.20248
> 10.16.4.1: stratum 2, offset -2.558634, synch distance 1.00000
> 10.16.4.100: stratum 2, offset -2.571121, synch distance 1.00000
> 10.16.100.2: stratum 2, offset -2.520537, synch distance 0.04373
> 132.163.4.101:  *Timeout*
> 
That means that 10.16.100.2 is not actually getting time from anywhere 
and is currently isolated. It can't reach the Boulder timeserver 
(132.163.4.101).
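
Reading a trace like the one above can also be done mechanically. The
following is a minimal Python sketch, not part of ntptrace itself; the names
parse_ntptrace and broken_link and the regular expression are assumptions
based only on the output format quoted in this message, and other ntptrace
versions may print something different.

```python
import re

# A hop line as quoted in this thread, for example:
# "10.16.4.1: stratum 2, offset -2.558634, synch distance 1.00000"
# (assumed format; other ntptrace versions may differ).
HOP = re.compile(
    r"^(?P<host>\S+):\s+stratum\s+(?P<stratum>\d+),\s+"
    r"offset\s+(?P<offset>-?\d+\.\d+),\s+synch distance\s+(?P<dist>\d+\.\d+)$"
)


def parse_ntptrace(text):
    """Return one dict per hop; a timed-out hop has stratum and offset None."""
    hops = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HOP.match(line)
        if match:
            hops.append({
                "host": match["host"],
                "stratum": int(match["stratum"]),
                "offset": float(match["offset"]),
                "timeout": False,
            })
        elif line.endswith("*Timeout*"):
            hops.append({
                "host": line.split(":", 1)[0],
                "stratum": None,
                "offset": None,
                "timeout": True,
            })
    return hops


def broken_link(hops):
    """Return (last_answering_host, unreachable_host), or None if no hop
    after the first one timed out."""
    for prev, cur in zip(hops, hops[1:]):
        if cur["timeout"]:
            return prev["host"], cur["host"]
    return None
```

Fed the five lines from portal-01, broken_link() would name 10.16.100.2 as
the last host that answered and 132.163.4.101 as the one it could not reach.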

> 
>>By obfuscating the addresses it's hard to know if you've also removed 
>>the Tally Codes, which indicate what gateway1 thinks of the servers. 
>>Since you are using the private address space for this it really doesn't 
>>matter if they're seen. If you don't want to show the names, just add a 
>>-n and it won't translate the IP addresses.
> 
> 
> As I mentioned in the post, the tally codes were spaces.
> 
> portal-01:~# ntpq -nc pe localhost
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
>  10.16.4.1       10.16.4.100      2 u   40   64  377    0.280  -2558.0   4.447
>  10.123.123.2    10.123.123.1     4 u  810 1024  377    0.172  -1849.0   2.014
>  10.123.123.3    0.0.0.0         16 u  679 1024    0    0.000    0.000 4000.00
> 

That means that it's not synchronized and it hasn't even decided how 
valid or invalid each of them is.
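
To make the billboard above easier to check, here is a small Python sketch
that pulls out the tally character and the reach register for each peer. It
is an illustration only: parse_peers and the TALLY table are names made up
for this example, the labels are paraphrased from the ntpq documentation and
their wording varies between versions, and the parser assumes the
ten-column layout shown in this message with column 1 preserved (no "> "
quoting in front).

```python
# Tally characters printed in column 1 of the ntpq peers billboard.
# Labels are paraphrased; exact wording differs between ntpq versions.
TALLY = {
    " ": "reject",
    "x": "falseticker",
    ".": "excess",
    "-": "outlier",
    "+": "candidate",
    "#": "backup",
    "*": "sys.peer",
    "o": "pps.peer",
}


def parse_peers(text):
    """Parse raw `ntpq -p` style output into a list of per-peer dicts."""
    peers = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines, the column header and the ==== ruler.
        if not stripped or stripped.startswith(("remote", "=")):
            continue
        tally, fields = line[0], line[1:].split()
        if len(fields) != 10:
            continue  # not a peer row in the assumed layout
        # reach is an 8-bit shift register printed in octal:
        # 377 means the last eight polls were all answered, 0 means none.
        reach = int(fields[6], 8)
        peers.append({
            "remote": fields[0],
            "tally": TALLY.get(tally, "unknown"),
            "stratum": int(fields[2]),
            "polls_ok": bin(reach).count("1"),
            "offset_ms": float(fields[8]),
        })
    return peers
```

For the portal-01 output this reports all three peers as "reject". The first
two answered all of their last eight polls, with offsets of roughly -2.6 and
-1.8 seconds, and 10.123.123.3 answered none.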

> 
>>This only has two servers and you need at least 3. As it is gateway1 and 
>>gateway2 are at two different stratum levels. However you need to fix 
>>the problem first on the gateways.
> 
> 
> Despite the spec, that seems to be a consistent interpretation.  If
> everything internal is fully meshed, and there is only one external time
> source, will everything sync up to that external source, no matter the skew?
> 

At best you will get the single source's time, but that's not guaranteed.

> 
>>> Looking at the debugging techniques, and seeing that the tally code is
>>> a space, and delving deeper, I see;
>>>
>>>   gateway1:~# ntpq -c as localhost
>>>   ind assID status  conf reach auth condition  last_event cnt
>>>   ===========================================================
>>>   1 47900  9014   yes   yes  none    reject   reachable  1
>>>   2 47901  9014   yes   yes  none    reject   reachable  1
>>>   3 47902  8000   yes   yes  none    reject
>>>
>>>   storage-node2:~# ntpq -c as localhost
>>>   ind assID status  conf reach auth condition  last_event cnt
>>>   ===========================================================
>>>   1 16076  9064   yes   yes  none    reject   reachable  6
>>>   2 16077  9064   yes   yes  none    reject   reachable  6
>>>
>>
>>Usually you will see these kinds of results when the server you are 
>>looking at has just started. You really need to give it time to synchronize.
> 
> 
> Not in this case;
> 
> portal-01:~# ps aux|grep ntp;for i in 2 51 52 53 54; do ssh -1 10.123.123.$i ps aux; done | grep ntp
> root   11283  0.0  0.1  2328 2320 ?        SL   Sep30   0:05 /usr/sbin/ntpd
> root   17856  0.0  0.1  2328 2320 ?        SL   Sep30   0:04 /usr/sbin/ntpd
> root     382  0.0  0.1  2328 2320 ?        SL   Jun13   0:04 /usr/sbin/ntpd -g
> root     382  0.0  0.1  2328 2320 ?        SL   Jun13   0:04 /usr/sbin/ntpd -g
> root     383  0.0  0.1  2328 2320 ?        SL   Jun13   0:04 /usr/sbin/ntpd -g
> root     389  0.0  0.1  2328 2320 ?        SL   Jun13   0:05 /usr/sbin/ntpd -g
> 

This is hard to understand since you can't tell which system is which.
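
One way to avoid that ambiguity is to prefix every output line with the host
it came from. The sketch below shows the idea in Python; label_lines, ssh_ps
and collect are hypothetical helper names, the ssh call assumes
non-interactive (key-based) login, and it does not pass the -1 option used
in the command above.

```python
import subprocess


def label_lines(host, output, needle="ntpd"):
    """Return the lines of `output` containing `needle`, prefixed with `host`."""
    return [f"{host}: {line}" for line in output.splitlines() if needle in line]


def ssh_ps(host):
    """Run `ps aux` on `host` over ssh and return its standard output."""
    result = subprocess.run(
        ["ssh", host, "ps", "aux"],
        capture_output=True,
        text=True,
        check=False,  # an unreachable host simply yields no lines
    )
    return result.stdout


def collect(hosts, runner=ssh_ps):
    """Gather labeled ntpd process lines from every host in `hosts`."""
    labeled = []
    for host in hosts:
        labeled.extend(label_lines(host, runner(host)))
    return labeled
```

Calling collect() with a list of addresses would then give one line per ntpd
process, each starting with the address of the machine it runs on. The
runner argument exists only so the ssh step can be swapped out, for example
when trying the function on canned output.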

> (the Sep30 processes are on the two gateways, the Jun13 processes are on
> the servers.  I had recently manually stopped ntpd, resync'd the times,
> and restarted ntpd on the gateways)
> 
> 
>>This appears to indicate it received just one packet, which is not enough 
>>to synchronize anything. How long after the server was started did you 
>>wait before querying it? You need to wait at least 15-20 minutes when 
>>you don't use iburst.
> 
> 
> How long would it take with iburst set?  How can we deal with the fact
> that the gateways and servers all generally come up at the same time?
> 

With iburst it can be as fast as 15 seconds, but it depends on many 
factors. I don't think this is your issue here.

Danny

> 	    --ted
> 
> --
> Ted Beatie                         Permabit, Inc.             ted at permabit.com
> Sr. Systems Engineer       One Kendall Sq, Cambridge, MA       +1-617-995-9317
> 



