[ntp:questions] NTP clients not syncing up to servers?
Danny Mayer
mayer at ntp.isc.org
Tue Oct 11 21:24:41 UTC 2005
Ted Beatie wrote:
>>> server <one or more servers, external or internal>
>>> server <one or more other gateways, using the back-end addresses>
>>>
>>Add iburst to the end of each server line. This speeds up synchronization.
>
>
> To all of the server lines, or just the internal-to-our-system servers?
>
All.
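For concreteness, here is a sketch of what the server lines might look like with iburst appended. The addresses are placeholders standing in for the obfuscated ones in your config, not recommendations:

```
# /etc/ntp.conf -- illustrative fragment only; substitute your real servers.
# iburst sends a burst of packets at startup, which cuts initial
# synchronization time from roughly 15-20 minutes to often under a minute.
server 10.16.4.1      iburst    # internal gateway, back-end address (placeholder)
server 10.16.4.100    iburst    # internal gateway, back-end address (placeholder)
server 132.163.4.101  iburst    # external server (the Boulder address from your trace)
```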
>
>>> server <two or more gateways, using the back-end addresses>
>>>
>>Three servers are an absolute minimum, because with only two it has no
>>way of knowing which one is providing better information. Let's leave
>>aside the question of what "better" means; it's a very complicated
>>subject.
>
>
> As I mentioned to Tom, what if we can't guarantee that? As near as I
> can tell, while more is better, the only hard requirement is for one
> server. In some cases we're lucky to get even one, so we either need
> to trust that one, or we need to set the time manually.
>
Should I assume that you have no control over these systems? Whose
requirement is that?
>
>>Based on the above, the internal NTP server has a stratum of 2 and will
>>almost always be preferred over one at stratum 4. Is that internal NTP
>>server getting its time from a stratum 1 server, and is it internal or
>>external?
>
>
> It is internal, and it looks like it gets its time from other internal machines;
>
> portal-01:~# ntptrace -n
> 127.0.0.1: stratum 3, offset 0.000006, synch distance 15.20248
> 10.16.4.1: stratum 2, offset -2.558634, synch distance 1.00000
> 10.16.4.100: stratum 2, offset -2.571121, synch distance 1.00000
> 10.16.100.2: stratum 2, offset -2.520537, synch distance 0.04373
> 132.163.4.101: *Timeout*
>
That means that 10.16.100.2 is not actually getting time from anywhere
and is currently isolated: it can't reach the Boulder timeserver
(132.163.4.101).
>
>>By obfuscating the addresses it's hard to know whether you've also
>>removed the tally codes, which indicate what gateway1 thinks of the
>>servers. Since you are using private address space here, it really
>>doesn't matter if the addresses are seen. If you don't want to show the
>>names, just add -n and it won't translate the IP addresses.
>
>
> As I mentioned in the post, the tally codes were spaces.
>
> portal-01:~# ntpq -nc pe localhost
> remote refid st t when poll reach delay offset jitter
> ==============================================================================
> 10.16.4.1 10.16.4.100 2 u 40 64 377 0.280 -2558.0 4.447
> 10.123.123.2 10.123.123.1 4 u 810 1024 377 0.172 -1849.0 2.014
> 10.123.123.3 0.0.0.0 16 u 679 1024 0 0.000 0.000 4000.00
>
That means that it's not synchronized and it hasn't even decided how
valid or invalid each of them is.
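A quick way to confirm whether a host considers itself synchronized is to read ntpd's system variables. This is a sketch using standard ntpq commands; the sed extraction is my addition, not from the thread:

```shell
# Ask ntpd for its own state: leap=00 with stratum below 16 means it is
# synchronized; leap=11 and stratum=16 mean it is not.
vars=$(ntpq -nc "rv 0 leap,stratum" localhost)

# Pull out just the leap indicator from the "leap=NN" field.
leap=$(printf '%s\n' "$vars" | sed -n 's/.*leap=\([0-9]*\).*/\1/p')
echo "leap=$leap"
```

With your current output the offsets of several seconds alone would keep ntpd from declaring itself synchronized.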
>
>>This only has two servers and you need at least three. As it is,
>>gateway1 and gateway2 are at two different stratum levels. However, you
>>need to fix the problem on the gateways first.
>
>
> Despite the spec, that seems to be a consistent interpretation. If
> everything internal is fully meshed, and there is only one external time
> source, will everything sync up to that external source, no matter the skew?
>
At best you will get the single source time, but that's not guaranteed.
>
>>> Looking at the debugging techniques, and seeing that the tally code is
>>> a space, and delving deeper, I see;
>>>
>>> gateway1:~# ntpq -c as localhost
>>> ind assID status conf reach auth condition last_event cnt
>>> ===========================================================
>>> 1 47900 9014 yes yes none reject reachable 1
>>> 2 47901 9014 yes yes none reject reachable 1
>>> 3 47902 8000 yes yes none reject
>>>
>>> storage-node2:~# ntpq -c as localhost
>>> ind assID status conf reach auth condition last_event cnt
>>> ===========================================================
>>> 1 16076 9064 yes yes none reject reachable 6
>>> 2 16077 9064 yes yes none reject reachable 6
>>>
>>
>>Usually you will see these kinds of results when the server you are
>>looking at has just started. You really need to give it time to synchronize.
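To dig into why a particular association shows "reject", you can read its peer variables by assID. A sketch: 47900 is just the first assID from the output above, and the sed extraction is my addition:

```shell
# Fetch the peer variables for one association. The "flash" word encodes
# which of ntpd's sanity checks the peer is failing; flash=0 means it
# currently passes them all.
vars=$(ntpq -c "rv 47900" localhost)
flash=$(printf '%s\n' "$vars" | sed -n 's/.*flash=\([0-9a-fx]*\).*/\1/p')
echo "flash=$flash"
```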
>
>
> Not in this case;
>
> portal-01:~# ps aux|grep ntp;for i in 2 51 52 53 54; do ssh -1
> 10.123.123.$i ps aux; done | grep ntp
> root 11283 0.0 0.1 2328 2320 ? SL Sep30 0:05 /usr/sbin/ntpd
> root 17856 0.0 0.1 2328 2320 ? SL Sep30 0:04 /usr/sbin/ntpd
> root 382 0.0 0.1 2328 2320 ? SL Jun13 0:04 /usr/sbin/ntpd -g
> root 382 0.0 0.1 2328 2320 ? SL Jun13 0:04 /usr/sbin/ntpd -g
> root 383 0.0 0.1 2328 2320 ? SL Jun13 0:04 /usr/sbin/ntpd -g
> root 389 0.0 0.1 2328 2320 ? SL Jun13 0:05 /usr/sbin/ntpd -g
>
This is hard to understand since you can't tell which system is which.
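Since the merged ps output can't be attributed to a host, one option is to prefix each host's lines with its address. A sketch reusing the host list and ssh flags from your command above (which are assumptions about your environment):

```shell
# Label each host's ntpd processes so the rows are attributable.
# The "[n]tpd" trick keeps grep from matching its own process.
for i in 2 51 52 53 54; do
  ssh -1 "10.123.123.$i" 'ps aux | grep "[n]tpd"' | sed "s/^/10.123.123.$i: /"
done
```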
> (the Sep30 processes are on the two gateways, the Jun13 processes are on
> the servers. I had recently manually stopped ntpd, resync'd the times,
> and restarted ntpd on the gateways)
>
>
>>This appears to indicate it received just one packet, which is not
>>enough to synchronize anything. How long after the server was started
>>did you wait before interrogating it? You need to wait at least 15-20
>>minutes when you don't use iburst.
>
>
> How long would it take with iburst set? How can we deal with the fact
> that the gateways and servers all generally come up at the same time?
>
Usually with iburst it can be as fast as 15 seconds, but it depends on
many factors. I don't think this is your issue here, though.
Danny
> --ted
>
> --
> Ted Beatie Permabit, Inc. ted at permabit.com
> Sr. Systems Engineer One Kendall Sq, Cambridge, MA +1-617-995-9317
>
> _______________________________________________
> questions mailing list
> questions at lists.ntp.isc.org
> https://lists.ntp.isc.org/mailman/listinfo/questions
>