[time] What is happening here?
Tue Aug 7 22:09:01 UTC 2007
Yes, you are right indeed. I had misunderstood what you meant; sorry for the confusion.
I was talking about the problem caused by DNS caching, as round-robin does not happen for a cached result set of 14 IPs.
A cached DNS result always returns the same set of 14 IPs in exactly the same order until the TTL expires, and I believe the first IP in the list is the unluckiest one. A TTL of 2700 sec makes this even worse.
So why not modify the BIND source code to read a big list of IPs directly and return a different group of 14 or so IPs for each query, and also start using a lower TTL?
Or maybe set up 'pool.ntp.org' as a round-robin list of CNAMEs with a very low TTL instead of A records, pointing to '0.pool.ntp.org', '1.pool.ntp.org' and so on, each of these holding a list of 14 IPs with a higher TTL? This would save bandwidth in DNS traffic: the list of 14 IPs for 'N.pool.ntp.org' would stay cached under the high TTL, and the authoritative DNS servers would mostly handle and respond to queries returning the CNAMEs instead of 14 IPs.
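To make the first idea concrete, here is a rough sketch in Python of answering each query with a fresh group of 14 addresses drawn from the whole list. It is only an illustration of the selection step, not BIND code: the pool size of 500, the placeholder 10.0.x.x addresses and the names build_pool and answer_query are all assumptions made up for this example.

```python
import random

POOL_SIZE = 500    # assumed pool size, used only for illustration
ANSWER_SIZE = 14   # number of A records returned per response


def build_pool(n):
    """Return n distinct placeholder addresses (not real pool servers)."""
    return [f"10.0.{i // 256}.{i % 256}" for i in range(n)]


def answer_query(pool, k=ANSWER_SIZE, rng=random):
    """Pick a fresh random subset of k addresses for a single query."""
    return rng.sample(pool, k)


pool = build_pool(POOL_SIZE)
first = answer_query(pool)
second = answer_query(pool)

# Each response holds 14 distinct addresses from the full list, and two
# consecutive responses will almost always differ.
print(first)
print(second)
```

Caching resolvers would still hand the same 14 addresses to all their clients until the TTL expires, so this only helps together with a lower TTL.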
I also agree with you that these spikes from TT customers are not the fault of TT alone, as I have said in another thread.
In my opinion, however, it is evident that those responsible at TT are configuring their customers to use NTP servers from the pool instead of TT's own NTP servers. Using their own servers would, I believe, be much more ethical (and technically more appropriate) for such a large ISP, so I believe that part of this specific problem still falls on TT.
----- Original Message -----
From: "Rob Janssen" <rob at knoware.nl>
To: "Rui Ferreira" <ruiferreira at iesfafe.pt>
Cc: <timekeepers at fortytwo.ch>
Sent: Tuesday, August 07, 2007 9:18 PM
Subject: Re: [time] What is happening here?
Rui Ferreira wrote:
> I believe you are somewhat wrong, as the dns servers actually make round-robin on a per request basis.
> If you try "dig pool.ntp.org. @a.ntpns.org" several repeated times you will see the round-robin working on a per request basis, that is, you will see the returned ip's rotating on each request.
> The problem that you are talking about is related to the TTL of the results, 2700 sec at this moment, that is, the result will remain in dns cache for 2700 sec.
No. The problem is that the DNS returns 14 addresses from the pool for
each domain name within the pool (e.g. pool.ntp.org,
europe.pool.ntp.org, nl.pool.ntp.org) even when that part of the pool
has many more than 14 servers. The set of 14 servers remains the same
for one hour, only the sequence within this set of 14 rotates.
So, when there are 500 servers in the pool and a large group of users
tries to get time using simple NTP (a single request to retrieve the
current time), all the requests from that large group of users go to
only 14 out of the 500 servers.
The servers in that group of 14 see a "spike", and the remaining 486
servers have nothing to complain about.
An hour later, 14 different servers see a "spike".
That is why I claim this spike is not caused by Türk Telecom but by our
DNS system. If the DNS really rotated over all 500 servers, the
load would be distributed over 500 instead of 14 servers and the spike
would be about 35 times lower.
Of course there is the problem that DNS typically uses caching servers
and so you cannot rotate as fast as you would like.
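
The arithmetic in the message above can be checked with a small Python sketch. It assumes an idealised even rotation and ignores DNS caching entirely; the request count of 70,000 and the function name per_server_load are made up for the example, while the figures 500 and 14 come from the message.

```python
from collections import Counter
from itertools import cycle, islice


def per_server_load(servers, requests):
    """Spread requests evenly (round-robin) over servers and count them."""
    return Counter(islice(cycle(servers), requests))


pool = list(range(500))   # 500 servers, identified by index
requests = 70_000         # assumed burst size, chosen to divide evenly

fixed = per_server_load(pool[:14], requests)  # only 14 servers are handed out
full = per_server_load(pool, requests)        # rotation over the whole pool

peak_fixed = max(fixed.values())
peak_full = max(full.values())
ratio = peak_fixed / peak_full

print(peak_fixed, peak_full, round(ratio, 1))
```

The ratio is 500/14, a little under 36, which matches the "35 times lower" estimate.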