[ntp:questions] NTP vs RADclock?
unruh at invalid.ca
Sat Jun 9 06:02:54 UTC 2012
On 2012-06-08, Rick Jones <rick.jones2 at hp.com> wrote:
> unruh <unruh at invalid.ca> wrote:
>> I am sure that it started when we switched from 100Mb technology to
>> Gb technology, yes. Other places to look for the problem would be
> I would suggest then trying disabling of the interrupt coalescing via
> ethtool on the 1GbE NIC of your server and a few select clients and
> see what that does. If things start to look cleaner then you know it
> is an implementation-specific detail of one or more GbE NICs.
It looks to me that interrupt coalescing is not enabled, according to ethtool.
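For the record, this is roughly how the coalescing settings can be inspected and, if needed, forced off with ethtool (the device name eth0 is just a placeholder; the exact parameters a driver accepts vary):

```shell
# Show the current interrupt-coalescing settings for the NIC
ethtool -c eth0

# Turn coalescing off: interrupt on every frame, with no delay
# (requires root; supported parameters depend on the driver)
ethtool -C eth0 adaptive-rx off rx-usecs 0 rx-frames 1
```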
It seems that it is the receipt of the packets that is the problem.
Ie, if I plot the round trip time vs the offset, they are strongly
correlated: the longer the round trip, the more the offset
indicates that the local clock is behind time (by 1/2 the round trip).
Ie, it is a one-way delay, and the effect is much worse for the Gigabit
than for the 100Mb (the variation in round trip is about 4 times as
large for Gigabit as for 100Mb).
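That sign convention can be sketched with the standard four-timestamp NTP offset/delay calculation (RFC 5905). The numbers below are made up purely for illustration, not taken from the measurements above: an extra queueing delay on one direction of the path shows up as half that delay in the offset, with opposite sign depending on which direction is delayed.

```python
# Sketch: how a one-way queueing delay biases the NTP offset estimate.
# NTP computes, from t1 (client send), t2 (server recv),
# t3 (server send), t4 (client recv):
#   offset = ((t2 - t1) + (t3 - t4)) / 2
#   delay  = (t4 - t1) - (t3 - t2)

def ntp_offset_delay(t1, t2, t3, t4):
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

PROP = 0.0001  # assumed symmetric propagation time, 0.1 ms each way

def exchange(extra_c2s=0.0, extra_s2c=0.0):
    """One NTP exchange between perfectly synchronized clocks,
    with optional extra one-way queueing delay in each direction."""
    t1 = 0.0
    t2 = t1 + PROP + extra_c2s   # server receives request
    t3 = t2                      # server replies immediately
    t4 = t3 + PROP + extra_s2c   # client receives reply
    return ntp_offset_delay(t1, t2, t3, t4)

base_off, base_del = exchange()
# Extra 1 ms client->server: offset rises by about +0.5 ms
# (client appears behind), round trip rises by the full 1 ms.
off_c2s, del_c2s = exchange(extra_c2s=0.001)
# Extra 1 ms server->client: offset falls by about -0.5 ms instead.
off_s2c, del_s2c = exchange(extra_s2c=0.001)
print(base_off, off_c2s, off_s2c)
```

So a scatter of offset against round trip from delays in one direction forms a line of slope +1/2 or -1/2, which is consistent with the within-cluster and between-cluster slopes described below having opposite signs.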
> If it is possible to connect a client "back-to-back" to your server at
> the same time (via a second port) - still with interrupt coalescing
> disabled at both ends that would be an excellent addition. That will
> help evaluate the switch.
> I trust there were no OS changes when going from 100BT to GbE? Though
> even if not, there is still the prospect of the drivers for the 100BT
> cards not doing what linux calls "napi" and the drivers for the GbE
> cards doing it, which may introduce some timing changes.
What is napi?
>> So yes, I think it is the Gb technology that is causing trouble.
> I split what may seem a hair between Gb technology being the IEEE
> specification and Gb implementation being what specific NIC vendors
> do. So, to me, interrupt coalescing is implementation not technology.
For me, I do not care which it is; it is all Gb.
Note that on one of the clients, there are two separate clusters of
roundtrip delays, one from .15 to about .4 ms, and the other from about
1.3 to 1.6 ms. The slope within each cluster is as above, but the slope
between the clusters is the opposite. Ie, within a cluster the
client-to-server packet is being delayed, while the separation between the
clusters is due to a huge delay in the server-to-client direction (if I
have the signs right).
I have the scatter plots (offset vs return time)
for two clients to two different servers. One of the servers is a Gb
server, while the other is a 100Mb server. Both servers are disciplined
by a GPS PPS device. The offset fluctuations on both servers are about 4
us, so none of the offset fluctuations come from the server clocks.