[ntp:questions] NTP vs RADclock?

unruh unruh at invalid.ca
Sun Jun 10 23:18:37 UTC 2012

On 2012-06-10, Rick Jones <rick.jones2 at hp.com> wrote:
> unruh <unruh at invalid.ca> wrote:
>> On 2012-06-08, Rick Jones <rick.jones2 at hp.com> wrote:
>> > I would suggest then trying disabling of the interrupt coalescing
>> > via ethtool on the 1GbE NIC of your server and a few select
>> > clients and see what that does.  If things start to look cleaner
>> > then you know it is an implementation-specific detail of one or
>> > more GbE NICs.
>> It looks to me that interrupt coalescing is not enabled according to
>> ethtool.
> I'd like to see the full output of ethtool, ethtool -i and ethtool -c
> for your interfaces if I may.  Feel free to send as direct email if
> you prefer.

info:10.0[unruh]>ethtool -i eth0                   
driver: e1000                                      
version: 7.3.21-k8-NAPI                            
firmware-version: N/A                              
bus-info: 0000:06:00.0                             
info:10.0[unruh]>ethtool -c eth0                   
Coalesce parameters for eth0:                      
Adaptive RX: off  TX: off                          
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 3
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
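
A quick way to scan output like the above for any setting that is not
zero is sketched below. It only parses the text that ethtool -c prints;
the function name nonzero_coalesce_settings is made up for illustration,
and the sample is an abbreviated copy of the output above. What a small
value such as rx-usecs: 3 actually does is driver-specific, so a
non-zero entry is only a hint that some interrupt moderation may still
be active, not proof of it.

```python
import re

# Abbreviated copy of the `ethtool -c eth0` output shown above.
SAMPLE = """\
Coalesce parameters for eth0:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0

rx-usecs: 3
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 0
tx-frames: 0
"""


def nonzero_coalesce_settings(text):
    """Return {name: value} for every numeric 'name: value' line that is not 0.

    Header lines and the 'Adaptive RX: off  TX: off' line are skipped
    because they do not match the lower-case 'name: integer' pattern.
    """
    settings = {}
    for line in text.splitlines():
        m = re.fullmatch(r"\s*([a-z-]+):\s*(\d+)\s*", line)
        if m and int(m.group(2)) != 0:
            settings[m.group(1)] = int(m.group(2))
    return settings


if __name__ == "__main__":
    print(nonzero_coalesce_settings(SAMPLE))  # {'rx-usecs': 3}
```

On the output above the only entry flagged is rx-usecs, so that is the
one parameter I would look up in the e1000 driver documentation before
concluding that coalescing is completely off.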

>> > If it is possible to connect a client "back-to-back" to your server at
>> > the same time (via a second port) - still with interrupt coalescing
>> > disabled at both ends that would be an excellent addition.  That will
>> > help evaluate the switch.
>> >
>> > I trust there were no OS changes when going from 100BT to GbE?  Though
>> > even if not, there is still the prospect of the drivers for the 100BT
>> > cards not doing what linux calls "napi" and the drivers for the GbE
>> > cards doing it, which may introduce some timing changes.
>> What is napi?
> Napi is a mechanism whereby interrupts on a NIC get disabled, and
> packets are polled for for a certain length of time.
> http://www.linuxfoundation.org/collaborate/workgroups/networking/napi
> http://en.wikipedia.org/wiki/New_API
>> >> So yes, I think it is the Gb technology that is causing trouble. 
>> >
>> > I split what may seem a hair between Gb technology being the IEEE
>> > specification and Gb implementation being what specific NIC vendors
>> > do.  So, to me, interrupt coalescing is implementation not technology.  
>> For me, I do not care which it is, it is all Gb. 
> I suspect that my caring about Gb technology/specification vs Gb
> implementation may be not all that far from a timekeeper's desire to
> distinguish between accuracy and precision, even when laypeople start
> to mix the two :)
>> Note that on one of the clients, there are two separate clusters of
>> roundtrip delays, one from .15 to about .4ms, and the other from
>> about 1.3 to 1.6 ms. The slope within each cluster is as above but
>> the slope between the clusters is the opposite. Ie, within the
>> cluster, the client to server is being delayed, while the clusters
>> are due to a huge delay in the server to client. (if I have the
>> signs right)
>> In http://www.theory.physics.ubc.ca/scatter/scatter.html I have the
>> scatter plots (offset vs return time) for two clients to two
>> different servers. One of the servers is a Gb server, while the
>> other is a 100Mb server. Both servers are disciplined by a GPS PPS
>> device. The offset fluctuations on both servers is about 4 us, so
>> none of the offset fluctuations come from the server clocks
>> themselves.
> It would be good to include the specific card name and driver rev etc
> in subsequent writeups.  Over the years there have been several Intel
> gigabit cards and 100BT cards.  I believe just about all the Intel GbE
> cards have had support for interrupt coalescing in some form or
> another.  At least those which have crossed my path.
> rick jones
> lspci -v can help if you don't already know the card name(s)

On the misbehaving machine
Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)

The fact that the distribution in round trip times is almost a perfect
square pulse (i.e., constant probability between the minimum 1.4us and
the max .4us) suggests that maybe it is polling rather than
interrupt-driven, although the card certainly has an interrupt.
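
To illustrate why a flat distribution points toward polling, here is a
minimal simulation sketch. It assumes perfectly synchronised clocks, a
fixed one-way wire delay, and an extra wait on one leg that is uniform
over a polling period, as it would be if a packet sat in the NIC until
the next poll. The numbers (75 us base delay, 250 us poll period, 10 us
server turnaround) are invented so that the round trip spans roughly
.15 to .4 ms like one of the clusters quoted above; they are not
measurements. The offset and delay formulas are the standard NTP
on-wire ones; the function names are my own.

```python
import random


def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation from the four timestamps."""
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate(n, base=75e-6, poll_period=250e-6, delayed_leg="to_server", seed=1):
    """Return n (offset, delay) pairs for clocks that agree exactly.

    One leg gets an extra wait drawn uniformly from [0, poll_period].
    All parameter values are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        extra = rng.uniform(0.0, poll_period)
        out = base + (extra if delayed_leg == "to_server" else 0.0)
        back = base + (extra if delayed_leg == "to_client" else 0.0)
        t1 = 0.0            # client transmit
        t2 = t1 + out       # server receive
        t3 = t2 + 10e-6     # server transmit (assumed turnaround)
        t4 = t3 + back      # client receive
        samples.append(ntp_offset_delay(t1, t2, t3, t4))
    return samples


def slope(samples):
    """Least-squares slope of offset against round-trip delay."""
    n = len(samples)
    mean_d = sum(d for _, d in samples) / n
    mean_o = sum(o for o, _ in samples) / n
    sxy = sum((d - mean_d) * (o - mean_o) for o, d in samples)
    sxx = sum((d - mean_d) ** 2 for _, d in samples)
    return sxy / sxx


if __name__ == "__main__":
    for leg in ("to_server", "to_client"):
        s = simulate(5000, delayed_leg=leg)
        delays = [d for _, d in s]
        print(leg, "slope:", round(slope(s), 3),
              "delay range (us):", round(min(delays) * 1e6), round(max(delays) * 1e6))
```

In this toy model the round trip is flat between the minimum and
minimum plus one poll period, and the offset-versus-delay slope is +1/2
when the client-to-server leg is the one delayed and -1/2 when it is the
server-to-client leg, which is the sign pattern I would look for in the
scatter plots.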

From dmesg:

[   14.333930] e1000: Intel(R) PRO/1000 Network Driver - version
[   14.333936] e1000: Copyright (c) 1999-2006 Intel Corporation.
[   14.334031] e1000 0000:06:00.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21
[   14.766662] e1000 0000:06:00.0: eth0: (PCI:33MHz:32-bit)
[   14.766675] e1000 0000:06:00.0: eth0: Intel(R) PRO/1000 Network
[   68.420253] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   68.812100] e1000: eth0 NIC Link is Down
[   79.713724] e1000: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX

(I have no idea what that change to 100Mbps means. I am checking that
the switch has not been configured to force 100Mbps on this port. I
still do not see how this could explain the problem, but I will check.)
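
To keep an eye on this, a small sketch that pulls the link up/down
events out of dmesg text and reports the speed the link finally settled
at. The sample is the three link lines from the excerpt above with the
mail wrapping undone; the regular expression assumes the e1000 message
format exactly as it appears there, and the function names are made up.

```python
import re

# Link lines from the dmesg excerpt above (mail line-wrapping undone).
DMESG = """\
[   68.420253] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   68.812100] e1000: eth0 NIC Link is Down
[   79.713724] e1000: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX
"""

# timestamp, driver tag, interface name, then either "Up <speed> Mbps" or "Down"
LINK_RE = re.compile(
    r"\[\s*([\d.]+)\]\s+\S+:\s+(\S+) NIC Link is (Up (\d+) Mbps|Down)"
)


def link_events(text):
    """Return a list of (seconds_since_boot, interface, speed_mbps_or_None)."""
    events = []
    for m in LINK_RE.finditer(text):
        speed = int(m.group(4)) if m.group(4) else None  # None means link down
        events.append((float(m.group(1)), m.group(2), speed))
    return events


def final_speed(events):
    """Speed of the last event, or None if the link ended down or never came up."""
    return events[-1][2] if events else None


if __name__ == "__main__":
    for when, iface, speed in link_events(DMESG):
        print(when, iface, "down" if speed is None else f"{speed} Mbps")
    print("final speed:", final_speed(link_events(DMESG)))  # final speed: 100
```

Fed the lines above, it shows the link coming up at 1000 Mbps, dropping,
and coming back at 100 Mbps about eleven seconds later.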

The ethernet controller on the first client in the scatterplots is 
Intel Corporation 82562EZ 10/100 Ethernet Controller (rev 01)

The controller on the second one (the one with the two clusters) is
Intel Corporation 82557/8/9 Ethernet Pro 100 (rev 08)
