[ntp:questions] Testing throughput in NTP servers

Ulf Samuelsson ulf at invalid.com
Mon Sep 17 09:21:33 UTC 2012

On 2012-09-14 21:44, Terje Mathisen wrote:
> Ulf Samuelsson wrote:
>> On 2012-09-12 21:24, Richard B. Gilbert wrote:
>>> On 9/12/2012 2:34 PM, unruh wrote:
>>>> On 2012-09-12, Ulf Samuelsson <ulf at invalid.com> wrote:
>>>>> Does anyone know if there is any Linux-based S/W available to test
>>>>> the throughput of NTP servers?
>>>>> I.e.:
>>>>>     packets per second?
>>>>>     % of lost packets
>>>>>     etc?
>>>>> Best Regards
>>>>> Ulf Samuelsson
>>>> I hope not. I can just see someone deciding to test one of the
>>>> stratum 1 main servers (e.g. at the USNO). Why in the world would
>>>> you want this?
>>> Sigh!  I'm sure it has happened and will happen again!  I'm sure that
>>> there are people complaining to the National Bureau of Standards or
>>> the Naval Observatory that their time is incorrect! ;-)
>>> If you really want time with better than microsecond accuracy, consider
>>> getting a GPS timing receiver. The one I bought several years
>>> ago claimed to be within plus or minus 50 nanoseconds of the correct time.
>> The NTP server we will be testing will be connected to a Cesium clock
>> providing a 1 PPS pulse, so that is really not my problem.
>> I want to check if this system can handle DDoS attacks, and bad packets.
>> This will be done in a lab environment, possibly point-to-point from
>> the test machine, to the server, or maybe
>> In order to test DDoS, some FPGA H/W is probably needed to generate good
>> packets, and the S/W stuff is there to generate bad packets and
>> check how the server reacts to those.
> Do you really need that?
> It seems to me that by modifying an ethernet card driver to do ntp
> processing in kernel mode, you should be able to handle at least the
> same number of ntp requests as you can do ping replies.

One of the key requirements of the Ethernet card is to timestamp
the incoming packets. There are 10 GbE FPGA solutions capable
of this today.

Part of the requirement is that the dependencies on the underlying O/S
should be minimized. If an FPGA can handle everything, then this is ideal.
Currently the FPGA will split incoming packets into streams, with one
stream per thread.

> Way back when, around 1992, Drew Major managed to get a NetWare 386
> server to handle a read request in 300 clock cycles. This was from
> receipt of the packet and included parsing, access control checks,
> locating the requested data somewhere in the memory cache, constructing
> the response packet and handing it back to the NIC.
> Assuming we can get the actual ntp standard request code processing down
> to the absolute minimum (read the RDTSC counter (or a similar
> low-latency clock source) and the latest OS tick value/RDTSC count,
> scale the offset count by a fixed factor, then add to the OS clock
> value) we should be able to get the entire processing down to ~100 clock
> cycles or so. I.e. moving packet data in/out of the NIC buffers is going
> to take comparable time.
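
As a rough sketch of the clock arithmetic described above (read a cycle
counter, scale the count since the last OS tick by a fixed factor, add it
to the OS clock value): the function names, the 32-bit fixed-point shift
and the 3 GHz counter frequency below are assumptions for illustration
only, and Python integers stand in for what would be kernel code.

```python
# Hypothetical fixed-point interpolation between OS clock ticks.
SHIFT = 32  # number of fractional bits in the scale factor (assumed)

def make_scale(counter_hz: int) -> int:
    """Nanoseconds per counter cycle, as a fixed-point value with SHIFT fractional bits."""
    return (1_000_000_000 << SHIFT) // counter_hz

def interpolate_ns(tick_time_ns: int, count_at_tick: int, count_now: int, scale: int) -> int:
    """OS clock value at the last tick plus the scaled cycle count since that tick."""
    elapsed_cycles = count_now - count_at_tick
    # One multiply and one shift: no division on the fast path.
    return tick_time_ns + ((elapsed_cycles * scale) >> SHIFT)

scale = make_scale(3_000_000_000)  # assumed 3 GHz counter
now_ns = interpolate_ns(1_000_000_000, 5_000, 8_000, scale)
```

The scale factor is computed once, so the per-request cost is a counter
read, a subtract, a multiply, a shift and an add. The truncating shift
can lose up to one nanosecond.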

The first FPGA H/W has some limitations: reading the timestamp counter
from the CPU is really not recommended, since it kills
the PCIe performance.
Instead, the ntp code adds a delay to the incoming packet timestamp,
and the FPGA H/W sends out the packet at the correct time.
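
A minimal sketch of that idea, assuming the H/W receive timestamp is
available as Unix-epoch nanoseconds and the added delay is a fixed
turnaround: the function names and the turnaround parameter are
hypothetical. Only the 64-bit NTP timestamp layout (32-bit seconds since
1900, 32-bit fraction) is taken from the protocol itself.

```python
NTP_UNIX_OFFSET = 2_208_988_800  # seconds between 1900-01-01 and 1970-01-01

def to_ntp_timestamp(unix_ns: int) -> int:
    """Convert Unix-epoch nanoseconds to a 64-bit NTP timestamp (32.32 fixed point)."""
    seconds, nanos = divmod(unix_ns, 1_000_000_000)
    fraction = (nanos << 32) // 1_000_000_000
    return (((seconds + NTP_UNIX_OFFSET) & 0xFFFFFFFF) << 32) | fraction

def schedule_reply(rx_unix_ns: int, turnaround_ns: int) -> tuple:
    """Return (receive, transmit) NTP timestamps for a reply.

    The transmit timestamp is the receive timestamp plus a fixed delay;
    the hardware is assumed to release the packet at exactly that time.
    """
    tx_unix_ns = rx_unix_ns + turnaround_ns
    return to_ntp_timestamp(rx_unix_ns), to_ntp_timestamp(tx_unix_ns)
```

With this scheme the S/W never has to read the current time; it only
needs the receive timestamp delivered with the packet.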

> (Any other kind of request is handled as today, i.e. queued for ntpd
> processing, unless DDOS level packet rates cause the queue to pass some
> very low limit in size, at which point we discard the requests.)
> Any packet which fails some minimum sanity checks can be discarded
> quickly; this is less overhead than handing it over to the regular
> user-level ntpd process.
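
For illustration, a minimal sanity check of this kind might look like the
sketch below. The choice of checks (payload length, version field, client
mode) is an assumption about what "minimum" would mean here, and the
function name is hypothetical. The 48-byte header and the bit layout of
the first byte are from the NTP packet format.

```python
NTP_HEADER_LEN = 48  # fixed NTP header size in bytes
MODE_CLIENT = 3      # mode value used by ordinary client requests

def is_plausible_client_request(payload: bytes) -> bool:
    """Cheap checks on a UDP payload before any further NTP processing."""
    if len(payload) < NTP_HEADER_LEN:
        return False  # too short to hold an NTP header
    first = payload[0]
    version = (first >> 3) & 0x7  # bits 3-5: version number
    mode = first & 0x7            # bits 0-2: mode
    return 1 <= version <= 4 and mode == MODE_CLIENT
```

Anything that fails can be dropped without touching the rest of the
packet, which is what keeps the discard path cheap.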

How do you test that this works?
Any specific S/W package that you developed?

>> Recording the packets will be done with FPGA H/W as well.
> So a network sniffer won't be fast enough?

The FPGA card is a network sniffer as well, so there is ready-made S/W
for this.

> You're talking 10 GigE wire speed, right?


> That's more than 100 M requests/second!

Line speed is 10M+ packets/second.
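
The figure can be checked with a short calculation. This sketch assumes
plain 48-byte NTP requests over UDP/IPv4 with no VLAN tag, no IP options
and no NTP authentication or extension fields; the function name is only
illustrative.

```python
def line_rate_pps(link_bps: int, udp_payload_bytes: int = 48) -> float:
    """Maximum packets per second on an Ethernet link for a given UDP payload size."""
    # Ethernet header (14) + IPv4 header (20) + UDP header (8) + payload + FCS (4)
    frame = 14 + 20 + 8 + udp_payload_bytes + 4
    frame = max(frame, 64)     # Ethernet minimum frame size
    on_wire = frame + 8 + 12   # preamble/SFD (8) + inter-frame gap (12)
    return link_bps / (on_wire * 8)

ntp_pps = line_rate_pps(10_000_000_000)  # 10 GbE
```

Each request occupies 114 bytes on the wire under these assumptions,
which works out to a little under 11 M packets/second.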

I have been told that a single compromised home router can generate
about 3000 packets per second on a 100 Mbps network.
With 3000-4000 such routers you reach 10 GbE line speed.
My local service provider is now offering 1 Gbps Internet access at home
(if I care for some throughput),
so with some decent H/W, there could be more.
This solution is supposed to have some lifetime.
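
As a quick check of the arithmetic: the 3000 packets/second per router is
the figure I was told, while the two target rates below (a round 10 M and
roughly 10.96 M packets/second for 48-byte NTP requests at 10 GbE) are
assumptions for the example.

```python
import math

def sources_needed(target_pps: float, per_source_pps: float) -> int:
    """Number of attacking sources needed to reach a target packet rate."""
    return math.ceil(target_pps / per_source_pps)

routers_round = sources_needed(10_000_000, 3000)  # round 10 M packets/second
routers_ntp = sources_needed(10_964_912, 3000)    # assumed NTP line rate at 10 GbE
```

Both results land in the 3000-4000 range.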

Some intelligence in front of the NTP server that removes
nasty packets would probably be useful as well.

> Taking a pessimistic view (1K clock cycles/request) would give just 3M
> packets/core/second, so a 32-core (4x8) machine would suffice.

> Getting closer to my 100-cycle target (for chained processing of a bunch
> of consecutive request packets) drops the cpu requirements down to a
> regular quad core single cpu machine, but at this point the bus probably
> won't be able to keep up with the NIC.
> Terje
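
The core-count estimates above can be reproduced with the sketch below.
The 3 GHz clock is an assumption (it is what makes 1K cycles/request come
out at 3M packets/core/second), and the function names are illustrative.

```python
import math

def pps_per_core(clock_hz: int, cycles_per_request: int) -> int:
    """Requests one core can handle per second at a given cycle cost."""
    return clock_hz // cycles_per_request

def cores_needed(target_pps: int, clock_hz: int, cycles_per_request: int) -> int:
    """Cores required to sustain target_pps, ignoring bus and NIC limits."""
    return math.ceil(target_pps / pps_per_core(clock_hz, cycles_per_request))

CLOCK_HZ = 3_000_000_000  # assumed 3 GHz cores

pessimistic = cores_needed(100_000_000, CLOCK_HZ, 1000)  # 100 M requests/second
optimistic = cores_needed(100_000_000, CLOCK_HZ, 100)
at_line_rate = cores_needed(11_000_000, CLOCK_HZ, 1000)  # roughly 10 GbE NTP line rate
```

At 100 M requests/second this gives 34 cores for the pessimistic case
(close to the 32-core estimate) and 4 cores at 100 cycles/request. At the
10M+ packets/second line rate, 4 cores would do even at 1K
cycles/request.
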
Ulf Samuelsson
