[ntp:questions] Testing throughput in NTP servers

Terje Mathisen "terje.mathisen at tmsw.no" at ntp.org
Mon Sep 17 12:58:16 UTC 2012

Ulf Samuelsson wrote:
> On 2012-09-14 21:44, Terje Mathisen wrote:
>> Do you really need that?
>> It seems to me that by modifying an ethernet card driver to do ntp
>> processing in kernel mode, you should be able to handle at least the
>> same number of ntp requests as you can do ping replies.
> One of the key requirements of the Ethernet card is to do timestamping
> of the incoming packets. There are FPGA solutions with 10 GbE capable
> of this today.

> Part of the requirement is that the dependencies on the underlying O/S
> should be minimized. If an FPGA can handle everything, then this is ideal.
> Currently the FPGA will split incoming packets into streams with one
> stream per thread.

Why can't your FPGA do the entire NTP processing for all regular
request packets?

Grab the current timestamp, fill it into the receive and transmit
timestamp fields, copy the client's transmit timestamp into the
originate field, set the leap indicator etc., then return the packet.
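For reference, the per-packet work for a plain request is small. The sketch below turns a 48-byte mode-3 request into a mode-4 reply using the RFC 5905 header layout. The stratum, reference ID "GPS", precision of -20 (about 1 us), zero root delay/dispersion and the choice of reference timestamp are hypothetical placeholders for a stratum-1 server; a real server would take them from its clock state.

```python
import struct
from typing import Optional

# Offset between the NTP era-0 epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800
# 48-byte NTPv4 header, network byte order, no extension fields:
# LI/VN/Mode, stratum, poll, precision, root delay, root dispersion,
# reference ID, then reference/originate/receive/transmit timestamps.
NTP_HEADER = struct.Struct("!BBbbII4sQQQQ")


def to_ntp64(unix_seconds: float) -> int:
    """Convert Unix time in seconds to a 64-bit NTP timestamp (32.32 fixed point)."""
    ntp = unix_seconds + NTP_EPOCH_OFFSET
    secs = int(ntp)
    frac = int((ntp - secs) * (1 << 32))
    return ((secs & 0xFFFFFFFF) << 32) | frac


def build_reply(request: bytes, rx_unix: float, tx_unix: float,
                ref_unix: Optional[float] = None, stratum: int = 1,
                ref_id: bytes = b"GPS\x00", leap: int = 0,
                precision: int = -20) -> bytes:
    """Turn a mode-3 client request into a mode-4 server reply.

    rx_unix: receive time of the request; tx_unix: time the reply leaves.
    stratum, ref_id, precision and ref_unix are hypothetical defaults.
    """
    fields = NTP_HEADER.unpack_from(request)
    li_vn_mode, poll, client_tx = fields[0], fields[2], fields[10]
    version = (li_vn_mode >> 3) & 0x7
    first_byte = (leap << 6) | (version << 3) | 4  # echo client version; mode 4 = server
    if ref_unix is None:
        ref_unix = rx_unix
    return NTP_HEADER.pack(
        first_byte, stratum, poll, precision,
        0, 0,                   # root delay / root dispersion: placeholder zeros (assumed)
        ref_id,
        to_ntp64(ref_unix),     # reference timestamp: when the clock was last set
        client_tx,              # originate = client's transmit timestamp, copied verbatim
        to_ntp64(rx_unix),      # receive timestamp
        to_ntp64(tx_unix),      # transmit timestamp
    )
```

Everything except the two timestamps is either copied from the request or constant, which is why the whole reply is a plausible fit for FPGA logic.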

> The first FPGA H/W has some limitations, in that reading the
> timestamp counter from the CPU is really not recommended, since it kills
> the PCIe performance.

If the FPGA can do everything, then it (obviously) requires a
board-local clock source as well...

> Instead the ntp code adds a delay to the incoming packet timestamp,
> and the FPGA H/W sends out the packet at the correct time.

OK, this still means that the host CPU must be involved in every packet.
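The scheme described in the quote, where software sets the departure time to the hardware receive timestamp plus a fixed delay and the FPGA holds the reply until then, can be sketched as below. The function names and the decision to give up on a packet whose deadline has already passed are assumptions; the thread doesn't describe how late packets are handled. The sketch makes two points explicit: the host computes something for every packet, and the fixed delay must exceed the host's worst-case processing latency.

```python
from typing import Optional

NTP_EPOCH_OFFSET = 2208988800  # seconds from 1900-01-01 to 1970-01-01


def to_ntp64(unix_seconds: float) -> int:
    """Unix time in seconds -> 64-bit NTP timestamp (32.32 fixed point)."""
    ntp = unix_seconds + NTP_EPOCH_OFFSET
    secs = int(ntp)
    frac = int((ntp - secs) * (1 << 32))
    return ((secs & 0xFFFFFFFF) << 32) | frac


def schedule_reply(rx_hw_unix: float, fixed_delay: float,
                   now_unix: float) -> Optional[tuple[float, int]]:
    """Return (departure time, NTP transmit timestamp) for a reply.

    Returns None if the host is already past the deadline, i.e. the fixed
    delay was shorter than the host's latency for this packet (hypothetical
    policy: the caller would have to drop or otherwise handle it).
    """
    tx_unix = rx_hw_unix + fixed_delay
    if now_unix >= tx_unix:
        return None  # too late: the hardware cannot send in the past
    return tx_unix, to_ntp64(tx_unix)
```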
>> (Any other kind of request is handled as today, i.e. queued for ntpd
>> processing, unless DDoS-level packet rates push the queue past some
>> very low size limit, at which point we discard the requests.)
>> Any packet which fails some minimum sanity checks can be discarded
>> quickly, which costs less than handing it over to the regular
>> user-level ntpd process.
> How do you test that this works?
> Any specific S/W package that you developed?

I haven't done this for ntp (yet), but I have been involved with 
high-performance network code since around 1986, and I wrote my own file 
transfer sw that did large frames/sliding window/selective retransmit 
back in 1984.
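The fast-path/ntpd/discard split from the quoted text can be sketched as a three-way classifier. The specific checks, the exact-48-byte fast-path rule and the queue-limit parameters are assumptions for illustration, not ntpd's own validation logic. Only plain, unauthenticated mode-3 requests take the fast path; longer mode-3 packets (which may carry a MAC or extension fields) and all other modes are queued for ntpd, and dropped once that queue passes its small limit.

```python
NTP_HEADER_LEN = 48
MODE_CLIENT = 3


def classify(pkt: bytes, ntpd_queue_len: int, ntpd_queue_limit: int) -> str:
    """Return "fast" (answer in the fast path), "ntpd" (queue it) or "drop".

    The checks are hypothetical minimums, not ntpd's validation rules.
    """
    if not pkt:
        return "drop"
    version = (pkt[0] >> 3) & 0x7
    mode = pkt[0] & 0x7
    if not 1 <= version <= 4:
        return "drop"  # unknown protocol version
    if mode == MODE_CLIENT:
        # Must have a full header and a non-zero transmit timestamp,
        # since that value is echoed back as the originate timestamp.
        if len(pkt) < NTP_HEADER_LEN or pkt[40:48] == bytes(8):
            return "drop"
        if len(pkt) == NTP_HEADER_LEN:
            return "fast"
        # Longer requests may be authenticated: fall through to ntpd.
    # Control (mode 6), private (mode 7), authenticated requests, etc.
    if ntpd_queue_len >= ntpd_queue_limit:
        return "drop"  # queue already past its deliberately small limit
    return "ntpd"
```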

>>> Recording the packets will be done with FPGA H/W as well.
>> So a network sniffer won't be fast enough?
> The FPGA card is a network sniffer as well so there is ready made S/W
> for this.
>> You're talking 10 GigE wire speed, right?
> Yes.
>> That's more than 100 M requests/second!
> Line speed is 10M+ packets/second.

That should be easy. (Famous last words!)

With a budget of 1000 cycles per packet, 3 cores of a 3 GHz quad-core
CPU would be more or less sufficient.
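The budget arithmetic can be checked directly. The frame size below assumes IPv4, no VLAN tag and no NTP extension fields or MAC; preamble/SFD and the inter-frame gap are included because they occupy the wire too. Under those assumptions, 10M packets/second at 1000 cycles each needs about 3.33 cores' worth of 3 GHz cycles, and full NTP line rate (about 10.96M packets/second) about 3.65, which is why "more or less" is the right qualifier.

```python
# Bytes each NTP packet occupies on the wire, assuming IPv4, no VLAN tag,
# and no NTP extension fields or MAC.
NTP, UDP, IPV4, ETH_HDR, FCS = 48, 8, 20, 14, 4
PREAMBLE_SFD, IFG = 8, 12  # per-frame line overhead, not part of the frame itself
WIRE_BYTES = NTP + UDP + IPV4 + ETH_HDR + FCS + PREAMBLE_SFD + IFG  # 114


def line_rate_pps(link_bps: float = 10e9) -> float:
    """Maximum NTP packets per second in one direction at full line rate."""
    return link_bps / (WIRE_BYTES * 8)


def cores_needed(pps: float, cycles_per_packet: int = 1000,
                 clock_hz: float = 3e9) -> float:
    """Cores' worth of cycles needed for a packet rate and per-packet budget."""
    return pps * cycles_per_packet / clock_hz


if __name__ == "__main__":
    pps = line_rate_pps()
    print(f"NTP line rate: {pps / 1e6:.2f} Mpps")          # 10.96
    print(f"cores at 10 Mpps: {cores_needed(10e6):.2f}")   # 3.33
    print(f"cores at line rate: {cores_needed(pps):.2f}")  # 3.65
```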
> I have been told that a single compromised home router can generate
> about 3000 packets per second on a 100 Mbps network.
> With 3000-4000 such routers you reach 10 GbE line speed.
> My local service provider is now offering 1 Gb Internet access at home
> (if I want that kind of throughput),
> so with some decent H/W, there could be more.
> This solution is supposed to stay in service for some years.
> Some intelligence in front of the NTP server
> that removes nasty packets would probably be useful as well.

If that intelligence takes more processing cycles than an actual ntp 
request/reply response, then I'd be willing to settle for a very simple 
rate limiter.

I.e. if the same (possibly faked!) source address is generating more
than a packet per second or so, send a KoD (Kiss-o'-Death) reply, then
stop responding.
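A minimal sketch of that policy, assuming a per-source record of the last arrival time, a 1 s minimum interval, and that "stop responding" lasts until the source slows down again (the post doesn't say for how long). In NTPv4 a KoD is a reply with stratum 0 and a four-character kiss code in the reference ID field; "RATE" is the code for rate limiting. The class and method names are hypothetical. Note that the dictionary grows with the number of distinct sources.

```python
from dataclasses import dataclass


@dataclass
class SourceState:
    last_seen: float        # time of the most recent packet from this source
    kod_sent: bool = False  # already told this source to back off?


class RateLimiter:
    """Per-source limiter: reply, send one KoD, then stay silent while too fast."""

    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self.sources: dict[str, SourceState] = {}  # grows with distinct sources

    def decide(self, src: str, now: float) -> str:
        st = self.sources.get(src)
        if st is None:
            self.sources[src] = SourceState(last_seen=now)
            return "reply"
        too_fast = (now - st.last_seen) < self.min_interval
        st.last_seen = now  # every packet counts, so a sustained flood stays blocked
        if not too_fast:
            st.kod_sent = False  # source slowed down: serve it normally again
            return "reply"
        if not st.kod_sent:
            st.kod_sent = True
            return "kod"  # KoD: stratum 0, reference ID "RATE"
        return "drop"
```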

Of course, handling/filtering 10M packets/second _could_ require a
10M-entry hash table of source addresses, at which point the FPGA hardware
will run into memory access problems, right?
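One way to bound that memory, sketched here as an assumption rather than anything proposed in the thread, is a fixed-size, direct-mapped table: hash each IPv4 source address to a single slot and let a newcomer overwrite whatever was there. Memory is fixed at construction and each packet costs one slot read and one write; the price is that collisions make the limiter forget sources, so a flood spread over many addresses can evict the entries that would have throttled it. The hash constant 2654435761 is Knuth's multiplicative constant; the class name and `bits` parameter are hypothetical.

```python
from array import array


class DirectMappedLimiter:
    """Fixed-memory rate limiter: one entry per hash slot, newcomers evict."""

    def __init__(self, bits: int, min_interval: float = 1.0):
        self.bits = bits
        size = 1 << bits
        self.tags = array("L", [0]) * size    # IPv4 address owning each slot (0 = empty)
        self.last = array("d", [0.0]) * size  # last arrival time for that address
        self.min_interval = min_interval

    def _slot(self, addr: int) -> int:
        # Knuth multiplicative hash: top `bits` bits of (addr * 2654435761) mod 2**32
        return ((addr * 2654435761) & 0xFFFFFFFF) >> (32 - self.bits)

    def allow(self, addr: int, now: float) -> bool:
        i = self._slot(addr)
        if self.tags[i] != addr:
            # Empty slot or collision: the new source simply takes over the slot.
            self.tags[i] = addr
            self.last[i] = now
            return True
        ok = (now - self.last[i]) >= self.min_interval
        self.last[i] = now
        return ok
```

For example, `DirectMappedLimiter(bits=24)` gives 2**24 (about 16.8M) slots, and `int(ipaddress.IPv4Address("192.0.2.1"))` converts an address to the integer key.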

I would like to look into locking N-1 of the cores into a busy loop, 
polling for new packets and processing them as soon as they arrive.

Since this avoids the IRQ overhead, it should be possible to get at
least very close to the actual bus transfer rate, with a nearly fixed
delay from line receipt until the CPU gets access.
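The shape of such a polling loop is sketched below in Python, with a loud caveat: a normal socket still goes through the kernel's interrupt-driven receive path, so this does not avoid IRQs. Real IRQ avoidance needs a poll-mode driver or kernel-bypass framework, with the polling thread pinned to its own core (on Linux, e.g. via `os.sched_setaffinity`). The function name and the timeout/max-packets bounds are hypothetical, added only so the loop terminates.

```python
import socket
import time


def busy_poll(sock: socket.socket, max_packets: int,
              timeout_s: float) -> list[tuple[bytes, float]]:
    """Spin on a non-blocking UDP socket, timestamping each datagram as it is seen."""
    sock.setblocking(False)
    got: list[tuple[bytes, float]] = []
    deadline = time.monotonic() + timeout_s
    while len(got) < max_packets and time.monotonic() < deadline:
        try:
            data, _addr = sock.recvfrom(2048)
        except BlockingIOError:
            continue  # nothing queued yet: keep spinning instead of sleeping
        got.append((data, time.perf_counter()))
    return got
```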

For outgoing wire-speed packets it is a bit harder, since you must send
streams of packets and the actual delay will depend upon the current
buffer/queue level.

Estimating the actual outgoing time by checking the queue size should
give a pretty good guesstimate: we are talking about sub-1000-bit
packets, so each NTP packet takes less than 100 ns on a 10 Gbps wire.
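The queue-based estimate amounts to multiplying the number of frames ahead of ours by the time each takes on the wire. The sketch assumes every queued frame is an NTP reply of the same 94-byte size (IPv4, no VLAN, no extension fields), that the queue drains at full 10 Gbps line rate with no pause frames or other traffic, and that the value of interest is when our frame starts to leave; the resulting estimate would go into the transmit timestamp field. Function names are hypothetical.

```python
LINK_BPS = 10e9                    # 10 GbE
LINE_OVERHEAD = 8 + 12             # preamble+SFD and inter-frame gap, in bytes
NTP_FRAME = 48 + 8 + 20 + 14 + 4   # NTP + UDP + IPv4 + Ethernet header + FCS = 94 bytes


def frame_time_s(frame_bytes: int = NTP_FRAME, link_bps: float = LINK_BPS) -> float:
    """Seconds one frame occupies the wire, including line overhead."""
    return (frame_bytes + LINE_OVERHEAD) * 8 / link_bps


def estimate_departure(now_s: float, queued_frames: int,
                       frame_bytes: int = NTP_FRAME,
                       link_bps: float = LINK_BPS) -> float:
    """Estimate when a frame enqueued now starts to leave the NIC.

    Assumes all queued frames are equal-sized NTP replies and that the
    queue drains at full line rate.
    """
    return now_s + queued_frames * frame_time_s(frame_bytes, link_bps)
```

With these numbers one reply occupies 912 bits, or 91.2 ns, consistent with the sub-100 ns figure above.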

- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
