[ntp:hackers] Blast attack at USNO

David Mills mills at udel.edu
Thu Apr 8 03:34:28 UTC 2010


Guys,

Rich Schmidt of USNO reports at least three recent blast attacks on one
or another of his servers. Today the attack lasted one hour, during which
the aggregate offered traffic rose from 100 pkt/s to 3500 pkt/s. I'd like
to pick your brains on what to do about this. NTP packets are about 1000
bits, so the blast consumed some 3.5 Mb/s of network traffic.
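The bandwidth figure is simply packet rate times packet size. A minimal
C sketch, using a hypothetical helper name and the rough figure of about
1000 bits per packet given above:

```c
#include <assert.h>

/* Offered load in bits per second: packet rate times packet size.
 * Hypothetical helper; ~1000 bits/packet is the rough figure above. */
double offered_bps(double pkts_per_sec, double bits_per_pkt)
{
    assert(pkts_per_sec >= 0.0 && bits_per_pkt >= 0.0);
    return pkts_per_sec * bits_per_pkt;
}
```

With these inputs, offered_bps(100.0, 1000.0) gives 0.1 Mb/s before the
attack and offered_bps(3500.0, 1000.0) gives 3.5 Mb/s during it, a
35-fold increase.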

With modern processors an offered traffic level of 3500 pkt/s is
probably well below redline, so the attack today probably didn't hurt the
server itself, but it probably crushed some upstream ISP or DSL line with
the traffic coming back. Therefore, every effort should be made to find
the elephants and shoot them; at all costs, don't let the packets come
back.

Ordinarily, the rate controls would find the elephants and curb their
enthusiasm. This may or may not involve KoD packets, which are
themselves rate-limited to 2 pkt/s aggregated over all elephants. This
should work especially well if the elephants are really fast, as they
would concentrate at the near end of the MRU list, but Rich reports that
the MRU maximum lifetime is only 40 ms. In the NIST case reported in the
PTTI paper a few years ago, the lifetime was more like 9 s, so the rate
controls were very effective.
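One way to read the 40 ms figure, assuming "lifetime" means the age of
the oldest MRU entry: a source is found again only if its next packet
arrives within one lifetime, so it must send faster than about
1/lifetime. That is roughly 25 pkt/s for a 40 ms list, versus about one
packet every 9 s in the NIST case. A small C sketch of that reasoning;
the interpretation and the helper names are assumptions:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumes "lifetime" = age of the oldest MRU entry, so a source that has
 * not repeated within one lifetime has already fallen off the list. */

/* Slowest per-source rate (pkt/s) that is still caught on the list. */
double min_catchable_rate(double lifetime_s)
{
    assert(lifetime_s > 0.0);
    return 1.0 / lifetime_s;
}

/* True if a source sending one packet every interval_s seconds is
 * expected to still be listed when its next packet arrives. */
bool still_listed(double interval_s, double lifetime_s)
{
    return interval_s < lifetime_s;
}
```

Under this reading, a source sending 10 pkt/s (one packet every 100 ms)
slips past a 40 ms list every time, while a source sending one packet
per second stays visible on a 9 s list.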

Assuming the MRU list is that short, overflows would be very common. The
algorithm that deals with MRU overflow uses a fixed probabilistic
threshold. When a packet arrives that overflows the MRU list, a random
roll is compared against this threshold. If the roll is higher, the
arriving packet is dropped; if lower, the first entry on the MRU list is
discarded and replaced by the new arrival. The result is that entries on
the MRU list migrate toward the end more slowly, giving an elephant entry
a greater chance of being stepped on again. This scheme might be further
improved by using a second threshold, lower than the first, but invoked
only when the entry to be discarded has a repetition count greater than
zero.
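A toy C sketch of the overflow rule, including the proposed second
threshold. This is not ntpd's code. Assumptions: the list is a small
array ordered newest first, the entry discarded on overflow is taken to
be the oldest one at the end of the list, and the caller supplies the
random roll in [0, 1). All names (MRU_CAP, struct mru_list, mru_packet(),
mru_roll()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MRU_CAP 4 /* tiny capacity, for illustration only */

struct mru_entry {
    uint32_t addr;  /* source address (IPv4 only, for simplicity) */
    unsigned count; /* repetitions seen after the first packet */
};

struct mru_list {
    struct mru_entry e[MRU_CAP]; /* e[0] newest ... e[n - 1] oldest */
    size_t n;                    /* entries in use */
    double threshold;            /* fixed probabilistic threshold */
    double repeat_threshold;     /* lower threshold for repeaters; < 0 disables */
};

/* A uniform roll in [0, 1) from the C library generator. */
double mru_roll(void)
{
    return rand() / ((double)RAND_MAX + 1.0);
}

/* Insert a fresh entry at the front; requires n < MRU_CAP. */
static void mru_push_front(struct mru_list *m, uint32_t addr)
{
    assert(m->n < MRU_CAP);
    memmove(&m->e[1], &m->e[0], m->n * sizeof m->e[0]);
    m->e[0].addr = addr;
    m->e[0].count = 0;
    m->n++;
}

/* Record an arrival; returns true if the source is listed afterwards. */
bool mru_packet(struct mru_list *m, uint32_t addr, double roll)
{
    for (size_t i = 0; i < m->n; i++) {
        if (m->e[i].addr == addr) {
            struct mru_entry hit = m->e[i];
            hit.count++;
            /* Shift e[0..i-1] down one slot and put the hit at the front. */
            memmove(&m->e[1], &m->e[0], i * sizeof m->e[0]);
            m->e[0] = hit;
            return true;
        }
    }
    if (m->n < MRU_CAP) {
        mru_push_front(m, addr);
        return true;
    }
    /* Overflow: the candidate for discard is the oldest entry. */
    double limit = m->threshold;
    if (m->repeat_threshold >= 0.0 && m->e[m->n - 1].count > 0)
        limit = m->repeat_threshold;
    if (roll > limit)
        return false;        /* roll higher: drop the arrival, list unchanged */
    m->n--;                  /* roll lower: discard the oldest entry ... */
    mru_push_front(m, addr); /* ... and replace it with the new arrival */
    return true;
}
```

A server would call mru_packet(&list, src, mru_roll()) for each arrival.
Setting repeat_threshold below threshold makes it harder for a newcomer
to displace an entry that has already repeated, which keeps suspected
elephants on the list longer.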

Right now there is only one define to enable both Autokey and the 
OpenSSL symmetric key routines. That should be changed so that the 
Autokey segments require both OPENSSL and a new define AUTOKEY.
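A sketch of how the split might look in the C preprocessor. OPENSSL and
AUTOKEY are the names from the proposal above; HAVE_SYMKEY_OPENSSL,
HAVE_AUTOKEY and autokey_enabled() are hypothetical names for
illustration, and the #error check is one optional design choice:

```c
/* Symmetric-key (OpenSSL digest) support needs only OPENSSL. */
#ifdef OPENSSL
# define HAVE_SYMKEY_OPENSSL 1
#else
# define HAVE_SYMKEY_OPENSSL 0
#endif

/* Optional: refuse AUTOKEY without OPENSSL instead of silently
 * building without Autokey. */
#if defined(AUTOKEY) && !defined(OPENSSL)
# error "AUTOKEY requires OPENSSL"
#endif

/* Autokey segments need both OPENSSL and the new AUTOKEY define. */
#if defined(OPENSSL) && defined(AUTOKEY)
# define HAVE_AUTOKEY 1
#else
# define HAVE_AUTOKEY 0
#endif

/* Lets run-time code report whether Autokey was compiled in. */
int autokey_enabled(void)
{
    return HAVE_AUTOKEY;
}
```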

Rich also reports a strange lockup, apparently due to overload of the
server, presumably by an elephant. Both he and Judah Levine at NIST have
removed thousands of lines of code, apparently in a misguided attempt to
reduce the code length. Several years ago I computed the redline for an
old Alpha at about 5000 pkt/s. With modern machines it should be two or
three times that, even with the monitoring code enabled. Note that the
only path in question is in ntp_proto.c, where the receive routine
dispatched by the I/O system uses no more than 60 lines of C before
calling the fast_xmit() routine, assuming the packet is not
authenticated. The OpenSSL symmetric key routines are quite fast; even
with Autokey, in which the packet must be specially marked, only two
calls to the symmetric key routines are required.

I conclude that, if anything has changed that considerably reduced the 
redline, it must be somewhere in the I/O system.

By the way, I computed the redline statistic by letting the server run
for a while, using the elapsed process time and number of packets to
compute a saturation statistic, and then dividing that by two.
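The recipe above is given only in words. One reading, sketched in C
under that assumption: saturation is packets served per second of
process (CPU) time, and the redline is half of that. getrusage() with
RUSAGE_SELF is the POSIX call for a process's own user and system time;
the helper names are hypothetical.

```c
#include <assert.h>
#include <sys/resource.h>

/* CPU time (user + system) consumed so far by this process, in seconds,
 * or -1.0 if getrusage() fails. */
double process_cpu_seconds(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1.0;
    return (double)ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6
         + (double)ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

/* Saturation: packets per second of process time, i.e. the rate the
 * server could sustain if it had the CPU to itself. Redline: half that. */
double redline_pps(unsigned long packets, double cpu_seconds)
{
    assert(cpu_seconds > 0.0);
    double saturation = (double)packets / cpu_seconds;
    return saturation / 2.0;
}
```

With illustrative numbers, 1,000,000 packets served in 100 s of process
time gives a saturation of 10,000 pkt/s and a redline of 5000 pkt/s.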

Dave

