[ntp:hackers] Blast attack at USNO
David Mills
mills at udel.edu
Thu Apr 8 03:34:28 UTC 2010
Guys,
Rich Schmidt of USNO reports at least three recent blast attacks on one
or another of his servers. Today the attack lasted for one hour when the
aggregate offered traffic went from 100 pkt/s to 3500 pkt/s. I'd like to
pick your brains on what to do about this. NTP packets are about 1000
bits, so the blast consumed some 3.5 Mb/s of network traffic.
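As a quick check of the arithmetic, here is a minimal sketch. The 1000-bit packet size is the round figure used above, not a measured value, and the function name is only illustrative.

```python
# Back-of-envelope check of the traffic figures quoted above.
PACKET_BITS = 1000  # approximate NTP packet size in bits (round figure from the text)

def offered_load_mbps(pkt_per_s, packet_bits=PACKET_BITS):
    """Return the offered load in Mb/s for a given packet rate."""
    return pkt_per_s * packet_bits / 1e6

print(offered_load_mbps(100))   # normal aggregate load
print(offered_load_mbps(3500))  # load during the blast
```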
With modern processors an offered traffic level of 3500 pkt/s is probably
well below redline, so the attack today probably didn't hurt the server
itself, but it probably crushed some upstream ISP or DSL line with the
traffic coming back. Therefore, every attempt should be made to find the
elephants, shoot them at all costs, and not let the packets come back.
Ordinarily, the rate controls would find the elephants and curb their
enthusiasm. This could include or not include KoD packets, which are
themselves rate restricted to 2 pkt/s aggregated over all elephants.
This should work especially well if the elephants were really fast, as
they would concentrate at the near end of the MRU list, but Rich reports
the MRU maximum lifetime is only 40 ms. In the NIST case reported in the
PTTI paper a few years ago the lifetime was more like 9 s, so it was
very effective.
Assuming the MRU list is that short, overflows would be very common. The
algorithm that deals with MRU overflow uses a fixed probabilistic
threshold. When a packet arrives that overflows the MRU list, a random
roll is compared against this threshold. If the roll is higher, the
arriving packet is dropped; if lower, the first packet in the MRU list
is discarded and replaced by the new arrival. The result is that entries
on the MRU list migrate toward the end more slowly, giving a greater
chance that an elephant entry can be stepped on again. This scheme might
be further improved by using a second threshold lower than the first, but
invoked only when the entry to be discarded has a repetition count
greater than zero.
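The overflow rule and the proposed second threshold can be sketched roughly as follows. This is a toy model, not the ntpd code: the class name, the threshold values 0.5 and 0.25, and the reading of "first" as the oldest entry are all assumptions made for illustration.

```python
import random
from collections import OrderedDict

class MruList:
    """Toy MRU list keyed by source address, oldest entry first (an assumption)."""

    def __init__(self, capacity, threshold=0.5, repeat_threshold=0.25, rng=None):
        self.capacity = capacity
        self.threshold = threshold                # illustrative value only
        self.repeat_threshold = repeat_threshold  # proposed second, lower threshold
        self.rng = rng or random.Random()
        self.entries = OrderedDict()              # address -> repetition count

    def record(self, addr):
        """Record an arrival from addr. Return False if the arrival is dropped."""
        if addr in self.entries:
            # Known source: bump its count and move it to the recent end.
            self.entries[addr] += 1
            self.entries.move_to_end(addr)
            return True
        if len(self.entries) < self.capacity:
            self.entries[addr] = 0
            return True
        # Overflow: either drop the arrival or evict the oldest entry.
        victim, count = next(iter(self.entries.items()))
        # Entries that have repeated get the lower threshold, so they are
        # evicted less often and stay on the list longer.
        limit = self.repeat_threshold if count > 0 else self.threshold
        if self.rng.random() >= limit:
            return False                          # roll is higher: drop the arrival
        del self.entries[victim]                  # roll is lower: replace the victim
        self.entries[addr] = 0
        return True
```

With the second threshold set lower than the first, an entry that has already repeated is less likely to be pushed off the list by a newcomer, which is the effect described above.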
Right now there is only one define to enable both Autokey and the
OpenSSL symmetric key routines. That should be changed so that the
Autokey segments require both OPENSSL and a new define AUTOKEY.
Rich also reports a strange lockup apparently due to overload of the
server, presumably by an elephant. Both he and Judah Levine at NIST have
removed thousands of lines of code apparently in a misguided attempt to
reduce the code length. Several years ago I computed the redline for an
old Alpha at about 5000 pkt/s. With modern machines it should be two or
three times that, even with the monitoring code enabled. Note that the
only path in question is in ntp_proto.c where the receive routine
dispatched by the I/O system uses not more than 60 lines of C before
calling the fast_xmit() routine, assuming the packet is not
authenticated. The OpenSSL symmetric key routines are quite fast; even
with Autokey, in which the packet must be specially marked, only two
calls to the symmetric key routines are required.
I conclude that, if anything has changed that considerably reduced the
redline, it must be somewhere in the I/O system.
By the way, I computed the redline statistic by letting the server run
for a while, using the elapsed process time and number of packets to
compute a saturation statistic, and then dividing that by two.
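A minimal sketch of that calculation, assuming the saturation statistic is simply packets divided by process time. The function name and the sample numbers are hypothetical, not measurements.

```python
def redline_pkt_per_s(process_seconds, packets):
    """Estimate the redline as half the saturation rate."""
    if process_seconds <= 0 or packets <= 0:
        raise ValueError("need positive process time and packet count")
    # Packets per second of process time: the rate at which the process
    # alone would saturate the CPU.
    saturation = packets / process_seconds
    return saturation / 2

# Hypothetical run: 1,000,000 packets handled in 100 s of process time.
print(redline_pkt_per_s(100.0, 1_000_000))
```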
Dave