[ntp:hackers] Improved monlist scheme

David L. Mills mills at UDel.Edu
Mon Sep 15 07:49:31 PDT 2003


Guys,

Long message. Intended mostly for the diehard detail fanatic. 

The monlist function available with NTPv4 ntpd and ntpdc has proved a
valuable tool to investigate deviant usage patterns, especially
accidental or purposeful flooding attacks found at NIST, USNO and U
Wisconsin. Unfortunately, the flux of rascals has been way too much for
the monlist scheme to capture. At NIST, for example, the 600-entry list
fills up in 7 s and at rackety.udel.edu it fills up in 200 s. With the
latter it was hoped to capture usage patterns out to at least 1024 s in
order to evaluate the effectiveness of the poll-adjust algorithm.

A simplistic analysis assumes all clients poll at 1024-s intervals. If
there are 200 packets/s arriving at NIST, we conclude there are 204,800
distinct addresses pounding on time.nist.org, more of course if some
clients are more thirsty than others. If the 600-entry monlist list
fills up in 7 s, the list represents only a small fraction of this
population and will catch only those who drink more often than once
every 7 s.
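
For the detail fanatic, the arithmetic is simple enough to write down.
This is a sketch only, not ntpd code; the rate and poll interval are
the figures quoted above.

    /* Back-of-envelope estimate from the figures above: if every client
     * polled once per 1024 s, a 200 packet/s arrival rate implies about
     * rate * poll distinct clients, of which a 600-entry list can hold
     * only a small fraction.  Purely illustrative. */
    #include <stdio.h>

    int main(void)
    {
            double rate = 200.0;            /* packets/s arriving at NIST */
            double poll = 1024.0;           /* assumed poll interval, s */
            double clients = rate * poll;   /* distinct addresses implied */

            printf("estimated clients: %.0f\n", clients);        /* 204800 */
            printf("fraction in 600-entry list: %.4f\n", 600.0 / clients);
            return 0;
    }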

The monlist list sorts distinct source addresses in order of increasing
age since the most recent reference, in other words a classic
least-recently-used (LRU) queue. Upon each reference an existing
duplicate entry is removed and placed first in the list. Upon first
occurrence, a new entry is inserted first in the list. Mathematically,
this is a classic stack algorithm. In the original implementation, if
the list fills up, the oldest entry is removed and the new entry
inserted first in the list. This of course violates the stack
assumption.
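
To pin down the bookkeeping, here is a minimal sketch of that original
behavior as a plain doubly linked list keyed by source address. It is
not the ntpd implementation; the names are illustrative, the 600-entry
limit is the figure quoted above, and a linear scan stands in for
whatever indexing the real code uses.

    /* Sketch (not the ntpd code) of the original monlist LRU: a
     * duplicate source address moves its entry to the head of the list;
     * a new address gets a new entry at the head; when the list is
     * full, the oldest (tail) entry is recycled for the newcomer. */
    #include <stdlib.h>
    #include <stdint.h>

    #define MONLIST_MAX 600                 /* list size quoted above */

    struct mon_entry {
            struct mon_entry *prev, *next;
            uint32_t addr;                  /* client source address */
            long lasttime;                  /* time of most recent packet */
    };

    static struct mon_entry *mon_head, *mon_tail;
    static int mon_count;

    static void mon_unlink(struct mon_entry *e)
    {
            if (e->prev != NULL)
                    e->prev->next = e->next;
            else
                    mon_head = e->next;
            if (e->next != NULL)
                    e->next->prev = e->prev;
            else
                    mon_tail = e->prev;
    }

    static void mon_push(struct mon_entry *e)
    {
            e->prev = NULL;
            e->next = mon_head;
            if (mon_head != NULL)
                    mon_head->prev = e;
            else
                    mon_tail = e;
            mon_head = e;
    }

    /* Record a packet from addr received at time now. */
    void mon_record(uint32_t addr, long now)
    {
            struct mon_entry *e;

            for (e = mon_head; e != NULL; e = e->next)
                    if (e->addr == addr) {  /* duplicate: move to the head */
                            mon_unlink(e);
                            e->lasttime = now;
                            mon_push(e);
                            return;
                    }
            if (mon_count < MONLIST_MAX) {  /* room left: new entry */
                    e = malloc(sizeof(*e));
                    if (e == NULL)
                            return;
                    mon_count++;
            } else {                        /* full: recycle the oldest */
                    e = mon_tail;
                    mon_unlink(e);
            }
            e->addr = addr;
            e->lasttime = now;
            mon_push(e);
    }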

The problem of course is that, if there are too many different source
addresses found in too short a time, the oldest entry can be removed
long before a new duplicate is received. The behavior is similar to the
classic thrashing behavior seen in early virtual memory operating
systems, with which I had intimate experience.

I tried a number of deterministic and stochastic algorithms to deal
with this problem until experiencing a brainfizz recalling Peter
Denning's Working Set Principle, used to good effect in modern virtual
memory systems. The idea is to estimate for each job the minimum number
of pages to keep in memory and avoid running the job unless that many
empty pages are in fact available. Implementations I am familiar with
use a table-driven scheduler designed to separate the elephants from the
mice and run only the elephants that fit for multiple timeslices to
avoid paging overhead.

I felt a full implementation of this would be something like a ten-ton
flyswatter and that a probabilistic approach would be sufficient. So,
the scheme that evolved operates like this. If the LRU list has not yet
filled up, do the LRU thing as before. If the list has filled up, roll
the dice on the unit interval and compare with a discard threshold.
Below the threshold continue as before and clobber the oldest entry;
otherwise, simply discard the new entry. Come to think of it, this might
be an interesting scheme to try on a real operating system. Grad
students, you may line up now.
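
In code, the new full-list branch is just a coin flip against that
threshold. A sketch under the same assumptions as before, with
illustrative names rather than the ntpd source; the threshold is left
as a parameter because its computation is the subject of the next
paragraph.

    /* Sketch of the probabilistic variant: with the list full, draw a
     * uniform number on [0,1) and compare it with the discard threshold.
     * Below the threshold, recycle the oldest entry for the newcomer as
     * before; at or above it, simply drop the newcomer. */
    #include <stdlib.h>

    /* Return nonzero to recycle the oldest entry, zero to drop the
     * newcomer. */
    int mon_admit(double threshold)
    {
            double dice = (double)rand() / ((double)RAND_MAX + 1.0);

            return dice < threshold;        /* dice is uniform on [0,1) */
    }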

The trick is how to set the discard threshold without intricate
per-instance analysis. There is now a new state variable in ntpd called
the monitor discard parameter set by the "discard monitor <integer>"
command, with current default 3000. The discard threshold is computed as
the ratio of the current oldest entry age to the monitor discard
parameter. In order to more effectively evaluate the scheme, the ntpdc
monlist format was changed slightly so that the last column now
represents the age of each entry, in seconds.
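
Under the same illustrative assumptions, the rule reduces to a one-line
ratio; neither name below is taken from the ntpd sources.

    /* Sketch of the threshold rule: the chance of recycling the oldest
     * entry is that entry's age divided by the monitor discard
     * parameter, so the deeper the history already held, the more
     * likely a newcomer is admitted.  A ratio of 1 or more always
     * recycles. */
    static long mon_discard = 3000;     /* "discard monitor <integer>" */

    double mon_threshold(long oldest_age)
    {
            return (double)oldest_age / (double)mon_discard;
    }

With the default parameter of 3000, an oldest-entry age of 600 s, the
figure rackety settles at below, gives a threshold of 0.2, so roughly
one newcomer in five displaces the oldest entry.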

I tried this scheme on two primary (stratum 1) servers, public rackety
and private pogo. Pogo serves as control; it currently receives 2.9
packets/s, but never fills up the LRU list. Rackety currently receives
12.9 packets/s and, with the discard parameter at 3000, keeps about 600 s
of history in the LRU list. If indeed most of these rascals were
operating at 1024 s, this represents some 13,200 spectators. The new
scheme captures the most frisky 600 of them operating at poll intervals
less than 600 s. This is more than enough to spot nasty pollsters and
old NTP implementations that do not effectively elevate the poll
interval.

I was curious about the good guys operating at 1024 s with rackety, so
I increased the discard parameter to 7000, just enough to capture the good
guys, and something peculiar happened. As expected, there were far more
of these than could fit in the 600-entry LRU list, so classic page
thrashing was observed. A bunch of 1024-s pollsters was captured and
started to get old. When the bunch reached the age of 1024 s, the next
few pollsters tossed the bunch off the island and the phenom repeated.

This has been a most interesting and revealing exercise. With this
scheme it is possible to automatically detect abusers and report to the
log for possible blacklisting. Along with the recent call-gap scheme and
kiss-o'-death packet, it represents another in the arsenal of defensive
schemes designed to protect widely distributed, ubiquitous network
services.

With the Backroom machines temporarily out of service, I managed to park
a tarball ntp-4.1.80-rc1a.tar.gz on ftp.udel.edu/pub/ntp/software. It
would be interesting to see how it performs on other very busy servers.

Dave


