[ntp:questions] Proposed NTP solution for a network
unruh-spam at physics.ubc.ca
Fri Mar 6 05:44:24 UTC 2009
Jason <bmwjason at bmwlt.com> writes:
>A sys admin and I discussed the possible solutions today, and we have a
>potential winner, although we can't reach the small 10s of uSec goal, we
>can reach several 100s of uSec (and probably less) easily. We are going
>to make a proposal to mgmt to increase the number of S1 servers, the
>number of GPS receivers, and I'm looking for an Rb clock source for the
>main datacenter as well. We are also re-engaging the blade / enclosure
It looks, from the situation we have here at my university, that it will
also depend on the kind of switches you have. We replaced our 10/100
switches with Gbit switches, and the behaviour of ntp degraded significantly
(by factors greater than 2). I.e., whereas before I was getting 10s of usec
as the "accuracy" of my clocks, it is now creeping toward 100 usec. We have
tried to figure out what the problem is, but have not succeeded. These Gbit
switches seem to have, or trigger in the ethernet cards, additional and
asymmetric delays of the order of 100-200 usec. (Asymmetric delays are of
course the worst, since they directly affect the errors in ntp.) Now, if you
run a GPS PPS to each machine you could do far better, but that seems not to
be an option for you. (I run the PPS on the parallel port, but your blades are
unlikely to have those-- in fact they are getting rarer even on standalone
I.e., test your switches as well.
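
To see why asymmetric delays feed straight into the ntp error, here is a
minimal sketch of the standard four-timestamp offset calculation. The function
names and the numbers (250 usec out, 50 usec back) are made up for
illustration, not measurements from the switches described above. The
calculation assumes the outbound and return delays are equal, so any
difference between them shows up as an offset error of half that difference.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation from four timestamps (seconds).

    t1: client send (client clock)    t2: server receive (server clock)
    t3: server send (server clock)    t4: client receive (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset, delay_out, delay_back, server_hold=0.0):
    """Hypothetical helper: build the four timestamps for one exchange.

    true_offset is (server clock - client clock); delay_out and delay_back
    are the one-way network delays in each direction.
    """
    t1 = 0.0                             # client clock
    t2 = t1 + delay_out + true_offset    # server clock
    t3 = t2 + server_hold                # server clock
    t4 = t3 - true_offset + delay_back   # client clock
    return t1, t2, t3, t4


if __name__ == "__main__":
    # Clocks perfectly in sync, but the path is asymmetric by 200 usec.
    offset, delay = ntp_offset_and_delay(*simulate_exchange(0.0, 250e-6, 50e-6))
    print(f"measured offset: {offset * 1e6:.1f} usec")
    print(f"round-trip delay: {delay * 1e6:.1f} usec")
```

With symmetric delays the measured offset equals the true one regardless of
how large the round trip is, which is why the asymmetry, not the delay
itself, is the thing to test for.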
You can see some of the results on www.theory.physics.ubc.ca/chrony/chrony.html.
Note that I use chrony rather than ntp, but they are similar (my tests indicate
that chrony is about a factor of 2 better than ntp in the situations I have tested,
primarily because of its far greater responsiveness to changes like temperature
changes, but the difference is probably not relevant to you).
I.e., there might be a conflict between good timing and good bandwidth.
>We will also be working on custom monitoring software that will provide
>a very early alert if one of the S0 sources or S1 servers drifts outside
>some bounds. We are also going to start retrieving the various NTP stats
>files to our statistics database(s) so that the software guys can
>incorporate that into the alerting and health monitoring application(s).
>Thanks to all for your assistance, I'll post back how things get along.
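
As a rough sketch of the kind of bounds check described in the quoted text,
the following reads loopstats-style lines and flags any record whose offset
exceeds a threshold. It assumes the usual ntpd loopstats layout (MJD, seconds
past midnight, offset in seconds, frequency in ppm, jitter, ...); check that
against the stats files your ntpd actually writes. The function names, the
100 usec threshold, and the sample lines are all made up for illustration.

```python
def parse_loopstats_line(line):
    """Parse one loopstats-style record (field layout is an assumption)."""
    fields = line.split()
    return {
        "mjd": int(fields[0]),
        "seconds": float(fields[1]),
        "offset_s": float(fields[2]),
        "freq_ppm": float(fields[3]),
        "jitter_s": float(fields[4]),
    }


def out_of_bounds(lines, max_abs_offset_s=100e-6):
    """Return the records whose absolute offset exceeds the threshold."""
    alerts = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = parse_loopstats_line(line)
        if abs(record["offset_s"]) > max_abs_offset_s:
            alerts.append(record)
    return alerts


if __name__ == "__main__":
    # Invented sample data, not real measurements.
    sample = [
        "54896 20664.000 0.000012000 13.778 0.000004000 0.0013 6",
        "54896 20728.000 -0.000250000 13.781 0.000090000 0.0021 6",
    ]
    for record in out_of_bounds(sample):
        print(f"ALERT mjd={record['mjd']} t={record['seconds']} "
              f"offset={record['offset_s'] * 1e6:.1f} usec")
```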