[ntp:questions] what happens when sys.peer turns stratum 16?

Kalle Pokki kalle.pokki at iki.fi
Tue Jun 1 06:59:18 UTC 2010


On Tue, Jun 1, 2010 at 00:10, unruh <unruh at wormhole.physics.ubc.ca> wrote:

>> If we assume there is a private subnet that has two GPS reference
>> clocks to synchronize the rest of the machines, what would be the
>> expected failure mode where one of the stratum 1 servers goes crazy,
>> and having three GPS clocks actually makes a difference?
>
> One example: the GPS falls off the roof and is buried in shrubs, but
> still uses its internal clock to deliver PPS pulses.

That of course could happen, but the scenario requires some really
convenient failure in the GPS unit. If the GPS lock is lost, at least
the GPS units I have tested will indicate their clock is freewheeling.
The time is then discarded by gpsd as invalid and will not be fed to
the NTP reference clock driver.
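
As a rough illustration of that kind of validity check, here is a
minimal client-side sketch in Python. It assumes gpsd's JSON TPV
reports, where the "mode" field is 0 or 1 when there is no fix and 2 or
3 for a 2D or 3D fix. The function name usable_fix_time is hypothetical,
and this is not gpsd's internal logic for feeding the NTP driver, only
the same idea applied on the consumer side.

```python
import json


def usable_fix_time(report_line):
    """Return the TPV timestamp only when the receiver reports a fix.

    Hypothetical helper: takes one JSON report line as sent by gpsd and
    returns the "time" string, or None if the report should be ignored.
    """
    report = json.loads(report_line)
    if report.get("class") != "TPV":
        return None  # not a time-position-velocity report
    if report.get("mode", 0) < 2:
        return None  # mode 0/1: unknown or no fix, do not trust the time
    return report.get("time")  # may still be None if no time was sent


good = '{"class":"TPV","mode":3,"time":"2010-06-01T06:59:18.000Z"}'
lost = '{"class":"TPV","mode":1,"time":"2010-06-01T06:59:19.000Z"}'
print(usable_fix_time(good))
print(usable_fix_time(lost))
```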

There are countless other possible failure scenarios, each of them
more or less fatal to the application. The Ethernet interface in some
computer in the network could start jamming the whole subnet by
constantly broadcasting something. A more probable failure I have
witnessed a couple of times is a cheap Ethernet switch starting to
corrupt frames randomly or flapping its links fast, but this usually
only causes long random delays, not undetectable bad data.

In my experience, hardware or well designed and tested software going
crazy (i.e. outputting completely invalid data) without any safeguard
noticing it usually requires quite bizarre double failures in the
system at the same time. With this in mind, I sometimes wonder about
the reasoning behind the high numbers of servers suggested here. It's
often three or four, but some seem to suggest even five or seven
servers so that quite a lot of them can fail. That may be wise with
internet servers, where adding more pool servers costs nothing, but
for private subnets with their own reference clocks it seems like
overkill for many applications.
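
For what it's worth, the usual argument for more than two servers can
be sketched with a Marzullo-style interval intersection. The Python
below is a simplified illustration, not ntpd's actual selection code:
each server is reduced to an assumed (lowest, highest) offset interval
in seconds, and the names best_agreement and majority_agrees are
hypothetical. With two sources that disagree there is no majority
either way; a third one breaks the tie.

```python
def best_agreement(intervals):
    """Return (count, lo, hi) for the region covered by the most intervals."""
    edges = []
    for lo, hi in intervals:
        edges.append((lo, -1))  # -1 marks the start of an interval
        edges.append((hi, +1))  # +1 marks the end of an interval
    edges.sort()  # at equal values, starts sort before ends

    best = count = 0
    best_lo = best_hi = None
    for i, (value, kind) in enumerate(edges):
        count -= kind  # a start raises the overlap count, an end lowers it
        if count > best:
            best = count
            best_lo = value
            best_hi = edges[i + 1][0]  # overlap lasts until the next edge
    return best, best_lo, best_hi


def majority_agrees(intervals):
    """True if more than half of the sources share a common region."""
    count, _, _ = best_agreement(intervals)
    return count > len(intervals) / 2


# Two GPS servers, one of them half a second off: no way to pick a side.
two = [(-0.002, 0.002), (0.498, 0.502)]
# Add a third, sane server: the two good ones outvote the bad one.
three = two + [(-0.001, 0.003)]

print(majority_agrees(two))
print(majority_agrees(three), best_agreement(three))
```

This only shows the counting argument. It says nothing about how likely
such an undetected failure is, which is the real question here.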

That said, I have seen an embedded computer whose internal clock
somehow ran twice as fast as it should have. It wasn't used with NTP,
but I'd be curious to know how NTP would have handled the situation,
especially if that computer had been attached to a reference clock.
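
Some back-of-the-envelope arithmetic for that case, as a small Python
sketch. It assumes ntpd's usual limits, a maximum frequency correction
of 500 ppm and a default step threshold of 128 ms; the function names
are made up for illustration. It only shows how far outside the normal
range such a clock is, not what ntpd would actually end up doing.

```python
MAX_FREQ_CORRECTION_PPM = 500.0  # assumed ntpd frequency correction limit
STEP_THRESHOLD_S = 0.128         # assumed default ntpd step threshold


def frequency_error_ppm(rate):
    """Frequency error in ppm for a clock ticking `rate` times too fast."""
    return (rate - 1.0) * 1e6


def within_discipline_range(rate):
    """True if the error could be absorbed by frequency correction alone."""
    return abs(frequency_error_ppm(rate)) <= MAX_FREQ_CORRECTION_PPM


def seconds_to_step_threshold(rate):
    """Real seconds until an uncorrected clock drifts past the threshold."""
    return STEP_THRESHOLD_S / abs(rate - 1.0)


print(frequency_error_ppm(2.0))        # error of a clock running at 2x
print(within_discipline_range(2.0))    # can 500 ppm cover it?
print(seconds_to_step_threshold(2.0))  # how soon the offset hits 128 ms
```

A clock running at double speed is off by a million ppm, so it gains
128 ms in about an eighth of a second of real time.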



