[ntp:questions] what happens when sys.peer turns stratum 16?

unruh unruh at wormhole.physics.ubc.ca
Tue Jun 1 16:49:59 UTC 2010

On 2010-06-01, Kalle Pokki <kalle.pokki at iki.fi> wrote:
> On Tue, Jun 1, 2010 at 00:10, unruh <unruh at wormhole.physics.ubc.ca> wrote:
>>> If we assume there is a private subnet that has two GPS reference
>>> clocks to synchronize the rest of the machines, what would be the
>>> expected failure mode where one of the stratum 1 servers goes crazy, and
>>> having three GPS clocks actually makes a difference?
>> A GPS unit that falls off the roof and ends up buried in the shrubs,
>> but still uses its internal clock to deliver PPS pulses, is one example.
> That of course could happen, but the scenario requires some really
> convenient failure in the GPS unit. If the GPS lock is lost, at least
> the GPS units I have tested will indicate their clock is freewheeling.
> The time is then discarded by gpsd as invalid and will not be fed to
> the NTP reference clock driver.

The NMEA time may be discarded, but the PPS will keep freewheeling. AFAIK,
gpsd does not invalidate the PPS when the satellites disappear.
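
For a rough sense of how quickly a freewheeling PPS could go wrong, here
is a small sketch. The drift rates are hypothetical placeholders, not
measurements of any particular GPS unit; the real figure depends on the
oscillator in the receiver.

```python
SECONDS_PER_DAY = 86_400


def freewheel_error_seconds(drift_ppm, elapsed_seconds):
    """Accumulated time error of an undisciplined oscillator.

    drift_ppm is the constant frequency error in parts per million
    (an assumed value; real oscillators also wander with temperature).
    """
    return drift_ppm * 1e-6 * elapsed_seconds


if __name__ == "__main__":
    # Hypothetical drift rates, chosen only to illustrate the scaling.
    for drift_ppm in (0.1, 1.0, 10.0):
        err_ms = freewheel_error_seconds(drift_ppm, SECONDS_PER_DAY) * 1000
        print(f"{drift_ppm:5.1f} PPM -> {err_ms:8.2f} ms after one day")
```

Even a small constant drift adds up to milliseconds within a day, and
the PPS edges themselves would still look perfectly regular.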

> There are countless other possible failure scenarios, each of them
> more or less fatal to the application. The Ethernet interface in some
> computer in the network could start jamming the whole subnet by

The question is not whether the system jams up but whether you can have
a wrong time delivered.

> constantly broadcasting something. A more probable failure I have
> witnessed a couple of times is a cheap Ethernet switch starting to
> corrupt frames randomly or flapping its links fast, but this usually
> only causes long random delays, not undetectable bad data.
>
> From my experience, hardware or well-designed and tested software
> going crazy (i.e. outputting completely invalid data) without any
> safeguards noticing it usually requires quite bizarre double failures
> in the system at the same time. With that in mind, I sometimes wonder
> about the reasoning for the high numbers of servers suggested here. It's often
> three or four, but some seem to suggest even five or seven servers so
> that quite a lot of them can fail. It may be wise with internet
> servers, and adding more pool servers doesn't cost anything,
> but on private subnets with their own reference clocks it seems like
> overkill for at least many applications.
>
> That said, I have seen an embedded computer whose internal clock
> somehow ran twice as fast as it should have. It wasn't used with NTP,
> but I'd be curious to know how NTP would have handled the situation,
> especially if that computer would have been attached to a reference
> clock.

ntp cannot handle that at all. ntpd has a 500 PPM maximum rate correction,
and a clock running at double speed is off by 1,000,000 PPM, far far
greater than 500 PPM.
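
To put numbers on that, here is a minimal sketch of the arithmetic. The
function names are made up for illustration and are not part of ntpd;
the only figure taken from the discussion is the 500 PPM limit.

```python
MAX_SLEW_PPM = 500.0  # ntpd's maximum rate correction, as stated above


def rate_error_ppm(rate_ratio):
    """Frequency error in PPM for a clock that advances rate_ratio
    seconds for every real second (1.0 means a perfect clock)."""
    return (rate_ratio - 1.0) * 1_000_000.0


def within_slew_limit(rate_ratio, limit_ppm=MAX_SLEW_PPM):
    """True if the frequency error is small enough to be corrected."""
    return abs(rate_error_ppm(rate_ratio)) <= limit_ppm


if __name__ == "__main__":
    double_speed = rate_error_ppm(2.0)
    print(double_speed)                  # 1000000.0
    print(double_speed / MAX_SLEW_PPM)   # 2000.0
    print(within_slew_limit(2.0))        # False
```

So a clock running at double speed is 2000 times beyond what the
frequency correction can absorb.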
