[ntp:questions] NTP servers redundancy

Rob nomail at example.com
Sun Jan 31 10:16:56 UTC 2010


Danny Mayer <mayer at ntp.org> wrote:
> Ryan Malayter wrote:
>> On Tue, Jan 19, 2010 at 3:57 PM, Rob <nomail at example.com> wrote:
>>> Compare it with a RAID-1 disk system. When one disk has an unreadable
>>> sector, the situation is clear: use the sector from the other disk.
>>> When both disks are readable but return different data, you cannot know
>>> which one is correct.
>>>
>>> This normally is solved by not checking for that condition, rather than
>>> by using 3 disks and a majority vote (which still could disagree between
>>> all 3 disks).
>> 
>> Disks use error correcting codes (usually some layered Reed-Solomon
>> scheme) at the physical layer to detect errors. Disks rarely, if ever,
>> return *incorrect* data. They return known-good data or "Read failed".
>> 
>
> Right. With disks the data stored on them should be the same for all
> mirrored disks. If it is not, you have a hardware or software problem
> with the code that reads and writes to the disks.

The problem is that the poster started with the assumption that an NTP
clock could be broken and could possibly return the wrong time even
though it indicates that it is synced.

He then explains that IF this happens THEN you have a problem when you
don't have 25 servers in your list.

But my reasoning is that there are always going to be cases where you
have a problem, no matter how many countermeasures you take.  The disk
is an example of this.  The disk should return good data or "read failed",
but what if it doesn't???
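
To make the voting argument concrete, here is a minimal Python sketch of a
strict majority vote over redundant sources (mirrored disks or time servers).
The function name majority_vote and the sample readings are made up for
illustration; this is not how ntpd selects its sources, only a toy model of
the reasoning above.

```python
from collections import Counter

def majority_vote(readings):
    """Return the value reported by a strict majority of sources, or None.

    `readings` is a hypothetical list of values, one per redundant source.
    """
    if not readings:
        return None
    value, count = Counter(readings).most_common(1)[0]
    # Accept the most common value only if more than half the sources agree.
    return value if count > len(readings) / 2 else None

print(majority_vote(["A", "A", "B"]))  # one bad source is outvoted
print(majority_vote(["A", "B"]))       # two sources disagree: no verdict
print(majority_vote(["A", "B", "C"]))  # all three disagree: no verdict
print(majority_vote(["B", "B", "A"]))  # two sources wrong the same way win
```

The last call is the uncomfortable case: if "A" was the correct value, the
vote confidently returns the wrong one. Adding sources makes that less
likely, but it does not make it impossible.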

My experience shows that when you try to counteract every possible failure
mode you can think of, you end up with a complicated system that will
fail in a way you did not envision, often due to some inadvertent
side effect of the added complication.



