[ntp:questions] NTP errors are caused by single and multi-bit memory errors (was: Re: questions Digest, Vol 66, Issue 7)

Chuck Swiger cswiger at mac.com
Mon Apr 12 17:25:24 UTC 2010


On Apr 12, 2010, at 9:25 AM, David J Taylor wrote:
>> I want to know about non ECC memory based PC's running NTP and how many
>> NTP errors are caused by single and multi-bit memory errors.
>> 
> 
> I am not aware of any.  I rather imagine that memory errors in such a system would likely cause the system to halt, rather than affecting just NTP.

According to Wikipedia, single-bit errors seem to happen reasonably infrequently:

"Recent tests give widely varying error rates with over 7 orders of magnitude difference, ranging from 10**-10 to 10**-17 error/bit·h, roughly one bit error, per hour, per gigabyte of memory to one bit error, per century, per gigabyte of memory."

...and the chance of a 1-bit error happening to ntpd in particular, rather than to any part of the system, is much lower.
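To put the quoted range in more familiar units, here is a rough sketch of the conversion from "error/bit·h" to errors per gigabyte. It assumes 1 GB = 2**30 bytes and a 365.25-day year; the function names are mine, purely for illustration.

```python
# Convert a per-bit, per-hour error rate into expected errors for a given
# amount of RAM. Assumptions (not from the quoted text): 1 GB = 2**30 bytes,
# and a year is 365.25 days.
BITS_PER_GB = 8 * 2**30
HOURS_PER_YEAR = 24 * 365.25


def errors_per_hour(rate_per_bit_hour, gigabytes=1.0):
    """Expected number of bit errors per hour across `gigabytes` of RAM."""
    return rate_per_bit_hour * BITS_PER_GB * gigabytes


def years_between_errors(rate_per_bit_hour, gigabytes=1.0):
    """Mean time between bit errors, in years, for `gigabytes` of RAM."""
    return 1.0 / (errors_per_hour(rate_per_bit_hour, gigabytes) * HOURS_PER_YEAR)


# The two endpoints of the quoted range.
for rate in (1e-10, 1e-17):
    print(f"{rate:.0e} error/bit-h: "
          f"{errors_per_hour(rate):.3g} errors/hour/GB, "
          f"one error every {years_between_errors(rate):.3g} years per GB")
```

The high end does work out to a bit under one error per hour per gigabyte. By this arithmetic the low end comes out closer to one error per millennium than one per century, so the quoted endpoints are best read as order-of-magnitude figures.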

Consider that ntpd tends to run about 0.5MB - 1.5MB of resident code, compared with anywhere from 100 - 600 MB wired for the kernel (depending on whether I look at a machine with 512MB of RAM or 3 GB of RAM), and on the order of 100MB for my mail client or other commonly used apps.  In other words, only ~0.1% of memory errors are going to happen to the RAM used for ntpd.  Mostly, such errors go unnoticed, since they might well happen in bytes not being used for anything (padding in structures, memory in the middle of big char or other arrays that aren't in use right now, etc).
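The ~0.1% figure is just ntpd's footprint divided by total RAM. A minimal sketch of that estimate follows, assuming bit errors are spread uniformly over all physical memory (a simplification; the helper name is hypothetical):

```python
def share_of_errors(process_mb, total_mb):
    """Fraction of memory errors expected to land in a process's RAM,
    assuming errors are uniformly distributed over physical memory."""
    return process_mb / total_mb


# ntpd footprints and machine sizes mentioned above (3 GB = 3072 MB).
for total_mb in (512, 3072):
    for ntpd_mb in (0.5, 1.5):
        pct = 100 * share_of_errors(ntpd_mb, total_mb)
        print(f"ntpd {ntpd_mb} MB of {total_mb} MB RAM: {pct:.3f}% of errors")
```

Across those combinations the share runs from a few hundredths of a percent to a few tenths of a percent, which brackets the ~0.1% estimate.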

Unless the system implements at least parity checking, there's nothing in the hardware which is going to cause the system to halt.  Parity or ECC machines have the option of throwing an exception (a machine check exception, or MCE, in the Intel world) if they encounter uncorrectable memory errors.

Regards,
-- 
-Chuck



