[ntp:questions] Large offset (>200 seconds) during system boot

Kevin Oberman oberman at es.net
Mon Nov 23 19:27:16 UTC 2009


> Date: Mon, 23 Nov 2009 14:02:52 +0100
> From: "Nottorf, Stefan" <Stefan.Nottorf at plath.de>
> Sender: questions-bounces+oberman=es.net at lists.ntp.org
> 
> Hello,
> I encountered a somewhat strange problem with ntpd after system boot.
> We powered down a whole HP-Enclosure with HP-Blades, all running ntp
> 4.2.2p1@1.1570 (packaged with RedHat Enterprise Linux Server 5.1 64
> bit). All of these Blades use ntp.confs like the following :
> 
> driftfile /var/lib/ntp/drift
> server blade0 iburst version 4
> peer blade1
> peer blade2
> peer blade3
> peer blade4
> peer blade5
> peer blade6
> 
> The blades were running fine before the shut down, with 6 ms offset as
> worst synchronization (measured against a Meinberg GPS 167 Radio Clock).
> After completely shutting down the enclosure and rebooting, the blades
> showed an offset of 211000 ms (+- 1500 ms). ntpq -p showed that all
> sources had (more or less) the same offset.
> After a restart of the ntp daemon (via /etc/init.d/ntpd restart) -
> without changing anything in the config file - everything worked fine
> again...
> This also happens if I shut down only a single blade.
> 1) Has anybody else encountered this behaviour?
> 2) Has anybody found a workaround (especially with this version of ntp)?
> 3) Is there a known problem if many ntp daemons (~16) start at approx.
> the same time?
> 4) Is there perhaps some well hidden "synchronization mechanism" by HP
> that sets the time in advance (in this case to a time 211 s off...) ?
> 
> I might also wait for the release of ntp 4.2.6 if this would fix this
> behaviour, but my customer is quite sensitive about being an "early
> adopter" (regardless of how well tested the software is). 
> My current "workaround" (if it can be called that) is restarting ntpd
> after the system has finished booting.
> Best regards,
> Stefan Nottorf

I don't believe that this has anything to do with NTP; I think it is the
hardware. I suspect that the HP bladeserver has a single hardware clock
that is used to set the time for any blade at boot time. Unfortunately,
this time is off by about 211 seconds and the system calls that would
normally update the hardware clock are not doing so. This is not
unreasonable when a single HW clock is used for multiple systems as one
system could mess up the time for all of them if it mis-set the time.
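
One way to test this guess is to compare the hardware clock with the
system clock once ntpd has synchronized (for example, after the restart
workaround). What follows is only a minimal Python 3 sketch under stated
assumptions: the sysfs path /sys/class/rtc/rtc0/since_epoch may not exist
on this kernel, the hardware clock is assumed to be kept in UTC, and the
function names are made up for illustration.

```python
import time

# Assumed path; not every kernel exposes the hardware clock this way.
RTC_EPOCH_PATH = "/sys/class/rtc/rtc0/since_epoch"


def clock_difference(rtc_seconds, system_seconds):
    """Return hardware clock minus system clock, in seconds."""
    return rtc_seconds - system_seconds


def read_rtc_seconds(path=RTC_EPOCH_PATH):
    """Read the hardware clock as seconds since the epoch.

    Returns None if the file is missing or unreadable. The value is only
    comparable to time.time() if the hardware clock is kept in UTC.
    """
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    rtc = read_rtc_seconds()
    if rtc is None:
        print("hardware clock not readable at", RTC_EPOCH_PATH)
    else:
        diff = clock_difference(rtc, time.time())
        print("hardware clock - system clock = %.1f s" % diff)
```

If the difference stays at roughly 211 seconds in magnitude while ntpd
reports a small offset, that would be consistent with a hardware clock
that is wrong and is not being updated from the running system.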

I am puzzled by one thing...where is the "real" time coming from? If
all systems are off by 211 seconds, ntp is getting the "correct" time
from somewhere, but I don't know where. If you have no external time
source, I am confused. You mention a Meinberg, but I don't see any
indication of how it fits in.
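
The report that every source showed roughly the same offset can also be
checked mechanically: if the sources agree with each other but all differ
from the local clock by the same large amount, the local clock is the
likely outlier. A minimal Python sketch, assuming the offsets (in ms, as
in the offset column of ntpq -p) have already been collected into a list.
The function name and the 2000 ms tolerance are illustrative choices, and
the sample values are invented to resemble the reported 211000 ms
(+- 1500 ms), not taken from the actual blades.

```python
def local_clock_is_outlier(offsets_ms, tolerance_ms=2000.0, min_offset_ms=128.0):
    """Return True if all offsets agree with each other but are far from zero.

    offsets_ms: offset of each source relative to the local clock, in ms.
    tolerance_ms: maximum allowed deviation of any offset from the median
                  (an illustrative value, not an ntpd setting).
    min_offset_ms: how far from zero the median must be to count as "off";
                   128 ms is ntpd's default step threshold.
    """
    if not offsets_ms:
        return False
    ordered = sorted(offsets_ms)
    middle = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[middle]
    else:
        median = (ordered[middle - 1] + ordered[middle]) / 2.0
    # Every source must lie close to the median for them to "agree".
    agree = all(abs(value - median) <= tolerance_ms for value in ordered)
    return agree and abs(median) >= min_offset_ms


# Invented sample: one server plus six peers, all about 211 s away.
sample = [211000.0, 209800.0, 212400.0, 210500.0, 211900.0, 210200.0, 211300.0]
print(local_clock_is_outlier(sample))
```

This says nothing about where the correct time comes from; it only
separates "the sources disagree" from "the local clock disagrees with all
of them".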
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman at es.net			Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751


