[ntp:questions] Re: Netra X1 time warp

David L. Mills mills at udel.edu
Sun Jan 25 18:46:04 UTC 2004


Roman,

Reset. I thought you were in US Mountain Time, which is seven hours
west, but you and most of Europe cheat one hour east. This reduces the
time between the last peerstats recorded and the detected timewarp. It
does not explain why reference clock polls were not recorded, which
indeed might be due to torque of the Unix timer system. There was a
recent change that fixed a bug when the time was torqued a matter of
years. The fix was to reset the timer system when the time was stepped.
But, in this case the daemon never realized the time was torqued, so did
not reset the timers system.

My problem is that the daemon never realized the time was stepped 100
years and never complained and exited as result. If for some reason it
torqued the time, the evidence would have to be in the peerstats, but it
isn't. I have to conclude that something else torqued the time and left
the timer system in an awkward state.

The stepout threshold is 900s, which means your stratum-2 client polling
at 1024 s would notice the warp within one poll interval. If it took 35
minutes from the last good report to the first bad one, this further
narrows the insult.

We know the last message logged for the radio was 0249Z, but the last
peerstats was 0257Z and it showed nothing wrong. No doubt the peerstats
was produced by the radio driver as result of a timer interrupt that
polled the clock, so we know as of 0257Z the time and timer system was
still intact. However, at 0258Z another error message was produced, but
no peerstats and no evidence of timer interrupts after that. The first
notice the warp occured was 0332Z. The error messages apparently have to
do with the timecode format, not the NTP time or time conversion, so the
last error message indicated the clock reading was discarded.

See the problem? Apparently, the timer system was running okay at the
time of the last error message when the clock reading was discarded and
before recording peerstats or even getting close to setting the clock,
but the timer system wasn't heard from after that. It seems likely that
ntpd had nothing to do with the warp.

Dave

Roman Maeder wrote:
> 
> David L. Mills wrote:
> 
> > The system log shows no ntpd exit message. Once started, the only normal
> > ntpd exit is accompanied by a message similar to that displayed by the
> > second machine, and in that case the clock is not adjusted. Something
> > stopped ntpd before it ever got to that point. The peerstats are
> > recorded before the clock discipline routine is called. If the delta was
> > found by ntpd, it would show up in the peerstats. I conclude ntpd either
> > crashed and did not set the clock or was stopped by somebody else.
> 
> I did some investigation before I rebooted the machine. ntpd was still
> running, it served out time to anyone asking and it reacted to ntpdc and ntpq.
> "ntpq -p" showed nothing obviously wrong; it was as if it was running fine; is
> it possible that the clock discipline routine waited for a timer signal, that
> normally arrives every few seconds, but now would take 100years to be delivered?
> 
> The code that handles the DCF clock also kept running. I completely forgot
> about that log file. It shows the occasional "parse" error messages when a
> spike upsets the DCF77 receiver, but there is also something more interesting
> there:
> 
> 22 Jan 03:49:15 ntpd[347]: synchronized to GENERIC(0), stratum=0
> 12 Jun 23:33:12 ntpd[347]: parse: convert_rawdcf: INCOMPLETE DATA - time code
> only has 29 bits
> 
> 12 Jun 23:33:12 ntpd[347]: PARSE receiver #0: interval for following error
> message class is at least 00:01:00
> 12 Jun 23:33:12 ntpd[347]: PARSE receiver #0: FAILED TIMECODE:
> "------------------M-S---81-4" (check receiver configuration / cableling)
> 22 Jan 04:58:01 ntpd[347]: parse: convert_rawdcf: INCOMPLETE DATA - time code
> only has 33 bits
> 
> The first msg was before the problems started, it is in MET, then there are
> two with June dates (which year?) then the first of those dated January 22,
> 1904 (MEST).
> So I would conclude that the hiccup happened between 03:49:15 and 03:58:01, or
> 02:49 and 02:58 UTC.
> 
> > Geeze I wish folks would keep the logs in UTC. The last peerstats was
> > recorded at 0257 UTC on 53026 MJD, (22 January 2004 Gregorian). However,
> > the second machine noticed the warp at 0432 MET or 1132 UTC. So, what
> > was going on during the eight hours between 0257 and 1132? Did somebody
> > kill ntpd near 0257 and then warp the clock near 1132?
> 
> 0432 MET is 0332 UTC, so this is only a 35 minute gap. That seems to be the
> time ntpd allows to conclude that its server does indeed serve out wrong time
> consistently and that it's time to call it quits.
> 
> Roman Maeder



More information about the questions mailing list