Tue Jan 3 12:03:26 UTC 2006

Hello Martin,

 On Monday, January 2, 2006 at 21:12:16 +0100, Martin Burnicki wrote:

> Serge Bets wrote:
>> But all went wrong after the leap. The service entered a cycle of
>> wildly bad frequency, like +50 or -120 instead of -20.8 PPM, with the
>> offset quickly wandering, then a time reset.
> Could you please also have a look at the event logs of those machines
> and report if there have been any remarkable messages?

I made a mistake: the bad frequency began nearly 2 hours before the
leap, at 22:11 UTC. I was misled by the first "time reset -0.149742 s",
which only happened later, at 0:18 UTC, and was followed by many others.

There are many clockhops between the other S2 peers and the S1 server
of the LAN, as well as episodes of lost reachability (probably good
servers being seen as insane), each followed some minutes later by a
resync to the same server. These clockhops follow each reset. The LAN
was isolated, and all the other peers/servers on the LAN handled the
leap correctly and had stable frequency and sub-ms offset *after the
leap*. There was not a single bad source.

But what happened 2 hours before is much less clear. I was finishing
the scripts to monitor the event, and in particular had done a couple
of dirty experiments on another S2 machine. That's suspect.

Another notable fact: the bad frequency stayed around +50 (+/-10) PPM
for the whole (very long) period between 22:11 and 6:00, despite the
wander and despite the resets, whereas the normal drift of this machine
is an extremely stable -20.8 PPM (+/-0.3 from thermal effects). Only
after 6:00 did the frequency jump to -120, then later to +450, and so
on. Since I deleted the drift file and restarted the service, the
frequency is back to normal:

| $ ntpq -c "rv 0 frequency" Oniko
| status=06f4 leap_none, sync_ntp, 15 events, event_peer/strat_chg
| frequency=-20.756
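For anyone scripting the same check, here is a minimal Python sketch that pulls the frequency out of output like the above and compares it with this machine's usual drift. It assumes the simple `name=value, name=value` layout shown in the paste; `parse_rv` and `frequency_ok` are hypothetical helper names, and the -20.8 PPM nominal and 0.3 PPM band are simply this machine's figures from above, not anything ntpq itself provides.

```python
import re

# Matches name=value pairs in ntpq "rv" output; a value ends at a comma or
# whitespace. Simplification (assumption): quoted values that themselves
# contain commas or spaces, such as version strings, are not handled.
_PAIR = re.compile(r'(\w+)=([^,\s]+)')

def parse_rv(text):
    """Return a dict of the name=value pairs found in ntpq 'rv' output."""
    return dict(_PAIR.findall(text))

def frequency_ok(freq_ppm, nominal_ppm=-20.8, tolerance_ppm=0.3):
    """True if freq_ppm lies within tolerance_ppm of this machine's nominal drift."""
    return abs(freq_ppm - nominal_ppm) <= tolerance_ppm

# The output pasted above, as returned by: ntpq -c "rv 0 frequency" Oniko
sample = """status=06f4 leap_none, sync_ntp, 15 events, event_peer/strat_chg
frequency=-20.756"""

freq = float(parse_rv(sample)["frequency"])
print(freq, frequency_ok(freq))
```

The same check would flag the +50 PPM period as bad, since it lies far outside the thermal band.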

Hint for David: look at the 4th column of the loopstats file for the
frequency correction (in PPM), recalculated at each poll.
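A minimal Python sketch of that hint, assuming the usual ntpd loopstats layout (MJD, seconds past midnight, offset in s, frequency in PPM, jitter, wander, time constant): it finds the first sample whose frequency leaves the normal band, which is how a start time like 22:11 UTC can be pinned down. `parse_loopstats_line` and `first_excursion` are hypothetical names, and the -20.8 +/- 0.3 PPM band is again just this machine's figure.

```python
from datetime import datetime, timedelta, timezone

# Modified Julian Date day 0 is 1858-11-17 00:00 UTC.
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)

def parse_loopstats_line(line):
    """Return (utc_datetime, frequency_ppm) from one loopstats line.

    Assumes the ntpd column order: MJD, seconds past midnight, offset (s),
    frequency (PPM), jitter, wander, time constant.
    """
    fields = line.split()
    mjd, secs, freq = int(fields[0]), float(fields[1]), float(fields[3])
    when = MJD_EPOCH + timedelta(days=mjd, seconds=secs)
    return when, freq

def first_excursion(lines, nominal_ppm=-20.8, tolerance_ppm=0.3):
    """Return (time, freq) of the first sample outside nominal +/- tolerance, or None."""
    for line in lines:
        if not line.strip():  # skip blank lines
            continue
        when, freq = parse_loopstats_line(line)
        if abs(freq - nominal_ppm) > tolerance_ppm:
            return when, freq
    return None
```

An open loopstats file object can be passed directly as `lines`, since it iterates line by line.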

Serge dot Bets at laposte dot net

