[ntp:questions] Bizzare half second disagreement between ntp hosts

Luis Colorado luis.colorado at hispalinux.es
Fri Jan 11 13:28:54 UTC 2008

"Unruh" <unruh-spam at physics.ubc.ca> wrote in message news:aVChj.26825$fj2.1600 at edtnps82...
> "Luis Colorado" <luis.colorado at hispalinux.es> writes:
>>Just suppose you are downloading some large file, which creates asymmetries
>>in the data flow bandwidth.  When that happens, you observe discrepancies
>>with external time sources in the form of systematic offsets to the
>>clocks outside.  When you lose a local time source, your server gets its
>>time reference from the couple of best-stratum clocks available and can
>>skip even more than a second.
>>If you need reliability and precision, you can set up two or more
>>servers with GPS time sources and connect them over Ethernet.
>>You'll have better than 1 ms on the whole local net if you use PPS
> That is the machine that has the PPS signal (from a GPS 18LVM). That GPS
> dropped for some reason, and then there seemed to be a discrepancy of about
> half a sec between tick.usask.ca and the other three level 2 or 3 ntp
> sources. I realise that ntp would assume that the majority rules, but the
> majority was wrong (as seen on all of the other systems which got their time
> from that system by chrony; they all suddenly saw a half sec jump in their
> time, i.e. they suddenly found themselves with a .48 sec offset).
> So I did everything I thought I could to get reliability and precision, and
> instead got a half second error.

No. Suppose you have an asymmetric round trip to tick.usask.ca, due to a long download running at your site. If the download is at your site, you'll have the same asymmetric round trip to all the remote servers configured in your ntpd (and the same conditions apply to the three servers you posted).

You'll have long delays on the packets that come to you, but short delays on the packets you send to your time servers (the same for all three).  NTP cannot know how much of the time is spent on the inbound path and how much on the outbound path, so it assumes the typical case of 50% in each direction when calculating the offsets.

The result is (read the NTP Request for Comments document for an explanation) that you measure false offsets from these servers (and actually the ***same*** false offsets).
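
A minimal sketch of the arithmetic behind that, assuming the standard NTP on-wire formulas: offset = ((t2 - t1) + (t3 - t4)) / 2 and delay = (t4 - t1) - (t3 - t2). The function names, the three base delays and the 200 ms inbound queueing delay are made up for illustration; they are not measurements from this thread.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset, d_out, d_back, t1=0.0, proc=0.0):
    """Hypothetical helper: build the four timestamps for one exchange.

    true_offset: server clock minus client clock, in seconds
    d_out:       one-way delay client -> server
    d_back:      one-way delay server -> client
    proc:        processing time inside the server
    """
    t2 = t1 + d_out + true_offset       # stamped by the server clock
    t3 = t2 + proc                      # stamped by the server clock
    t4 = t1 + d_out + proc + d_back     # stamped by the client clock
    return ntp_offset_delay(t1, t2, t3, t4)


if __name__ == "__main__":
    inbound_queue = 0.200  # assumed extra delay on the loaded download path
    for name, base in (("server A", 0.001), ("server B", 0.009), ("server C", 0.035)):
        # Clocks agree (true_offset = 0); only the return path is congested.
        offset, delay = simulate_exchange(0.0, base, base + inbound_queue)
        print(f"{name}: offset {offset * 1000:9.3f} ms, delay {delay * 1000:8.3f} ms")
```

With both clocks in agreement, the computed offset works out to (d_out - d_back) / 2, so every server reached through the same congested inbound link shows the same apparent offset, whatever its base delay.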

You must consider the absolute error in this case (you are indeed in a worst case, as all errors add up without cancelling), which is half the root delay plus the root dispersion, and you will see that it is of the order of your measured offset.
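
As a rough sketch of that bound, assuming the usual simplified definition of the maximum error as half the root delay plus the root dispersion (RFC 5905 calls the full quantity the root distance and adds further terms). The function names and the numbers are hypothetical, not values taken from the servers in this thread.

```python
def root_distance(root_delay, root_dispersion):
    """Simplified upper bound on the clock error, in seconds:
    half the round-trip delay to the primary source plus the dispersion."""
    return root_delay / 2.0 + root_dispersion


def offset_within_bound(offset, root_delay, root_dispersion):
    """True if the measured offset could be explained by the error bound."""
    return abs(offset) <= root_distance(root_delay, root_dispersion)


if __name__ == "__main__":
    # Assumed values for a heavily loaded line: 1 s root delay, 31.25 ms dispersion.
    print(root_distance(1.0, 0.03125))
    print(offset_within_bound(-0.48, 1.0, 0.03125))
```

Comparing a measured offset against this bound is a quick way to see whether the offset is distinguishable from path error at all.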

> Note that tick.usask.ca typically has a .2 ms offset, with a 40ms

You say typically, but what happened in the case you posted?

The main reason for using a local time source is to get better than half a second over the Internet. 0.5 s is a typical situation on a loaded line over the Internet.

> roundtrip.
> The others were pool sources.

Another source of systematic error is the mistaken assumption that there is no delay between the PPS interrupt and the timestamp the kernel takes to compute the clock offset.  But this effect is of the order of microseconds, not milliseconds.  Consider that if you are interfacing a TTL PPS signal through a TTL-to-RS232 level converter, you are losing several microseconds in the converter gates.  You have another systematic delay in the interval between the PPS interrupt and the moment the kernel timestamps the event.  This delay can vary if the interrupt is not high priority or if the CPU caches instructions in high-speed memory (suppose a worst case of a kernel that pages interrupt code out to disk, which produces a page fault when the PPS arrives, so the kernel cannot timestamp the event until the page is loaded from disk into memory).  This is not the case on current operating systems, but consider that PPS signals are normally fed to the kernel over RS232 lines (slow lines with low-priority interrupts).

What version of ntpd are you using?
What kind of PPS are you using?
How are you interfacing the PPS signal to the kernel?
Are you actually interfacing the PPS signal to the kernel?
Is your kernel adequately configured to use the PPS signal?
What Operating System are you using?
What version?

>>"Unruh" <unruh-spam at physics.ubc.ca> wrote in message news:FMfej.53259$5l3.36002 at edtnps82...
>>>I have a very weird situation. I am running a GPS PPS (Garmin GPS18LVM)
>>> with a few machines as a backup/initialization.
>>> Suddenly for about half an hour, my GPS failed for some reason (still do
>>> not know what was wrong since it had come back on air by the time I
>>> [...] something wrong). Every hour I run a ntpq -p just to check that my gps
>>> [...] on air. I got this report.
>>>      remote           refid      st t when poll reach   delay   offset
>>> ======================================================================
>>> xtick.usask.ca   .GPS.            1 u 1003 1024  377   44.954    0.213
>>> +sanrail.com                      2 u  993 1024  377    1.486  -479.03
>>> +raptor.tera-byt                  2 u  322 1024  377   17.295  -480.35
>>> *zeus.yocum.org                   2 u  390 1024  377   70.415  -481.02
>>>  SHM(0)          .PPS.            0 l 1415   16    0    0.000   -0.002
>>> Now I believe the tick.usask.ca result, since all of the machines which use
>>> mine as a source suddenly noticed a .48 second jump when my GPS failed. But
>>> why in the world would three systems all suddenly be out by .48 sec?
>>> Doing a peers on them, one has a GPS as its source, one a .WWVB. and one an
>>> .ACTS. Why should all three suddenly be out by half a second?
