[ntp:hackers] Re: GPS weirdness with ntp-dev-4.2.0a-200503xx
David L. Mills
mills at udel.edu
Fri Apr 8 20:40:10 PDT 2005
Mark,
Check out pogo.udel.edu, a Solaris machine connected to a Spectracom GPS
receiver and PPS. I checked several days in the archive from October
2004 to now and found no clockhops other than on a few occasions when I
was running special tests. On some occasions I purposely turned mindist
down below 10 ms and did confirm that little wiggles with amplitudes of
a few tens of microseconds could cause a clockhop to one or the other of
GPS or PPS.
There are two ways one or the other of GPS or PPS can win or lose. If
the correctness intervals for the two sources do not overlap, no
majority clique can result and both sources will show a tally code of
space. The other way is for one root distance or the other to exceed the
maximum distance (maxdist) threshold, normally one second. If this
happens, one of GPS or PPS will be selected and the other will show
space. The normal case will be for GPS to show + and the PPS to show o.
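The first failure mode can be sketched numerically. This is a hedged
illustration, not the ntpd source: each source contributes a correctness
interval of roughly [offset - rootdist, offset + rootdist], and with
only two sources a majority clique can form only if the intervals
overlap.

```python
# Hedged sketch of the overlap test described above (not the actual
# ntpd intersection algorithm).

def correctness_interval(offset, rootdist):
    # Interval within which the true time is believed to lie.
    return (offset - rootdist, offset + rootdist)

def intervals_overlap(a, b):
    return a[0] <= b[1] and b[0] <= a[1]

# GPS at +20 us, PPS at +5 us, each with ~10 ms root distance:
gps = correctness_interval(20e-6, 0.010)
pps = correctness_interval(5e-6, 0.010)
print(intervals_overlap(gps, pps))  # True: a majority clique can form
```

With disjoint intervals, both sources would be marked falseticker and
show the tally code of space.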
Notice the mitigation code near the end of the clock selection
algorithm. For that to work properly, both the GPS and PPS must survive
the intersection and clustering algorithms. Since the default minimum
number of cluster survivors is three, your two are guaranteed to get
through. The only other possible leaks are the intersection algorithm
and the fitness check (peer_unfit). I suggest checking the dispersion on
the GPS driver to make sure it never gets close to one second; it should
only be 10 ms or so.
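A minimal sketch of the maxdist fitness idea, under the assumption that
root distance is roughly half the root delay plus the dispersion (the
real peer_unfit() check in ntpd involves more terms):

```python
# Hedged sketch, not the actual peer_unfit() code: a source whose root
# distance exceeds maxdist (default 1 s) is dropped before selection.

MAXDIST = 1.0  # seconds, the default maximum distance threshold

def unfit(root_delay, root_dispersion):
    # Root distance approximated as half the delay plus the dispersion.
    root_distance = root_delay / 2 + root_dispersion
    return root_distance >= MAXDIST

print(unfit(0.002, 0.010))  # False: healthy GPS driver, ~10 ms dispersion
print(unfit(0.002, 1.100))  # True: dispersion has drifted past maxdist
```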
There is another possibility where both GPS and PPS survive, but the
clustering algorithm has ordered GPS ahead of PPS. The selection metric
is dominated by stratum and secondarily by root distance. But the
stratum of both the GPS and PPS is the same, namely zero, so the only
other decision datum is the root distance. There could well be a
clockhop if the root distance, which includes dispersion and jitter,
wobbles back and forth, as it naturally does. The result is that, to the
clustering algorithm, the two sources look very close to each other and
sometimes cannot be distinguished.
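The ordering can be illustrated with a toy sort. The field names here
are made up for the sketch, but the sort key mirrors the metric just
described: stratum first, root distance as tie-breaker.

```python
# Hedged sketch of the clustering order: both refclocks sit at stratum
# 0, so only the root distance separates them, and jitter wobble of a
# few tens of microseconds can flip the order between polls.

sources = [
    {"name": "GPS", "stratum": 0, "rootdist": 0.0101},
    {"name": "PPS", "stratum": 0, "rootdist": 0.0100},
]
ranked = sorted(sources, key=lambda s: (s["stratum"], s["rootdist"]))
print([s["name"] for s in ranked])  # ['PPS', 'GPS'] -- until rootdist wobbles
```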
What the algorithms are telling you in such cases is that both sources
appear, to the algorithms, to be of comparable quality, and they cannot
be told apart. In other words, to the algorithms it doesn't matter which
source is used; they are equivalent. This particular behavior has not
been changed, other than putting in a mechanism to tinker with the
minimum distance, since before October of last year.
Now, I don't know what to do if you really want to select the PPS all
the time, even if the root distance decides otherwise. Judging from my
own experience here with high-end radios and a different driver, your
residuals could well be different from mine.
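Mark's message quoted below mentions one workaround: adding the 'true'
keyword alongside 'prefer' on the refclock's server line, which forces
the source to be treated as a truechimer. A hypothetical ntp.conf
fragment (driver unit numbers are illustrative):

```
# Pin the PPS (atom, type 22) driver: prefer peer, always truechimer.
server 127.127.22.0 prefer true
# Trimble Palisade (type 29) GPS driver, left unpinned as backup.
server 127.127.29.0
```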
If you have multiple radios, and some of our servers do, and they all
look the same to the NTP grooming algorithms, then the only way to
establish a hierarchy is to fiddle the root distance set by the driver
itself. You could add a few milliseconds in the GPS driver, for
instance, and the PPS could trounce that easily. In view of the
comparable quality apparently seen by the grooming algorithms, I don't
think this makes much sense. A random selection would be just as good as
any other.
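The effect of such a driver-side bias can be simulated. The numbers
here are assumptions chosen only to make the point: a fixed
few-millisecond penalty on the GPS root distance keeps the PPS ahead
even as both distances wobble by tens of microseconds from poll to
poll.

```python
import random

# Hedged sketch: simulate 100 polls with small random wobble on both
# root distances, with a hypothetical 4 ms penalty added in the GPS
# driver. The PPS then wins the tie-break on every poll.

random.seed(1)
GPS_BIAS = 0.004  # hypothetical 4 ms penalty added by the GPS driver

def wins(bias):
    # One poll: both sources near 10 ms root distance with +/-50 us wobble.
    gps = 0.010 + random.uniform(-50e-6, 50e-6) + bias
    pps = 0.010 + random.uniform(-50e-6, 50e-6)
    return "PPS" if pps < gps else "GPS"

results = {wins(GPS_BIAS) for _ in range(100)}
print(results)  # {'PPS'}: the bias dominates the wobble
```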
Dave
Mark Martinec wrote:
>I would like to reopen a topic we had a month ago. I finally managed to
>repeat the exercise with the most current snapshot, and the issue is
>very much still present and troublesome. If the changed ntpd code
>as it stands now gets released, I'm sure there will be many unhappy and
>surprised souls out there, and lots of explaining to do on the Usenet.
>
>Let me first refresh our memory:
>
>From John Pettitt:
>
>>I just updated gatekeeper one of my FreeBSD servers from
>>ntp-dev-4.2.0a-20050206 to ntp-dev-4.2.0a-20050303 and it's started
>>selecting clocks other than the attached GPS. ...
>>
>
>From Bjorn Gabrielsson:
>
>>I have seen something like your case. Internet time servers were
>>preferred over a local Trimble Accutime (Palisade). It looked
>>_something_ like this.
>>
>
>From Mark Martinec:
>
>>I'm quite certain I've seen such behaviour with our GPS clock too
>>with recent versions (but I wasn't careful enough to be able to point
>>to a particular version). I had to add the option 'true' along with 'prefer'
>>to the GPS refclock 'server' line to make it stable again.
>>My guess is the changed behaviour coincides with the introduction
>>of 'tos mindist' or option 'true'. It certainly never happened before.
>>
>
>From Greg Dowd:
>
>>I'll echo that. With only 2 lines in the config file, ntpd now selects an s2
>>server over my ACTS refclock even with the prefer keyword set on the ACTS
>>driver. I know 2 entries are bad but I'm still surprised.
>>
>
>From David L. Mills:
>
>>If the correctness intervals of two servers don't overlap, a majority
>>clique does not exist and both are falseticker. If for any reason the
>>prefer peer is considered a falseticker, the clustering algorithm
>>doesn't get to see it. However, if there are only two servers, the
>>clustering algorithm will not throw either of them away. It may, however,
>>determine which one wins. The list is sorted first by stratum, then by
>>synchronization distance.
>>
>
>Now that I was able to do some experiments, I can claim again that
>the new ntpd code chooses network refclocks in preference to local
>GPS clock several times a day - but eventually switches back to GPS
>at some point. Older code NEVER behaved this way. The only solution
>I have is to append option 'true' to the existing 'prefer', which
>fixes the problem.
>
>About our GPS source (using Trimble event stamping, once per second),
>I claim that it is ALWAYS within a few microseconds (0, 1, 2 or 3 us)
>from the true time, and its jitter is around 0 or 1 us. Even if I add
>the PPS source via the atom refclock (which shows almost identical
>characteristics to the timestamping source), this makes things neither
>better nor worse.
>
>So in the presence of several (backup) network refclocks, the one or
>two perfect local sources with jitter well below other network refs
>are declared falsetickers and junked for periods of a few dozen minutes
>to hours.
>
>I also experimented with the 'tos mindist'. Its default value is supposed
>to be 1 ms. The behaviour is the same if tos mindist is not specified
>at all, or if I explicitly set it to 10 ms (tos mindist 0.010).
>The other network refs offsets are below 1 ms from our GPS
>most of the time, with their jitter also usually below a ms.
>
>My impression is that given a dozen network references in addition
>to the local GPS clock, there is a high chance that some clique of a
>few network sources forms, and ntpd starts believing such a clique
>in preference to its trusty local GPS source.
>
>So what change in the code is causing it?
>
>Here is one example of ntpd -p output, captured in such a situation:
>
> remote refid st t when poll reach delay offset jitter
>==============================================================================
>-GPS_PALISADE(1) .GPS. 0 l 14 16 377 0.000 0.025 0.001
>-prestreljenik.a 131.188.3.221 2 u 41 64 377 1.279 0.857 1.008
>+planja.arnes.si 131.188.3.220 2 u 51 64 377 1.571 0.223 0.294
> rpttlj18.arnes. 193.2.4.2 2 u 51 64 377 1.487 0.217 0.172
>-biofiz.mf.uni-l 193.67.79.202 2 u 41 64 377 3.129 -1.016 0.381
>-vega.cbk.poznan .PPS. 1 u 32 64 377 74.304 -0.646 1.443
>-ntp2.ptb.de .PTB. 1 u 51 64 377 31.009 -0.562 0.225
>*sombrero.cs.tu- .PPS. 1 u 39 64 377 32.685 0.098 0.342
>+swisstime.ee.et .PPS. 1 u 43 64 377 31.293 0.141 45.131
>+ntp1.NL.net .GPS. 1 u 47 64 377 53.029 0.162 3.276
>-Time1.Stupi.SE .PPS. 1 u 44 64 377 58.305 7.420 0.216
>-rustime01.rus.u .DCFp. 1 u 40 64 377 24.937 -0.801 0.254
>xtock.usno.navy. .USNO. 1 u 110 64 276 160.385 19.938 1.303
>-time-B.timefreq .ACTS. 1 u 33 64 377 156.785 1.358 0.250
>-minnehaha.rhrk. 131.188.3.222 2 u 36 64 377 30.496 0.826 1.089
>x2001:7b8:3:2d:: 61.5.175.255 3 u 30 64 377 39.004 -67.250 6.140
>
>
>Just a word of explanation, in case someone wonders why we have so many
>network peers configured: besides their backup role, the ntp log is a very
>nice network analysis tool, both as a quick-glance insight into network health,
>and for long term analysis, helping identify connectivity problems, local, at
>ISP, and with international links. I believe other posters experiencing the
>problem didn't have that many sources, but still got into trouble.
>
> Mark
>