[ntp:hackers] Re: GPS weirdness with ntp-dev-4.2.0a-200503xx

David L. Mills mills at udel.edu
Fri Apr 8 20:40:10 PDT 2005


Mark,

Check out pogo.udel.edu, a Solaris connected to a Spectracom GPS and 
PPS. I checked several days in the archive from October 2004 to now and 
found no clockhops other than a few occasions when I was running special 
tests. On some occasions I purposely turned mindist down below 10 ms and 
confirmed that little wiggles with an amplitude of a few tens of 
microseconds could cause a clockhop between the GPS and PPS.

There are two ways one or the other GPS or PPS can win or lose. If the 
correctness intervals for the two sources do not overlap, no majority 
clique can result and both sources will show a tally code of space. The 
other way is that one root distance exceeds the maximum distance 
(maxdist) threshold, normally one second. If this happens, one of GPS 
or PPS will be selected and the other will show space. The normal 
case is for GPS to show + and the PPS to show o.
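
To illustrate the first failure mode, here is a rough Python sketch (not 
the ntpd C code; the interval arithmetic is greatly simplified, and the 
numbers are invented) of how two good sources can fail to overlap once 
mindist is turned way down:

```python
# Rough sketch (not ntpd source): a source's correctness interval is
# roughly [offset - rootdist, offset + rootdist], with the half-width
# padded below by mindist ("tos mindist").

def interval(offset, rootdist, mindist=0.001):
    """Correctness interval, half-width padded to at least mindist (seconds)."""
    half = max(rootdist, mindist)
    return (offset - half, offset + half)

def overlap(a, b):
    """True if two correctness intervals intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

# Two sources a few tens of microseconds apart, each with ~10 us root distance.
gps_off, pps_off, dist = 30e-6, -20e-6, 10e-6

# With the default 1 ms mindist the intervals are padded and overlap:
print(overlap(interval(gps_off, dist), interval(pps_off, dist)))              # True
# With mindist turned way down, the tiny wiggle splits them apart and
# no majority clique can form -- both show a tally code of space:
print(overlap(interval(gps_off, dist, 1e-6), interval(pps_off, dist, 1e-6)))  # False
```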

Notice in the clock selection algorithm the mitigation code near the 
end. For that to work properly, both the GPS and PPS must survive the 
intersection and clustering algorithms. Since the default minimum 
cluster survivors is three, your two are guaranteed to get through. The 
only other possible leaks are the intersection algorithm and the fitness 
check (peer_unfit). I suggest checking the dispersion on the GPS driver 
to make sure that never gets close to one second; it should only be 10 
ms or so.
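
A minimal sketch of that fitness idea (simplified Python, not the real 
peer_unfit(), and the root-distance formula is abbreviated):

```python
# Rough sketch: a source is unfit for selection if its root distance
# exceeds the maximum distance threshold ("tos maxdist", default 1 s).

MAXDIST = 1.0  # seconds

def root_distance(delay, dispersion, jitter):
    """Simplified root distance: half the delay plus dispersion and jitter."""
    return delay / 2 + dispersion + jitter

def unfit(delay, dispersion, jitter, maxdist=MAXDIST):
    return root_distance(delay, dispersion, jitter) >= maxdist

# A GPS driver with ~10 ms dispersion stays comfortably fit:
print(unfit(0.0, 0.010, 1e-6))  # False
# If dispersion creeps toward a second, the source is discarded
# before the mitigation code ever sees it:
print(unfit(0.0, 1.2, 1e-6))    # True
```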

There is another possibility where both GPS and PPS survive, but the 
clustering algorithm has ordered GPS ahead of PPS. The selection metric 
is dominated by stratum and secondarily by root distance. But, the 
stratum of both the GPS and PPS is the same - zero, so the only other 
decision data is the root distance. There could well be a clockhop if 
the root distance, which includes dispersion and jitter, wobbles back and 
forth, as it naturally does. The result is that to the clustering algorithm 
the two sources look very close to each other and sometimes cannot be 
distinguished.
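
The ordering wobble can be sketched like this (Python illustration with 
invented numbers; the real metric lives in the C clustering code):

```python
# Rough sketch: the clustering order is dominated by stratum, with root
# distance as the tie-breaker. Both refclocks are stratum 0 internally,
# so a natural wobble in root distance can flip the order between polls.

def order(sources):
    """Sort (stratum, root_distance, name) tuples; return names in order."""
    return [name for _, _, name in sorted(sources)]

# Poll 1: PPS root distance momentarily below the GPS distance.
poll1 = order([(0, 50e-6, "GPS"), (0, 40e-6, "PPS")])
# Poll 2: dispersion and jitter wobble the other way.
poll2 = order([(0, 35e-6, "GPS"), (0, 45e-6, "PPS")])
print(poll1, poll2)  # ['PPS', 'GPS'] ['GPS', 'PPS'] -- a clockhop
```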

What the algorithms are telling you in such cases is that both sources 
appear - to the algorithm - of comparable quality and the algorithm 
can't tell them apart. In other words - to the algorithm - it doesn't 
matter which source is used; they are equivalent. This particular 
behavior has not been changed, other than putting in a mechanism to 
tinker the minimum distance, since before October of last year.

Now, I don't know what to do if you really want to select the PPS all 
the time, even if the root distance decides otherwise. Judging from my 
own experience here with high-end radios and a different driver, your 
residuals could well be different from mine.

If you have multiple radios, and some of our servers do, and they all 
look the same to the NTP grooming algorithms, then the only way to 
establish a hierarchy is to fiddle the root distance set by the driver 
itself. You could add a few milliseconds in the GPS driver, for instance, 
and the PPS could trounce that easily. In view of the comparable quality 
apparently seen by the grooming algorithms, I don't think this makes 
much sense. A random selection would be just as good as any other.
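
For illustration only (a hypothetical bias value in a Python sketch, not 
a recommendation or actual driver code):

```python
# Hypothetical illustration: add a few milliseconds to the GPS driver's
# root distance so the PPS (tens of microseconds) always sorts first.

GPS_BIAS = 0.005  # 5 ms, an invented value for this sketch

def order(gps_dist, pps_dist):
    """Return source names sorted by (biased) root distance."""
    pairs = sorted([(gps_dist + GPS_BIAS, "GPS"), (pps_dist, "PPS")])
    return [name for _, name in pairs]

# The natural wobble no longer matters; PPS trounces the GPS either way:
print(order(35e-6, 45e-6), order(50e-6, 40e-6))  # ['PPS', 'GPS'] twice
```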

Dave



Mark Martinec wrote:

>I would like to reopen a topic we had a month ago. I finally managed to
>repeat the exercise with the most current snapshot, and the issue is
>very much still present and troublesome. If the changed ntpd code
>as it stands now gets released, I'm sure there will be many unhappy and
>surprised souls out there, and lots of explaining to do on the Usenet.
>
>Let me first refresh our memory:
> 
>>From John Pettitt:
>
>>I just updated gatekeeper one of my FreeBSD servers from
>>ntp-dev-4.2.0a-20050206 to ntp-dev-4.2.0a-20050303 and it's started
>>selecting clocks other than the attached GPS. ...
>>
>
>>From Bjorn Gabrielsson:
>
>>I have seen something like your case. Internet time servers were
>>preferred over a local Trimble Accutime (Palisade). It looked
>>_something_ like this. 
>>
>
>>From Mark Martinec:
>
>>I'm quite certain I've seen such behaviour with our GPS clock too
>>with recent versions (but I wasn't careful enough to be able to point
>>to a particular version). I had to add the option 'true' along with 'prefer'
>>to the GPS refclock 'server' line to make it stable again.
>>My guess is the changed behaviour coincides with the introduction
>>of 'tos mindist' or option 'true'. It certainly never happened before.
>>
>
>>From Greg Dowd:
>
>>I'll echo that.  With only 2 lines in the config file, ntpd now selects s2
>>server over my ACTS refclock even with the prefer keyword set on the ACTS
>>driver.  I know 2 entries are bad but I'm still surprised.    
>>
>
>>From David L. Mills:
>
>>If the correctness intervals of two servers don't overlap, a majority 
>>clique does not exist and both are falseticker. If for any reason the 
>>prefer peer is considered a falseticker, the clustering algorithm 
>>doesn't get to see it. However, if there are only two servers, the 
>>clusting algorithm will not throw either of them away. It may however 
>>determine which one wins. The list is sorted first by stratum, then by 
>>synchronization distance.
>>
>
>Now that I was able to do some experiments, I can claim again that
>the new ntpd code chooses network refclocks in preference to local
>GPS clock several times a day - but eventually switches back to GPS
>at some point. Older code NEVER behaved this way. The only solution
>I have is to append option 'true' to the existing 'prefer', which
>fixes the problem.
>
>About our GPS source (using Trimble event stamping, once per second),
>I claim that it is ALWAYS within a few microseconds (0, 1, 2 or 3 us)
>from the true time, and its jitter is around 0 or 1 us. Even if I add
>the PPS source via the atom refclock (which shows almost identical
>characteristics to the timestamping source), this makes things
>neither better nor worse.
>
>So in the presence of several (backup) network refclocks, the one or
>two perfect local sources with jitter well below other network refs
>are declared falsetickers and junked for periods of a few dozen minutes
>to hours.
>
>I also experimented with the 'tos mindist'. Its default value is supposed
>to be 1 ms. The behaviour is the same if tos mindist is not specified
>at all, or if I explicitly set it to 10 ms (tos mindist 0.010).
>The other network refs offsets are below 1 ms from our GPS
>most of the time, with their jitter also usually below a ms.
>
>My impression is that given a dozen network references in addition
>to the local GPS clock, there is a high chance that some clique of
>few network sources forms, and ntpd starts believing such a clique
>in preference to its trusty local GPS source.
>
>So what change in the code is causing it?
>
>Here is one example of ntpd -p output, captured in such a situation:
>
>     remote           refid      st t when poll reach   delay   offset  jitter
>==============================================================================
>-GPS_PALISADE(1) .GPS.            0 l   14   16  377    0.000    0.025   0.001
>-prestreljenik.a 131.188.3.221    2 u   41   64  377    1.279    0.857   1.008
>+planja.arnes.si 131.188.3.220    2 u   51   64  377    1.571    0.223   0.294
> rpttlj18.arnes. 193.2.4.2        2 u   51   64  377    1.487    0.217   0.172
>-biofiz.mf.uni-l 193.67.79.202    2 u   41   64  377    3.129   -1.016   0.381
>-vega.cbk.poznan .PPS.            1 u   32   64  377   74.304   -0.646   1.443
>-ntp2.ptb.de     .PTB.            1 u   51   64  377   31.009   -0.562   0.225
>*sombrero.cs.tu- .PPS.            1 u   39   64  377   32.685    0.098   0.342
>+swisstime.ee.et .PPS.            1 u   43   64  377   31.293    0.141  45.131
>+ntp1.NL.net     .GPS.            1 u   47   64  377   53.029    0.162   3.276
>-Time1.Stupi.SE  .PPS.            1 u   44   64  377   58.305    7.420   0.216
>-rustime01.rus.u .DCFp.           1 u   40   64  377   24.937   -0.801   0.254
>xtock.usno.navy. .USNO.           1 u  110   64  276  160.385   19.938   1.303
>-time-B.timefreq .ACTS.           1 u   33   64  377  156.785    1.358   0.250
>-minnehaha.rhrk. 131.188.3.222    2 u   36   64  377   30.496    0.826   1.089
>x2001:7b8:3:2d:: 61.5.175.255     3 u   30   64  377   39.004  -67.250   6.140
>
>
>Just a word of explaining, in case someone wonders why we have so many
>network peers configured: besides their backup role, ntp log is a very
>nice network analysis tool, both as a quick glance insight into network health, 
>and for long term analysis, helping identify connectivity problems, local, at 
>ISP, and with international links. I believe other posters experiencing the
>problem didn't have that many sources, but still got into trouble.
>
>  Mark
>
