[ntp:hackers] clustered polling times

David Mills mills at udel.edu
Mon Feb 2 14:36:34 UTC 2009


Danny,

The fact that poll times tend to be clustered is by deliberate design. 
If the poll times were aggressively torqued, symmetric mode associations 
would find frequent cases where polls are dropped or duplicated. The 
only reason polls are randomized in the first place is to avoid 
imploding multiple polls on the same second. In point of fact, the poll 
schedule is designed as a random walk, so the bunching tends to even out 
over time.

Dave

Danny Mayer wrote:

>Dave Hart wrote:
>  
>
>>ntpd has a tendency to have peer poll times clustered together, which is
>>particularly noticeable with maxpoll 12, as seen in this extract of the
>>central columns of ntpq -c peers output:
>>
>> st t when poll
>>================
>>  2 u 1353 2048
>>  2 u 1375 2048
>>  2 u  964 2048
>>  2 u 1312 2048
>>  2 u 1278 2048
>>  2 u 1259 2048
>>  2 u 1317 2048
>>  2 u  957 2048
>>  2 u 1331 2048
>>
>>One cluster around 960 seconds ago, the other around 1300.  Intuitively, I
>>have a strong feeling this is suboptimal.  In fact, ntpd does have some code
>>to randomize the polling interval already:
>>
>>a)  the RANDPOLL macro fuzzes the interval time used by -1 to +2 seconds.
>>b)  ntp_proto.c peer_clear(), when not initializing and the peer is not
>>passive mode, as the comment says, "Othersie [sic], randomize over the
>>minimum poll interval in order to avoid broadcast implosion." using the code
>>below:
>>
>>peer->nextdate += (ntp_random() & ((1 << NTP_MINDPOLL) - 1));
>>
>>In the case of a well-behaved ntpd with well-behaved peers, peer_clear's
>>randomizing doesn't come into play, as seen in the ntpq -c peers snippet
>>above.  The polling times drift apart very gradually under RANDPOLL's
>>influence.
>>
>>The approach I am trying, in order to spread out the sampling more evenly, is
>>to randomize the length of the first interval when increasing the polling
>>interval for a peer between the old interval length and the new.  So when
>>going from poll 6 to poll 7, for example, instead of using 128s for the
>>first interval period, use between 64s and 128s at ntp_random()'s whim.
>>
>>After a while running this code (on a different host) I see:
>>
>> st t when poll r
>>=================
>>  2 u   75  512
>>  2 u   80  512
>>  2 u  137  512
>>  2 u   74  512
>>  2 u  192  512
>>  2 u  124  512
>>  2 u   66  512
>>  2 u   70  512
>>  2 u  119  512
>>
>>I'm curious to see how it looks once I let it run with maxpoll 12 long
>>enough.  Patch attached.  Comments welcome.  In particular, is the shorter
>>first interval after increasing peer->hpoll likely to cause any problem for
>>ntpd?
>>
>>Cheers,
>>Dave Hart
>>    
>>
>
>Attachments are not allowed in the mailing list or newsgroup. What makes
>you believe that there is something wrong with the polling time? On
>startup each server is polled for a time and the results are
>incorporated in the ntp algorithm. After that the polling proceeds at
>regular intervals with gradual backoff as the algorithm accumulates
>enough data and becomes more stable. You shouldn't be changing the
>polling interval at all unless you understand the consequences of doing
>so. This is an engineering problem and not an exercise in computer
>science. Bill Unruh (who is a physicist) will tell you that ntp throws
>away too many samples and Dave Mills will tell you it's sampling at about
>twice the rate that it should. I tend to believe Dave as he has 30 years
>of experiments to back up his statements. You need to spend time
>understanding the Allan intercept, sampling, algorithm stability,
>and lots of statistics. You can go to Dave's research papers or go to his
>book to get an understanding of the problem and solutions that Dave has
>evolved over the years.
>
>BTW there are strict limits on what you can change in ntp_proto.c. See
>the warning on top of that file.
>
>Danny
>  
>
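The two randomization schemes discussed in the quoted message can be sketched in C as follows. This is only an illustration under stated assumptions, not the patch that was attached or ntpd's actual code. The names example_clear_jitter and example_first_interval are hypothetical, EXAMPLE_MINDPOLL is assumed to be 6 for the sake of the arithmetic, and the random value is passed in by the caller rather than taken from ntp_random().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration only; check NTP_MINDPOLL in the ntpd
 * headers you are actually building against. */
#define EXAMPLE_MINDPOLL 6

/* Hypothetical helper mirroring the masking in the quoted peer_clear()
 * line: keeps the low EXAMPLE_MINDPOLL bits of rnd, so the result lies
 * in 0 .. (1 << EXAMPLE_MINDPOLL) - 1 seconds. */
static uint32_t example_clear_jitter(uint32_t rnd)
{
    return rnd & (((uint32_t)1 << EXAMPLE_MINDPOLL) - 1);
}

/* Hypothetical helper for the proposal: when the poll exponent rises
 * from old_hpoll to new_hpoll, pick the length of the first interval
 * somewhere between the old and the new interval length (inclusive),
 * driven by the caller-supplied random value rnd. */
static uint32_t example_first_interval(unsigned old_hpoll,
                                       unsigned new_hpoll,
                                       uint32_t rnd)
{
    uint32_t old_len;
    uint32_t new_len;

    /* Keep the shifts well inside 32 bits. */
    assert(old_hpoll < 31 && new_hpoll < 31);

    old_len = (uint32_t)1 << old_hpoll;
    new_len = (uint32_t)1 << new_hpoll;

    /* Not an increase: fall back to the plain new interval. */
    if (new_len <= old_len)
        return new_len;

    return old_len + rnd % (new_len - old_len + 1);
}
```

For the poll 6 to poll 7 example in the message, example_first_interval(6, 7, rnd) always lands between 64 and 128 seconds. Whether a shortened first interval is safe for symmetric mode associations is exactly the question raised in the thread, and the sketch does not answer it.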


