[ntp:hackers] Weird initial frequency calculations.

Brian Utterback brian.utterback at sun.com
Fri Aug 1 19:16:43 UTC 2008


I explained this before. The frequency numbers I posted in the first 
message in this thread did not represent a timeline. It represented 
the values that ntpd set the frequency to in the calc_freq routine in 
ntp_loopfilter.c So, the frequency was not "moving toward a correct 
negative value and suddenly show a single positive value". This was a 
table of frequency settings that ntpd made each one between 3 minutes 
and 26 minutes after ntpd was restarted. I sorted the table by time so 
that the correlation between the time and accuracy was more evident. 
This was apparently a mistake, since it has led to quite a lot of 
confusion. Each line of the table is an event that happened one or 
more hours before or after all of the other lines.

Here is the same table, without sorting and leaving the timestamps intact:

Jul 29 22:18:42 freq=-22073
Jul 29 23:03:23 freq=615
Jul 30 00:19:46 freq=-25569
Jul 30 01:05:36 freq=1578
Jul 30 02:18:48 freq=-24193
Jul 30 03:03:28 freq=-751
Jul 30 04:05:44 freq=-7642
Jul 30 05:03:32 freq=-4135
Jul 30 06:03:31 freq=-8245
Jul 30 07:21:02 freq=-26722
Jul 30 08:19:52 freq=-23362
Jul 30 09:20:52 freq=-22816
Jul 30 10:23:17 freq=-27761
Jul 30 11:18:45 freq=-26337
Jul 30 12:05:45 freq=1642
Jul 30 13:03:37 freq=-3478
Jul 30 14:03:40 freq=-6324
Jul 30 15:26:34 freq=-27200
Jul 30 16:03:43 freq=2062
Jul 30 17:03:42 freq=-3551
Jul 30 18:03:43 freq=-4969
Jul 30 19:03:41 freq=-5407
Jul 30 20:03:48 freq=-3894
Jul 30 21:03:45 freq=-7290
Jul 30 22:03:47 freq=-4073
Jul 30 23:04:49 freq=-6159
Jul 31 00:07:06 freq=-4963
Jul 31 01:04:53 freq=-2309
Jul 31 02:03:49 freq=-6067
Jul 31 03:03:52 freq=-3325
Jul 31 04:19:11 freq=-22233
Jul 31 05:03:53 freq=-1238
Jul 31 06:20:15 freq=-26172
Jul 31 07:21:23 freq=-19315
Jul 31 08:03:59 freq=-838
Jul 31 09:03:56 freq=-1556
Jul 31 10:03:56 freq=-2672
Jul 31 11:04:00 freq=-2594


In each case, ntpd was restarted on the hour, so the minutes and 
seconds are the actual time as well as the time since ntpd was restarted.

I hope that clears up the confusion.

David L. Mills wrote:
> Brian,
> 
> In the thread below I cite three samples from you data where the 
> reported frequency was moving toward a correct negative value and 
> suddenl showed a single positive value. There was no external signal 
> from ntpd to cause that and no apparent other cause why it would do 
> that. You have reported anomalistic frequency behavior before, so far as 
> I know yet unexplained. One experiment you did was completely at odds 
> with the expected behavior of the microkernel that cannot be duplicated 
> with the machines available here. You have mentioned spread spectrum 
> clocking, so far as I know yet unresolved. Your data show anomalistic 
> behavior that cannot be duplicated here.
> 
> The anomalistic behavior evident in your data needs to be explained 
> before looking into possibly related other effects. As I have said 
> before, the most likely explanation is overflow somwhere in the 
> frequency calculation. So far as I know, this does not happen in other 
> kernels or in the Solaris kernel with the machines available here.
> 
> It would help and save a lot of sweat here if Solaris would upgrade to 
> the nanokernel so at least all machines would be on common ground. By 
> the way, all Solaris machines here are running with 64-bit kernele.
> 
> Dave
> 
> Brian Utterback wrote:
> 
>> Okay, but as I said before, I do not understand what you mean by 
>> anomalistic frequency bumps.
>>
>> David L. Mills wrote:
>>
>>> Brian,
>>>
>>> Stop. I'm not going to consider anything further until an explanation 
>>> of the anomalistic frequency bumps I cited is offered. These did not 
>>> involve the daemon and apparently happened all by themselves. Until 
>>> the cause of these glitches has been determined, I'm not going to 
>>> worry about other behaviors that might well be dependent on the 
>>> cause. Again, the behavior you report is not apparent on two Solaris 
>>> 10 machines here, one an Ultra 60 and another a Blade 1500.
>>>
>>> As I said previously, it is necessary to isolate the ntp_adjtime() 
>>> and adjtime() behavior to see if the serious frequency wobble is due 
>>> the kernel discipline.
>>>
>>> Dave
>>>
>>> Brian Utterback wrote:
>>>
>>>> The frequency is not jumping around on its own. As I said, the 
>>>> frequency is set to zero via ntptime before ntpd is started. Since 
>>>> there is no drift file, ntpd also sets the frequency to 0 at start 
>>>> up.   Then, 195 seconds into the run, ntp_adjtime is called with an 
>>>> offset to correct. This call results in the frequency being 
>>>> recalculated in the kernel.
>>>>
>>>> What I see is basically 4 sets of calls to ntp_adjtime at varying 
>>>> points of time after startup. The timeline goes like this:
>>>>
>>>> At time 0, ntp_adjtime is called twice, once to set the offset=0 and
>>>> once to set the frequency=0.
>>>>
>>>> About 195 seconds later, ntp_adjtime is called to set an offset. 
>>>> This results in a recalculation of the frequency by the kernel.
>>>>
>>>> Sometime later, varying from 2 seconds to 1000 seconds, ntp_adjtime 
>>>> is called to set the frequency to some value. Immediately after 
>>>> that, the offset is set to 0.
>>>>
>>>> Sometime later, varying from 2 seconds more to 1000 seconds, 
>>>> ntp_adjtime is called with a new offset, which again recalculates 
>>>> the frequency.
>>>>
>>>> I have no problem with the sequence of events here, it is what would 
>>>> be expected. The problem is that when the last two intervals are 
>>>> very small, the frequencies set and then recalculated are wildly 
>>>> inaccurate. When those two intervals are long, they are much closer 
>>>> to the correct values.
>>>>
>>>> So, the two questions I have are these:
>>>>
>>>> 1. Does ntpd take into account the frequency set in the kernel after 
>>>> the first real offset is set when it makes its subsequent frequency 
>>>> calculation?
>>>>
>>>> 2. Why do the last two intervals vary so much and what can we do 
>>>> about it?
>>>>    
>>>> David L. Mills wrote:
>>>>
>>>>> Brian,
>>>>>
>>>>> The fact that the frequency jumps around when the daemon is not 
>>>>> disciplining the kernel is a red flag. If the kernel is disabled 
>>>>> and the daemon loop with adjtime() works okay, there is big trouble 
>>>>> in the microkernel code. If the serious wobble continues, there is 
>>>>> something evil in the hardware.
>>>>>
>>>>> What is so alarming is that the kernel frequency is compouted as an 
>>>>> integral of the offsets, which should be slowly decreasing to zero. 
>>>>> This is purely a mathematical operation and should not be affected 
>>>>> by hardware glitch. Unless, that is, there is a discontinuity due 
>>>>> to integer overflow or something like that. I've never seen this 
>>>>> happen on any Sun or Alpha since 1992. Remember, we still haven't 
>>>>> resolved the glitch you reported some time back, and that did 
>>>>> appear to be instablity in the feedback loop due to spread spectrum 
>>>>> clocking.
>>>>>
>>>>> Dave
>>>>>
>>>>> Brian Utterback wrote:
>>>>>
>>>>>> Sorry, you are right, it is ppb. I have been looking at them so 
>>>>>> much I was just automatically doing the conversion in my head.
>>>>>>
>>>>>> I need a little more instruction. What do you want me to do? I got 
>>>>>> the part about setting disable ntp. Then what?
>>>>>>
>>>>>> David L. Mills wrote:
>>>>>>
>>>>>>> Brian,
>>>>>>>
>>>>>>> Those frequency numbers are 1000 times your previously measured 
>>>>>>> value. If you are using the kernel frequency varuable directly, 
>>>>>>> its units are PPB.
>>>>>>>
>>>>>>> If you removed the frequency file, the bumps I cited could not be 
>>>>>>> due to anything but the kernel discipline, since the daemon does 
>>>>>>> not again discipline the clock until 15 minutes into the run. The 
>>>>>>> kernel isn't doing anything except an exponential decay of the 
>>>>>>> initial uupdate and its integral for the frequency, so the 
>>>>>>> frequency should be very even with not even small bumps.
>>>>>>>
>>>>>>> This makes it even more important to verify the kernel discipline 
>>>>>>> is working correctly. You should fire up the machine with disable 
>>>>>>> ntp command in the configuration file and see if the adjtime() 
>>>>>>> syscall results in the same behavior you show.
>>>>>>>
>>>>>>> Dave
>>>>>>>
>>>>>>> Brian Utterback wrote:
>>>>>>>
>>>>>>>  
>>>>>>>
>>>>>>>> The units of the frequency shown are ppm. I forgot to mention 
>>>>>>>> that the ntp.drift file is removed after stopping the previous 
>>>>>>>> ntpd and before starting the next. In each case, the ntpd 
>>>>>>>> program was allowed to run for one hour before starting the next 
>>>>>>>> cycle, which was started immediately, so the offset present at 
>>>>>>>> the start was whatever it was when ntpd exited. Looking at the 
>>>>>>>> offsets at the beginning of each run and the end of each 
>>>>>>>> previous run, the offset was generally in the 3-10ms range, but 
>>>>>>>> was as high as 45ms.
>>>>>>>>
>>>>>>>> I am not sure what you are saying about the spikes, unless you 
>>>>>>>> are misunderstanding the data. This table represents hourly runs 
>>>>>>>> done over a period of almost 2 days. The output was then sorted 
>>>>>>>> to provide a visual correlation between the length of time spent 
>>>>>>>> before setting the frequency and the accuracy thereof. The table 
>>>>>>>> does not represent a timeline at all. If you did not 
>>>>>>>> misunderstand and still think that there is an issue with 
>>>>>>>> spikes, then please elaborate, because I don't understand what 
>>>>>>>> you are saying.
>>>>>>>>
>>>>>>>> David L. Mills wrote:
>>>>>>>>
>>>>>>>>  
>>>>>>>>
>>>>>>>>> Brian,
>>>>>>>>>
>>>>>>>>> You don't say what the units are nor what the offset presented 
>>>>>>>>> was. You apparently used a special program. You don't need to 
>>>>>>>>> do that, as the residual offset and frequecy are copied to the 
>>>>>>>>> daemon for loopstats. I assume you set the ntp.drift value to 
>>>>>>>>> zero, in which case the kernel will be initialized to zero as 
>>>>>>>>> well. I'm running Solaris 10 on several machines, but not the 
>>>>>>>>> one of you cite.
>>>>>>>>>
>>>>>>>>> See minutes 3:43, 536 and 5:45. The loop can't do that unless 
>>>>>>>>> hit with a large spike. If the units are in PPB, the 
>>>>>>>>> convergence makes sense, but way too fast. It should take many 
>>>>>>>>> hours to get there.
>>>>>>>>>
>>>>>>>>> I suggest you do the experiment again, but disable the kernel. 
>>>>>>>>> Watch the loopstats; the kernel should do the same thing. Just 
>>>>>>>>> for whims, verify the kernel time constant is 2 and have 
>>>>>>>>> maxpoll 6.
>>>>>>>>>
>>>>>>>>> Dave
>>>>>>>>>
>>>>>>>>> Brian Utterback wrote:
>>>>>>>>>
>>>>>>>>>    
>>>>>>>>>
>>>>>>>>>> While further investigating the initial frequency 
>>>>>>>>>> calculations, I got some weird results.
>>>>>>>>>>
>>>>>>>>>> What I did was take a system that had a known drift of 
>>>>>>>>>> approximately -28 ppm. I then made a script that ran the p120 
>>>>>>>>>> ntpd, after first setting the system frequency and offset to 0 
>>>>>>>>>> using ntptime. I then traced the calls to ntp_adjtime, looking 
>>>>>>>>>> for the call that set the kernel frequency after the initial 
>>>>>>>>>> frequency calculation. I used the same four servers each time, 
>>>>>>>>>> all reporting time to within less than a millisecond or so of 
>>>>>>>>>> each other. I did not configure iburst.
>>>>>>>>>>
>>>>>>>>>> What I found was that ntpd set the frequency in 3 to 7 
>>>>>>>>>> minutes, or 18 to 26 minutes. The frequency set in the 3 to 7 
>>>>>>>>>> minute cases were terrible approximations while those in the 
>>>>>>>>>> 18 to 26 mintue range were pretty good, or at least in the 
>>>>>>>>>> ballpark.
>>>>>>>>>>
>>>>>>>>>> So, I don't why there is a difference, but something weird is 
>>>>>>>>>> going on in how ntpd is doing this initial frequency calculation.
>>>>>>>>>>
>>>>>>>>>> Here are the data I have collected. The first column is the 
>>>>>>>>>> time since ntpd started up, and the second column is the 
>>>>>>>>>> frequency the kernel was being set to:
>>>>>>>>>>
>>>>>>>>>> 03:23  freq=615
>>>>>>>>>> 03:28  freq=-751
>>>>>>>>>> 03:31  freq=-8245
>>>>>>>>>> 03:32  freq=-4135
>>>>>>>>>> 03:37  freq=-3478
>>>>>>>>>> 03:40  freq=-6324
>>>>>>>>>> 03:41  freq=-5407
>>>>>>>>>> 03:42  freq=-3551
>>>>>>>>>> 03:43  freq=-4969
>>>>>>>>>> 03:43  freq=2062
>>>>>>>>>> 03:45  freq=-7290
>>>>>>>>>> 03:47  freq=-4073
>>>>>>>>>> 03:48  freq=-3894
>>>>>>>>>> 03:49  freq=-6067
>>>>>>>>>> 03:52  freq=-3325
>>>>>>>>>> 03:53  freq=-1238
>>>>>>>>>> 03:56  freq=-1556
>>>>>>>>>> 03:56  freq=-2672
>>>>>>>>>> 03:59  freq=-838
>>>>>>>>>> 04:00  freq=-2594
>>>>>>>>>> 04:49  freq=-6159
>>>>>>>>>> 04:53  freq=-2309
>>>>>>>>>> 05:36  freq=1578
>>>>>>>>>> 05:44  freq=-7642
>>>>>>>>>> 05:45  freq=1642
>>>>>>>>>> 07:06  freq=-4963
>>>>>>>>>> 18:42  freq=-22073
>>>>>>>>>> 18:45  freq=-26337
>>>>>>>>>> 18:48  freq=-24193
>>>>>>>>>> 19:11  freq=-22233
>>>>>>>>>> 19:46  freq=-25569
>>>>>>>>>> 19:52  freq=-23362
>>>>>>>>>> 20:15  freq=-26172
>>>>>>>>>> 20:52  freq=-22816
>>>>>>>>>> 21:02  freq=-26722
>>>>>>>>>> 21:23  freq=-19315
>>>>>>>>>> 23:17  freq=-27761
>>>>>>>>>> 26:34  freq=-27200
>>>>>>>>>>
>>>>>>>>>>         
>>>>>>>>> _______________________________________________
>>>>>>>>> hackers mailing list
>>>>>>>>> hackers at lists.ntp.org
>>>>>>>>> https://lists.ntp.org/mailman/listinfo/hackers
>>>>>>>>>       
>>>>>>>>     
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> hackers mailing list
>>>>>>> hackers at lists.ntp.org
>>>>>>> https://lists.ntp.org/mailman/listinfo/hackers
>>>>>>>   
>>>>>>
>>>>> _______________________________________________
>>>>> hackers mailing list
>>>>> hackers at lists.ntp.org
>>>>> https://lists.ntp.org/mailman/listinfo/hackers
>>>>
>>> _______________________________________________
>>> hackers mailing list
>>> hackers at lists.ntp.org
>>> https://lists.ntp.org/mailman/listinfo/hackers
>>
> 
> _______________________________________________
> hackers mailing list
> hackers at lists.ntp.org
> https://lists.ntp.org/mailman/listinfo/hackers

-- 
blu

There are two rules in life:
Rule 1- Don't tell people everything you know
----------------------------------------------------------------------
Brian Utterback - Solaris RPE, Sun Microsystems, Inc.
Ph:877-259-7345, Em:brian.utterback-at-ess-you-enn-dot-kom


More information about the hackers mailing list