[ntp:hackers] ntp p110 and setting frequency and offset still off.

David L. Mills mills at udel.edu
Wed Feb 6 21:08:05 UTC 2008


Brian,

I found at least one bogeyman. I also found a simpler way to measure the 
open-loop response.

Starting from a running client system close to server time and 
frequency, stop the daemon and edit the configuration file to put

disable ntp
server <old server> maxpoll 6 noselect

Start the daemon and make sure loopstats is enabled. Then run

 ntptime -t 2 -s 1

followed by

ntptime -o 100000

That puts in the initial step. You can track time offset behavior in the 
loopstats or ntptime program. On my Blade the frequency change is 111 
PPM (!!) and the time offset starts going negative to the general depth 
of the Mariana Trench. On a FreeBSD nanokernel the initial frequency 
change is about 0.5 PPM.
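
A minimal sketch of tracking this from the loopstats file, assuming the 
usual record layout (MJD, seconds past midnight, offset in seconds, 
frequency in PPM, then jitter, wander and time constant). The sample 
records and function names are made up for illustration; they are not 
measurements from the Blade.

```python
def parse_loopstats(lines):
    """Return (time_s, offset_s, freq_ppm) tuples from loopstats records."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        day, second = int(fields[0]), float(fields[1])
        records.append((day * 86400 + second, float(fields[2]), float(fields[3])))
    return records


def summarize(records):
    """Frequency change and offset slope (PPM) between first and last record."""
    (t0, off0, f0), (t1, off1, f1) = records[0], records[-1]
    freq_change_ppm = f1 - f0
    offset_slope_ppm = (off1 - off0) / (t1 - t0) * 1e6
    return freq_change_ppm, offset_slope_ppm


# Hypothetical records, 64 s apart, for illustration only.
sample = [
    "54502 76000.000 0.100000000 0.000 0.000001 0.000 6",
    "54502 76064.000 0.099360000 0.500 0.000001 0.000 6",
]
change, slope = summarize(parse_loopstats(sample))
print(f"frequency change {change:+.3f} PPM, offset slope {slope:+.3f} PPM")
```

The offset slope is the rate at which the offset is being slewed, so it 
is worth watching alongside the frequency column.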

Clearly, something is broken in the hardupdate() routine where the 
frequency is updated. Look for the ltemp *= mtemp; line and work from 
there. That SHIFT_KF - SHIFT_USEC looks mighty suspicious when comparing 
Solaris timex.h and that code. Probably the best thing is to rip out the 
multiplies and put the shifts back in.
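
To illustrate why emulating the shifts with multiplies is easy to get 
wrong, here is a small sketch in Python rather than kernel C. The shift 
counts are hypothetical and not taken from either timex.h. It shows two 
hazards: an arithmetic right shift of a negative value rounds toward 
minus infinity while a truncating divide rounds toward zero, and a shift 
count built from the difference of two defines changes direction when 
that difference goes negative.

```python
def shift_right(value, count):
    """Arithmetic right shift, as shift-based code does on typical compilers."""
    return value >> count


def divide_trunc(value, count):
    """Emulate the shift with a divide that truncates toward zero, as C does."""
    magnitude = abs(value) // (1 << count)
    return magnitude if value >= 0 else -magnitude


def scale(value, shift_a, shift_b):
    """Scale by 2**(shift_a - shift_b), handling a negative difference."""
    diff = shift_a - shift_b
    return value << diff if diff >= 0 else value >> -diff


# Hypothetical values for illustration only.
print(shift_right(-5, 1), divide_trunc(-5, 1))  # the two disagree for negatives
print(scale(1000, 16, 16), scale(1000, 20, 16), scale(1000, 16, 20))
```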

The misbehavior between the Ultra and Blade could be an overflow 
problem. I think everything here is 64-bit, but am not sure.
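
A rough sketch of how such an overflow would show up, in Python with 
made-up numbers: the same product is wrapped to 32 and to 64 bits. The 
shift of 12 and the 64 s interval are assumptions chosen for 
illustration, not values read from the Solaris source.

```python
def wrap_signed(value, bits):
    """Wrap an integer to a two's-complement signed field of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value


offset_usec = 100000            # the step from ntptime -o 100000
ltemp = offset_usec << 12       # hypothetical scaling shift
mtemp = 64                      # hypothetical seconds since the last update

product = ltemp * mtemp
print("exact :", product)
print("32-bit:", wrap_signed(product, 32))
print("64-bit:", wrap_signed(product, 64))
```

If any intermediate in the real code is held in a 32-bit type, a product 
of this size would silently wrap.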

Dave

Brian Utterback wrote:

> First, the kernels are the same on a U5 and B1500.
>
> I am a little confused about the test. My understanding is that
> the resulting change in the frequency is also dependent on the
> delta reftime between the offset being set. So the results I
> get on running ntptime twice in the manner you describe will be
> dependent on how quickly I run them, won't they?
>
> David L. Mills wrote:
>
>> Brian,
>>
>> Well, at least what Hertz is not the timer frequency. In the timer 
>> interrupt a small correction time_adj is added to the system clock 
>> time in addition to the tick itself. That increment includes separate 
>> contributions of the phase and frequency computed in the seconds 
>> overflow code, which I didn't see in your link list. The increment 
>> includes a correction due to the fact that Hz is not a power of 2. 
>> The frequency part all works. Now, a little checking with the seconds 
>> overflow code and peeking at the microkernel code might raise 
>> confidence that the phase code is correct as well. Gotta check those 
>> dingbat shifts, multiplies and signs.
>>
>> The last remaining usual suspect is the hardupdate() routine. 
>> Assuming we get this far, you have checked that the Solaris 10 kernel 
>> is in fact identical in the Ultra 5+10 and Blade 1500, because we 
>> still don't know why they behave so differently. Hopefully you can 
>> verify that.
>>
>> The next step might be a little ugly. What you want to know is what 
>> increment in phase and frequency is produced by a time step offset. 
>> This can be done using only the ntptime routine and watching the 
>> kernel variables time_offset and time_freq. Start with ntptime -o 0 
>> -f 0 and then ntptime -o 0.1 for example and immediately display the 
>> phase and frequency. Do this for both your new machine and an old 
>> clunker Ultra 5. They better be the same; if not, smoke the gun.
>>
>> Suspicion turns to a possible error when emulating the shifts with 
>> multiplies, but that doesn't explain why the Blade cuts sharper than 
>> the Ultra.
>>
>> Dave
>>
>> Brian Utterback wrote:
>>
>>> Okay, I did the test. When I run with the time_freq set to 0,
>>> I get -27.630 for the calculated frequency. When I set time_freq
>>> to 10, I get -37.752. And just for grins I tried setting it to
>>> 50, and I got a calculated frequency of -78.072.
>>>
>>> So, that all looks normal to me.
>>>
>>> What next?
>>>
>>> David L. Mills wrote:
>>>
>>>> Brian,
>>>>
>>>> Ready for this? The kernel runs much better on an Ultra 5_10. Same 
>>>> ntpd version. It crosses zero in about 1200 s, overshoots 14 
>>>> percent, then settles down to 5 percent in about an hour as the 
>>>> time constant increases to 8. This is definitely outside the 
>>>> intended envelope, but not the crazy behavior of the Blade 1500. If 
>>>> you hadn't reassured me several times that the timer frequency 
>>>> remains at 100 Hz, I would strongly suspect the timer frequency is 
>>>> something like twice that. This would match the symptoms exactly.
>>>>
>>>> The most sensitive characteristic of the loop is how well the clock 
>>>> frequency matches the loop frequency. With ntptime set the offset 
>>>> and frequency to zero. Configure a good server with the noselect 
>>>> option and maxpoll 6. Run for an hour, measure the difference 
>>>> between the starting and stopping offsets and compute the 
>>>> frequency. Then, set the frequency to something like 10 PPM and 
>>>> repeat. The frequency determined from the difference between the 
>>>> two stopping offsets should be 10 PPM.
>>>>
>>>> The PLL loop behavior is determined from the SCALE_KG and SCALE_KF 
>>>> defines in the timex.h file. Note that the square root of KF is 
>>>> four times KG, but this is correct only for a given timer frequency 
>>>> and time constant. The loop scales KG directly with time constant 
>>>> and KF as its square.
>>>>
>>>> In principle, the value of the KF defined in timex.h could be 
>>>> adjusted so that the measured value is close to the 10 PPM 
>>>> difference above. Next the value of the KG define could in 
>>>> principle be adjusted to control the overshoot at about 6 percent. 
>>>> Each try of course means recompiling the kernel, so I don't suppose 
>>>> that's high on your list.
>>>>
>>>> In any case, the frequency check is a good canary.
>>>>
>>>> Dave
>>>>
>>>> Brian Utterback wrote:
>>>>
>>>>> snip 
>>>>
>>>
>>
>
>


