[ntp:questions] Re: ntpq times out if NMEA refclock configured?
R Jenkins
not at pub.lished
Sun May 14 21:03:50 UTC 2006
"Richard B. Gilbert" <rgilbert88 at comcast.net> wrote in message
news:9sqdnZb36qXzoPrZRVn-sg at comcast.com...
>R Jenkins wrote:
>> "Richard B. Gilbert" <rgilbert88 at comcast.net> wrote in message
>> news:L7udnQ5X_9grBvvZRVn-vA at comcast.com...
>>
>>>R Jenkins wrote:
>>>
>>>
>>>>Hi,
>>>>
>>>>I'm trying to add a GPS refclock to my server.
>>>>After total failure with a basic Trimble TSIP output GPS plus the parse
>>>>clock, I'm now using a Garmin GPS25 and the NMEA refclock.
>>>>
> <big snip>
>>>
>>>After rereading a little more carefully, I notice that your frequency
>>>correction of -495.9 PPM is on the ragged edge of the 500 PPM limit. It is
>>>unusual for a clock to have a frequency error this large; most are below
>>>50 PPM in absolute value.
>>>
>>>Does your system have a kernel parameter called "HZ"? Is it set to a
>>>value greater than 100? I believe I have seen references to values of
>>>both 250 and 1000; neither value works well with NTPD. The system seems
>>>to lose clock interrupts when HZ is greater than 100. YMMV but if you
>>>are not using 100, give it a try.
>>>
>>
>> Hi,
>> thanks for the replies.
>>
>> The -495.9 ppm seems to be a symptom of the refclock problem. Without the
>> NMEA refclock it was -60 ppm after a few minutes, long before it had settled
>> properly.
>> I think it does have a high HZ setting (I've seen it somewhere, but I
>> can't remember where or what it was set to...). However, it's a 3.2 GHz
>> processor, so I don't think it should struggle too much.
>>
>>
>> I have the PPS pulse width set to 200 ms.
>> The PC does not normally have a display; I use telnet (well, SSH) from my
>> desk.
>> Running minicom at 4800 baud with NTPD stopped shows the GPS serial data
>> is present:
>> $GPRMC,073153,A,5319.0516,N,00106.9355,W,000.0,000.0,140506,004.0,W*76
>> $GPRMC,073154,A,5319.0516,N,00106.9355,W,000.0,000.0,140506,004.0,W*71
>> $GPRMC,073155,A,5319.0516,N,00106.9355,W,000.0,000.0,140506,004.0,W*70
>> ...
>> I'm not sure how to remotely monitor the DCD line.
>>
>>
>> Simply having the 'server 127.127.20.0 prefer' line in causes the ntpq
>> hang.
>>
>> I've just got around to checking the log immediately after starting ntpd:
>>
>> May 14 08:19:51 gate2 ntpd[28723]: ntpd 4.2.0a@1.1190-r Sat May 13 10:39:48 BST 2006 (1)
>> May 14 08:19:51 gate2 ntpd[28723]: precision = 1.000 usec
>> May 14 08:19:51 gate2 ntpd[28723]: Listening on interface wildcard, 0.0.0.0#123
>> May 14 08:19:51 gate2 ntpd[28723]: Listening on interface wildcard, ::#123
>> May 14 08:19:51 gate2 ntpd[28723]: Listening on interface lo, 127.0.0.1#123
>> May 14 08:19:51 gate2 ntpd[28723]: Listening on interface eth0, 192.168.0.43#123
>> <Other interfaces trimmed>
>> May 14 08:19:51 gate2 ntpd[28723]: kernel time sync status 0040
>> May 14 08:19:51 gate2 ntpd[28723]: refclock_nmea: time_pps_kcbind failed: Invalid argument
>> May 14 08:19:52 gate2 ntpd[28723]: too many recvbufs allocated (40)
>>
>> It looks like there is some problem with the kernel PPS interface, but I
>> have no idea what...
>> I used this patch:
>> PPSkit-light-alpha-3328m-2.6.15.1.diff
>> on a clean download of kernel 2.6.16.9 - there were a couple of rejects,
>> but they seemed to be pretty obvious & went in easily by hand.
>>
>> I'm happy to try another (recent) 2.6 kernel if there is one with a known
>> working patch?
>>
>> Another test: Leaving the 'flag 3 1' out stops the refclock error line in
>> the log.
>> The 'too many recvbufs allocated (40)' line seems to be triggered by the
>> NMEA refclock regardless of any other settings; it does not appear when
>> the NMEA clock is commented out in ntp.conf.
>>
>> Robert Jenkins.
>>
>>
>
>
> If the HZ setting is causing the problem, it has little to do with
> processor speed!! The problem seems to be that various device drivers
> mask or disable interrupts for a period covering two or more clock
> interrupts causing one or more to be lost with each occurrence.
>
> The messages about "too many recvbufs allocated (40)" were associated with
> a bug in ntpd that I believe was fixed more than a year ago. You might
> want to try the latest version of ntpd. You can download it from
> http://ntp.isc.org/bin/view/Main/SoftwareDownloads
Hi,
I can understand the HZ setting messing up the accuracy, but I don't see why
it would stop things running altogether.
It was at 250 Hz; I'm presently compiling a kernel with it at 100 Hz to see
what effect this has.
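To put the frequency figures in this thread into perspective, here is a small Python sketch of the arithmetic. It converts a frequency error in ppm into seconds gained or lost per day and, under the simplifying assumption that lost timer interrupts are the only cause, estimates how many ticks per second would have to go missing at a given HZ to produce an error of that size. The function names are made up for illustration; nothing here comes from ntpd or the kernel.

```python
# Back-of-the-envelope arithmetic for the frequency figures in this thread.
# Assumption: a clock that loses a fraction f of its timer interrupts is off
# by f * 1e6 ppm. This is a simplification, not a model of the kernel.

SECONDS_PER_DAY = 86_400


def ppm_to_seconds_per_day(ppm: float) -> float:
    """Seconds gained or lost per day for a frequency error given in ppm."""
    return ppm * 1e-6 * SECONDS_PER_DAY


def lost_ticks_per_second(ppm: float, hz: int) -> float:
    """Timer interrupts that would have to go missing each second, at the
    given HZ, to account for a frequency error of this magnitude."""
    # Only the magnitude is used; no sign convention is assumed here.
    return abs(ppm) * 1e-6 * hz


if __name__ == "__main__":
    for ppm in (500.0, -495.9, -60.0):
        print(f"{ppm:+7.1f} ppm -> {ppm_to_seconds_per_day(ppm):+6.2f} s/day")
    for hz in (100, 250, 1000):
        ticks = lost_ticks_per_second(495.9, hz)
        print(f"HZ={hz:4d}: 495.9 ppm ~ {ticks:.3f} lost ticks/s")
```

Running it shows that the 500 ppm limit corresponds to 43.2 s/day, and that at HZ=250 losing roughly one tick every eight seconds (about 0.124 ticks/s) would be enough to explain a 495.9 ppm error.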
I thought I had tried the latest dev release (as per my last post), but it
turns out I had two copies of ntpd in different locations.
Using the ./configure options from the Red Hat source does not install ntpd
into /usr/sbin as their build does, so they must be patching the paths
somewhere as well.
Having properly cleared the old files & rebuilt again, I am getting slightly
better results, but it's still not locking to the GPS.
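Since the refclock still will not lock, one cheap sanity check is that the serial data is well-formed. The Python sketch below verifies the NMEA 0183 checksum (the XOR of every character between '$' and '*', sent as two hex digits after the '*') and checks the RMC status field, where 'A' means the receiver reports a valid fix and 'V' means void. The helper names are hypothetical, and the sketch only validates the sentences as minicom captured them, not what ntpd actually reads from the port.

```python
from functools import reduce

# Sentences as captured with minicom at 4800 baud (copied from the post).
SAMPLES = [
    "$GPRMC,073153,A,5319.0516,N,00106.9355,W,000.0,000.0,140506,004.0,W*76",
    "$GPRMC,073154,A,5319.0516,N,00106.9355,W,000.0,000.0,140506,004.0,W*71",
    "$GPRMC,073155,A,5319.0516,N,00106.9355,W,000.0,000.0,140506,004.0,W*70",
]


def nmea_checksum_ok(sentence: str) -> bool:
    """True if the two hex digits after '*' equal the XOR of the characters
    between the leading '$' and the '*'."""
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        return False
    body, _, given = sentence[1:].partition("*")
    calculated = reduce(lambda acc, ch: acc ^ ord(ch), body, 0)
    try:
        return calculated == int(given[:2], 16)
    except ValueError:  # missing or non-hex checksum digits
        return False


def rmc_fix_valid(sentence: str) -> bool:
    """True for a checksum-valid RMC sentence whose status field is 'A'."""
    if not nmea_checksum_ok(sentence):
        return False
    fields = sentence.strip()[1:].split("*")[0].split(",")
    # fields[0] is talker + type ("GPRMC"), fields[1] the UTC time,
    # fields[2] the status: 'A' = valid fix, 'V' = void.
    return fields[0].endswith("RMC") and len(fields) > 2 and fields[2] == "A"


if __name__ == "__main__":
    for s in SAMPLES:
        print(s[7:13], nmea_checksum_ok(s), rmc_fix_valid(s))
```

All three captured sentences pass both checks, so the receiver output itself looks sane; the problem is more likely in how ntpd or the kernel handles the port and the PPS signal.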
After around an hour:
# ntpq -c peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
xGPS_NMEA(0)     .GPS.            0 l   18   64   377   0.000  -482.37 160.720
+gate.jrw.intra  130.159.196.118  3 u    6   16   377   0.196  131.924  19.370
*mail.alsys.ro   .GPS.            1 u   49   64   377  67.099   25.813  69.507
+cronos.cenam.mx .GPS.            1 u   24   64   377 217.026  -10.000  92.241
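The first column of the ntpq peers listing is a tally code showing how ntpd's clock selection treated each peer. The 'x' on the GPS_NMEA(0) row marks it as a falseticker, presumably because its -482 ms offset disagrees with the network servers. This Python sketch, with hypothetical helper names, maps the tally characters used by ntp 4.x and splits a row into fields. It assumes the columns are whitespace-separated, which breaks if ntpq runs a long hostname into the next column.

```python
# Tally codes in the first column of "ntpq -c peers" (ntp 4.x).
TALLY = {
    " ": "rejected",
    "x": "falseticker",
    ".": "excess",
    "-": "outlier",
    "+": "candidate",
    "#": "selected",
    "*": "system peer",
    "o": "PPS peer",
}

# Data rows from the listing above (whitespace-separated columns).
ROWS = [
    "xGPS_NMEA(0)     .GPS.            0 l   18   64   377   0.000  -482.37 160.720",
    "+gate.jrw.intra  130.159.196.118  3 u    6   16   377   0.196  131.924  19.370",
    "*mail.alsys.ro   .GPS.            1 u   49   64   377  67.099   25.813  69.507",
    "+cronos.cenam.mx .GPS.            1 u   24   64   377 217.026  -10.000  92.241",
]


def parse_peer_line(line: str) -> dict:
    """Split one data row of 'ntpq -c peers' into named fields."""
    tally = line[0]
    remote, refid, st, _type, _when, poll, reach, delay, offset, jitter = (
        line[1:].split()
    )
    return {
        "tally": TALLY.get(tally, "unknown"),
        "remote": remote,
        "refid": refid,
        "stratum": int(st),
        "poll_s": int(poll),
        # reach is an octal shift register of the last eight polls;
        # 377 octal (255) means all eight were answered.
        "reach": int(reach, 8),
        "delay_ms": float(delay),
        "offset_ms": float(offset),
        "jitter_ms": float(jitter),
    }


if __name__ == "__main__":
    for row in ROWS:
        p = parse_peer_line(row)
        print(f"{p['remote']:<16} {p['tally']:<12} offset {p['offset_ms']:+9.3f} ms")
```

Note that every peer, including the GPS, shows reach 377, so the NMEA driver is receiving data on every poll; it is being rejected on its offset, not for lack of samples.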
# ntptime
ntp_gettime() returns code 0 (OK)
  time c8121481.1f3f8000  Sun, May 14 2006 21:41:37.122, (.122063),
  maximum error 603208 us, estimated error 39319 us
ntp_adjtime() returns code 0 (OK)
  modes 0x0 (),
  offset 30763.000 us, frequency -432.622 ppm, interval 4 s,
  maximum error 603208 us, estimated error 39319 us,
  status 0x1 (PLL),
  time constant 2, precision 1.000 us, tolerance 496 ppm,
  pps frequency -1.251 ppm, stability 0.000 ppm, jitter 0.000 us,
  intervals 0, jitter exceeded 0, stability exceeded 0, errors 0.
Initially ntptime was giving code 5 (error) & showing frequency 0 ppm.
At that point, the offsets stayed reasonably constant. I had deleted the drift
file before starting so it would not be upset by the previous problems.
After it changed to code 0 & status 0x1, it started drifting badly and did a
'jump' of half a second at about 40 minutes.
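The status values in ntpd's startup log ("kernel time sync status 0040") and in ntptime ("status 0x1 (PLL)") are bit masks of the kernel STA_* flags from <sys/timex.h>, and the ntptime return codes are the TIME_* constants, where 5 is TIME_ERROR. A small Python decoder, a sketch with hypothetical names, makes them readable. 0x0040 decodes to UNSYNC; 0x1 is PLL alone, with no PPSSIGNAL bit, which together with "intervals 0" suggests the kernel was not seeing any PPS pulses at that point.

```python
# Kernel clock status bits (the STA_* flags from <sys/timex.h>).
STATUS_BITS = {
    0x0001: "PLL",
    0x0002: "PPSFREQ",
    0x0004: "PPSTIME",
    0x0008: "FLL",
    0x0010: "INS",
    0x0020: "DEL",
    0x0040: "UNSYNC",
    0x0080: "FREQHOLD",
    0x0100: "PPSSIGNAL",
    0x0200: "PPSJITTER",
    0x0400: "PPSWANDER",
    0x0800: "PPSERROR",
    0x1000: "CLOCKERR",
    0x2000: "NANO",
    0x4000: "MODE",
    0x8000: "CLK",
}

# Return values of ntp_gettime()/ntp_adjtime() (the TIME_* constants).
RETURN_CODES = {0: "OK", 1: "INS", 2: "DEL", 3: "OOP", 4: "WAIT", 5: "ERROR"}


def decode_status(word: int) -> list:
    """Names of the STA_* bits set in a kernel status word, lowest bit first."""
    return [name for bit, name in STATUS_BITS.items() if word & bit]


if __name__ == "__main__":
    print("startup log 0040:", decode_status(0x0040))  # ['UNSYNC']
    print("ntptime 0x1:     ", decode_status(0x0001))  # ['PLL']
    print("ntptime code 5:  ", RETURN_CODES[5])        # ERROR
    # With the kernel PPS discipline active one would typically expect
    # PPSSIGNAL, PPSFREQ and PPSTIME alongside PLL, e.g. 0x0107:
    print("example 0x0107:  ", decode_status(0x0107))
```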
Robert Jenkins.