[ntp:questions] Re: ntpq times out if NMEA refclock configured?

R Jenkins not at pub.lished
Sun May 14 21:03:50 UTC 2006


"Richard B. Gilbert" <rgilbert88 at comcast.net> wrote in message
news:9sqdnZb36qXzoPrZRVn-sg at comcast.com...
>R Jenkins wrote:
>> "Richard B. Gilbert" <rgilbert88 at comcast.net> wrote in message
>> news:L7udnQ5X_9grBvvZRVn-vA at comcast.com...
>>
>>>R Jenkins wrote:
>>>
>>>
>>>>Hi,
>>>>
>>>>I'm trying to add a GPS refclock to my server.
>>>>After total failure with a basic Trimble TSIP output GPS plus the parse
>>>>clock, I'm now using a Garmin GPS25 and the NMEA refclock.
>>>>
> <big snip>
>>>
>>>After rereading a little more carefully, I notice that your frequency
>>>correction of -495.9 PPM is on the ragged edge of the 500 PPM limit.  It is
>>>unusual for a clock to have a frequency error this large; most are below
>>>50 PPM in absolute value.
>>>
>>>Does your system have a kernel parameter called "HZ"?  Is it set to a
>>>value greater than 100?  I believe I have seen references to values of
>>>both 250 and 1000; neither value works well with NTPD.  The system seems
>>>to lose clock interrupts when HZ is greater than 100.  YMMV but if you
>>>are not using 100, give it a try.
>>>
>>
>> Hi,
>> thanks for the replies.
>>
>> The -495.9 ppm seems to be a symptom of the refclock problem. Without the
>> NMEA refclock it was -60 after a few minutes, long before it had settled
>> properly.
>> I think it does have a fast Hz setting (I've seen it somewhere but I
>> can't remember where or what it was set to..) However, it's a 3.2GHz
>> processor so I don't think it should struggle too much.
>>
>>
>> I have the PPS pulse set to 200 ms.
>> The PC does not normally have a display, I use telnet (well, SSH) from my
>> desk.
>> Running minicom at 4800 Baud with NTPD stopped shows the GPS serial data
>> is present:
>> $GPRMC,073153,A,5319.0516,N,00106.9355,W,000.0,000.0,140506,004.0,W*76
>> $GPRMC,073154,A,5319.0516,N,00106.9355,W,000.0,000.0,140506,004.0,W*71
>> $GPRMC,073155,A,5319.0516,N,00106.9355,W,000.0,000.0,140506,004.0,W*70
>> ...
>> I'm not sure how to remotely monitor the DCD line.
>>
>>
>> Simply having the 'server 127.127.20.0 prefer' line in causes the ntpq
>> hang.
>>
>> I've just got around to checking the log immediately after starting ntpd:
>>
>> May 14 08:19:51 gate2 ntpd[28723]: ntpd 4.2.0a at 1.1190-r Sat May 13
>> 10:39:48 BST 2006 (1)
>> May 14 08:19:51 gate2 ntpd[28723]: precision = 1.000 usec
>> May 14 08:19:51 gate2 ntpd[28723]: Listening on interface wildcard,
>> 0.0.0.0#123
>> May 14 08:19:51 gate2 ntpd[28723]: Listening on interface wildcard,
>> ::#123
>> May 14 08:19:51 gate2 ntpd[28723]: Listening on interface lo,
>> 127.0.0.1#123
>> May 14 08:19:51 gate2 ntpd[28723]: Listening on interface eth0,
>> 192.168.0.43#123
>> <Other interfaces trimmed>
>> May 14 08:19:51 gate2 ntpd[28723]: kernel time sync status 0040
>> May 14 08:19:51 gate2 ntpd[28723]: refclock_nmea: time_pps_kcbind failed:
>> Invalid argument
>> May 14 08:19:52 gate2 ntpd[28723]: too many recvbufs allocated (40)
>>
>> It looks like there is some problem with the kernel PPS interface, but I
>> have no idea what...
>> I used this patch:
>> PPSkit-light-alpha-3328m-2.6.15.1.diff
>> on a clean download of kernel 2.6.16.9 - there were a couple of rejects,
>> but they seemed to be pretty obvious & went in easily by hand..
>>
>> I'm happy to try another (recent) 2.6 kernel if there is one with a known
>> working patch?
>>
>> Another test: Leaving the 'flag 3 1' out stops the refclock error line in
>> the log.
>> The 'too many recvbufs allocated (40)' line seems to be triggered by the
>> NMEA refclock regardless of any other settings; it does not appear when
>> the NMEA clock is commented out in ntp.conf
>>
>> Robert Jenkins.
>>
>>
>
>
> If the HZ setting is causing the problem, it has little to do with
> processor speed!!   The problem seems to be that various device drivers
> mask or disable interrupts for a period covering two or more clock
> interrupts causing one or more to be lost with each occurrence.
>
> The messages about "too many recvbufs allocated (40)" were associated with
> a bug in ntpd that I believe was fixed more than a year ago.  You might
> want to try the latest version of ntpd.  You can download it from
> http://ntp.isc.org/bin/view/Main/SoftwareDownloads

Hi,
I can understand the HZ setting messing up the accuracy, but I don't see why
it would stop things running altogether.
It was at 250 Hz; I'm presently compiling a kernel with it at 100 Hz to see
what effect this has.
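
One way to confirm which tick rate a kernel was built with is to look for the CONFIG_HZ line in its build configuration. The following is only a minimal Python sketch: the helper name `configured_hz` and the sample text are illustrative, and it assumes the config text is available somewhere (often /boot/config-<release>, or /proc/config.gz if the kernel exposes it).

```python
import re

def configured_hz(config_text):
    """Return the CONFIG_HZ value from kernel .config text, or None if absent."""
    # Match the numeric setting only, not the CONFIG_HZ_250=y style choice lines.
    match = re.search(r"^CONFIG_HZ=(\d+)\s*$", config_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None

# Hypothetical excerpt of a 2.6 kernel .config built with a 250 Hz tick.
sample = """\
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
"""

hz = configured_hz(sample)
print(f"tick rate: {hz} Hz, one tick = {1000 / hz:.1f} ms")
```

To check a real kernel, read the config file into a string and pass it to the same function.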

I thought I had tried the latest dev release (as per my last post), but it
turns out I had two copies of ntpd in different locations.
Using the ./configure options from the Red Hat source does not put ntpd into
the /usr/sbin directory as their build does; they must be patching the paths
somewhere as well.
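
A quick way to spot a stale second copy is to list every executable called ntpd in the likely directories. This is a rough Python sketch, not a definitive tool: the helper name `find_executables` and the extra directories are assumptions, chosen because autoconf-based builds install under /usr/local unless --prefix is given.

```python
import os

def find_executables(name, directories):
    """Return every executable file called `name` in the given directories, in order."""
    hits = []
    for directory in directories:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits

# Search PATH plus a few common install locations (assumed; adjust to taste).
search_dirs = os.environ.get("PATH", "").split(os.pathsep)
search_dirs += ["/usr/sbin", "/usr/local/sbin", "/usr/local/bin"]
# Drop duplicates and empty entries while keeping the original order.
search_dirs = [d for d in dict.fromkeys(search_dirs) if d]

for path in find_executables("ntpd", search_dirs):
    print(path)
```

If more than one path is printed, the init script and the shell may be starting different binaries.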

Having properly cleared the old files & rebuilt again, I am getting slightly 
better results, but it's still not locking to the GPS.

After around an hour:
# ntpq -c peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
xGPS_NMEA(0)     .GPS.            0 l   18   64  377    0.000  -482.37 160.720
+gate.jrw.intra  130.159.196.118  3 u    6   16  377    0.196  131.924  19.370
*mail.alsys.ro   .GPS.            1 u   49   64  377   67.099   25.813  69.507
+cronos.cenam.mx .GPS.            1 u   24   64  377  217.026  -10.000  92.241
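
For anyone reading the billboard above: the first character of each line is ntpq's tally code, and reach is an octal shift register of the last eight polls. Here is a small Python sketch that decodes the lines shown; the function name `parse_peer_line` and the dictionary keys are my own, and the tally descriptions paraphrase the ntpq documentation.

```python
# Tally codes that ntpq prints in the first column of each peer line.
TALLY = {
    " ": "rejected",
    "x": "falseticker (discarded by the intersection algorithm)",
    ".": "excess (culled from the end of the candidate list)",
    "-": "outlier (discarded by the clustering algorithm)",
    "+": "candidate (included in the combine step)",
    "#": "selected, but not among the closest survivors",
    "*": "system peer",
    "o": "PPS peer",
}

def parse_peer_line(line):
    """Split one line of `ntpq -c peers` output into named fields."""
    tally, fields = line[0], line[1:].split()
    remote, refid, st, kind, when, poll, reach, delay, offset, jitter = fields
    return {
        "tally": tally,
        "remote": remote,
        "refid": refid,
        "stratum": int(st),
        "reach": int(reach, 8),      # octal register; 377 means 8 of 8 polls answered
        "delay_ms": float(delay),
        "offset_ms": float(offset),
        "jitter_ms": float(jitter),
    }

lines = [
    "xGPS_NMEA(0)     .GPS.            0 l   18   64  377    0.000  -482.37 160.720",
    "+gate.jrw.intra  130.159.196.118  3 u    6   16  377    0.196  131.924  19.370",
    "*mail.alsys.ro   .GPS.            1 u   49   64  377   67.099   25.813  69.507",
    "+cronos.cenam.mx .GPS.            1 u   24   64  377  217.026  -10.000  92.241",
]
peers = [parse_peer_line(line) for line in lines]
for p in peers:
    print(f'{p["remote"]:16} {p["offset_ms"]:9.3f} ms  {TALLY[p["tally"]]}')
```

Read this way, the NMEA clock is reachable on every poll but is being thrown out as a falseticker, while mail.alsys.ro is the system peer.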

# ntptime
ntp_gettime() returns code 0 (OK)
  time c8121481.1f3f8000  Sun, May 14 2006 21:41:37.122, (.122063),
  maximum error 603208 us, estimated error 39319 us
ntp_adjtime() returns code 0 (OK)
  modes 0x0 (),
  offset 30763.000 us, frequency -432.622 ppm, interval 4 s,
  maximum error 603208 us, estimated error 39319 us,
  status 0x1 (PLL),
  time constant 2, precision 1.000 us, tolerance 496 ppm,
  pps frequency -1.251 ppm, stability 0.000 ppm, jitter 0.000 us,
  intervals 0, jitter exceeded 0, stability exceeded 0, errors 0.
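
The hex "time" field in that output is an NTP timestamp: 32 bits of seconds since 1900-01-01 UTC, then 32 bits of binary fraction. As a cross-check, here is a short Python sketch that decodes it; the function name `decode_ntp_timestamp` is just illustrative.

```python
from datetime import datetime, timezone

NTP_TO_UNIX = 2208988800  # seconds between 1900-01-01 and 1970-01-01

def decode_ntp_timestamp(text):
    """Convert 'ssssssss.ffffffff' (hex) to (unix_seconds, fraction, UTC datetime)."""
    sec_hex, frac_hex = text.split(".")
    unix_seconds = int(sec_hex, 16) - NTP_TO_UNIX
    fraction = int(frac_hex, 16) / 2**32   # fractional part of the second
    when = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return unix_seconds, fraction, when

unix_seconds, fraction, when = decode_ntp_timestamp("c8121481.1f3f8000")
print(when.isoformat(), f"+{fraction:.6f} s")
# prints: 2006-05-14T20:41:37+00:00 +0.122063 s
```

That is 20:41:37 UTC, which is consistent with the 21:41:37 shown by ntptime if the machine's local time is BST.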

Initially ntptime was giving code 5 (error) and showing frequency 0 ppm.
While it was in that state, the offsets stayed reasonably constant. I had
deleted the drift file before starting so it would not be upset by the
previous problems.

After it changed to code 0 & status 0x1, it started drifting badly and did a 
'jump' of half a second at about 40 minutes.
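
For a sense of scale, a frequency value in ppm converts to accumulated time error by simple multiplication. This Python sketch only gives the order of magnitude: the helper name `drift_seconds` is made up, and the figure ntptime reports is the correction being applied, not a measured error.

```python
def drift_seconds(freq_ppm, elapsed_seconds):
    """Time error accumulated over elapsed_seconds by a clock off by freq_ppm."""
    return freq_ppm * 1e-6 * elapsed_seconds

# Values taken from the ntptime output: the frequency and the tolerance.
for label, ppm in [("reported frequency", -432.622), ("reported tolerance", 496.0)]:
    per_day = drift_seconds(ppm, 86400)
    per_40_min = drift_seconds(ppm, 40 * 60)
    print(f"{label}: {ppm:+.3f} ppm = {per_day:+.2f} s/day, {per_40_min:+.3f} s per 40 min")
```

So a correction of about -433 ppm amounts to roughly 37 seconds a day, or about one second over 40 minutes.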


Robert Jenkins.







