[ntp:questions] ntpd wedged again
agcarver+ntp at acarver.net
Sun Feb 12 07:55:30 UTC 2012
Since we seem to be going around a few times on this I'm going to
summarize the current hardware and software configuration of the system
so we're all on the same starting point.
The GPS data is read by gpsd on /dev/gps0 which is a symlink to /dev/ttya.
The PPS_ATOM is reading /dev/pps0 which is a symlink to /dev/ttyb.
So two serial ports are handling two different parts of the available GPS
output.
The GPS PPS signal is stable. I have kept an eye on it externally
including when I've had trouble with ntpd and I have not seen any
evidence of trouble with the source.
The OS is NetBSD 5.1 on sparc sun4c (IPX). The oscillator on the system
appears to be fine. It did not drift far from its set time with no
clock discipline. The system load is minimal, running only ntpd and
gpsd plus the standard base system processes.
The current clock configuration of ntpd has the following:
server 0.us.pool.ntp.org iburst
server 1.us.pool.ntp.org iburst
server 0.north-america.pool.ntp.org iburst
server 127.127.22.0 minpoll 4 maxpoll 4
fudge 127.127.22.0 flag2 1 flag3 0 flag4 1 refid PPS
server 127.127.28.0 minpoll 7 noselect
fudge 127.127.28.0 stratum 4 time1 -0.6 flag4 1 refid GPSD
Note that the SHM refclock (the GPSD) is set to noselect. It has been
this way for quite a while. All the problems I have been having do not
involve GPSD in any way because it is always noselect. I will be using
it with ntpd later on, but I want to get PPS working first. Until then it
will stay noselect. Unless the noselect directive is being ignored by
ntpd, my understanding is that it shouldn't matter whether the refclock is
present in the configuration.
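As a quick sanity check on the configuration quoted above, here is a minimal Python sketch that scans the server lines and reports which ones carry noselect. The helper name noselect_sources is made up for illustration and is not part of ntpd. It only looks at keywords on "server" lines and knows nothing about how ntpd actually parses its configuration.

```python
# Illustrative only: a tiny scanner for the "server" lines quoted above.
# noselect_sources is a hypothetical helper name, not part of ntpd.

CONFIG = """\
server 0.us.pool.ntp.org iburst
server 1.us.pool.ntp.org iburst
server 0.north-america.pool.ntp.org iburst
server 127.127.22.0 minpoll 4 maxpoll 4
fudge 127.127.22.0 flag2 1 flag3 0 flag4 1 refid PPS
server 127.127.28.0 minpoll 7 noselect
fudge 127.127.28.0 stratum 4 time1 -0.6 flag4 1 refid GPSD
"""


def noselect_sources(text):
    """Map each server address to True if its line carries noselect."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Only "server" lines can carry noselect; "fudge" lines are skipped.
        if len(fields) >= 2 and fields[0] == "server":
            result[fields[1]] = "noselect" in fields[2:]
    return result


if __name__ == "__main__":
    for addr, ignored in noselect_sources(CONFIG).items():
        print(addr, "noselect" if ignored else "selectable")
```

With the configuration above, only 127.127.28.0 (the SHM refclock fed by gpsd) should come out as noselect, and 127.127.22.0 (PPS) should come out as selectable.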
I have configured PPS_ATOM one of three ways for testing: set to
noselect, with flag3 disabled, and with flag3 enabled.
Now, under dev version p239, PPS did in fact work fine for a short
period of time. After about an hour the system would sync to PPS and I
would get the 'o' code. But after some hours of use ntpd would spin out
of control with huge multi-second offsets and the clock would tick out
of control. This happened with either flag3 setting.
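For readers less familiar with the 'o' code: it is the tally character that ntpq prints in the first column of its peers listing. The sketch below maps those characters to their usual meanings, as I understand them from the ntpq documentation. The function name and the two sample strings are invented for illustration; they are not captured output from this system.

```python
# Tally characters shown in the first column of the ntpq peers listing.
# Meanings follow the ntpq documentation as I understand it.
TALLY = {
    " ": "reject",
    "x": "falseticker",
    ".": "excess",
    "-": "outlier",
    "+": "candidate",
    "#": "backup",
    "*": "system peer",
    "o": "PPS peer",
}


def tally_meaning(line):
    """Return the meaning of the tally character that starts a peers line."""
    code = line[:1]
    return TALLY.get(code, "unknown")


if __name__ == "__main__":
    # Hypothetical sample strings, NOT real output from the system described.
    for sample in ("oPPS(0)", "xPPS(0)"):
        print(repr(sample), "->", tally_meaning(sample))
```

In these terms, p239 eventually showed PPS with the 'o' tally, which is what a working PPS source should look like.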
Now I have version p256 because of the snprintf bug in NetBSD's libc.
The configure flag --enable-c99-snprintf didn't work properly in p239
but does work in p256. I needed that flag to work around the libc bug.
Under this new version, PPS does not work AT ALL. At no time does it
ever sync to PPS regardless of the flag3 setting. More interesting is
the fact that p256 reports twice the jitter on the PPS signal that p239
did even though none of the hardware or OS components have changed. The
only thing that's different is ntpd. With flag3 cleared (no kernel
discipline), p256 refuses to ever select PPS regardless of the quality
of its signal. It is always marked as a falseticker. By comparison, p239
didn't have this problem. Version p256 will spin out of control with
multi-second offsets if flag3 is enabled. This is the same behavior as
p239.
I am seeing many sys_fuzz messages in version p256 as well.
So that's where things sit right now.