[ntp:questions] ntpd wedged again

A C agcarver+ntp at acarver.net
Sun Feb 12 07:55:30 UTC 2012

Since we seem to be going around a few times on this I'm going to 
summarize the current hardware and software configuration of the system 
so we're all on the same starting point.

The GPS data is read by gpsd on /dev/gps0, which is a symlink to /dev/ttya.

The PPS_ATOM driver reads /dev/pps0, which is a symlink to /dev/ttyb.

So two serial ports handle two different parts of the available GPS output: the serial data on ttya and the PPS signal on ttyb.

The GPS PPS signal is stable. I have kept an eye on it externally 
including when I've had trouble with ntpd and I have not seen any 
evidence of trouble with the source.

The OS is NetBSD 5.1 on sparc sun4c (IPX).  The oscillator on the system 
appears to be fine.  When run with no clock discipline, it did not drift 
far from its set time.  The system load is minimal, running only ntpd and 
gpsd plus the standard base system processes.

The current clock configuration of ntpd has the following:

server 0.us.pool.ntp.org iburst
server 1.us.pool.ntp.org iburst
server 0.north-america.pool.ntp.org iburst
server ntp1.gatech.edu
server rolex.usg.edu

server  minpoll 4 maxpoll 4
fudge  flag2 1 flag3 0 flag4 1 refid PPS

server  minpoll 7 noselect
fudge  stratum 4 time1 -0.6 flag4 1 refid GPSD

Note that the SHM refclock (the GPSD source) is set to noselect.  It has 
been this way for quite a while.  None of the problems I have been having 
involve GPSD in any way, because it is always noselect.  I will use it 
with ntpd later, but I want to get PPS working first.  Until then it will 
stay noselect.  Unless ntpd is ignoring the noselect directive, my 
understanding is that it shouldn't matter that the refclock is present in 
the configuration.

I have tested PPS_ATOM in three configurations: noselect, flag3 disabled, 
and flag3 enabled.

Under development version p239, PPS did in fact work fine for a short 
period of time.  After about an hour the system would sync to PPS and I 
would get the 'o' code.  But after some hours ntpd would spin out of 
control with huge multi-second offsets, and the system clock would run 
away.  This happened with either flag3 setting.

I am now running version p256 because of the snprintf bug in NetBSD's 
libc.  I need the configure flag --enable-c99-snprintf to work around that 
libc bug, and it didn't work properly in p239 but does work in p256. 
Under this new version, PPS does not work AT ALL: ntpd never syncs to 
PPS, regardless of the flag3 setting.  More interesting, p256 reports 
twice the jitter on the PPS signal that p239 did, even though none of the 
hardware or OS components have changed.  The only difference is ntpd. 
With flag3 cleared (no kernel discipline), p256 never selects PPS, 
regardless of the quality of its signal; it is always marked a 
falseticker.  By comparison, p239 didn't have this problem.  With flag3 
enabled, p256 spins out of control with multi-second offsets, the same 
behavior as p239.

I am seeing many sys_fuzz messages in version p256 as well.

So that's where things sit right now.
