[ntp:questions] refclock use causes core dump of ntpd
wa6zvp at gmail.com
Thu Feb 22 17:39:39 UTC 2007
On Feb 22, 7:04 am, Ronan Flood <use... at umbral.org.uk> wrote:
> "wa6zvp" <wa6... at gmail.com> wrote:
> > > > We know it is from the call to abort() at line 788 of refclock_true.c.
> > > Yep. Unfortunately, the code should never get there. Yea, right.
> > OK, I got warm and cuddly with gdb, at least enough to set some
> > breakpoints
> > and look at variables.
> > The main culprit looks like line 540 (in refclock_true). This is in
> > the
> > received data function. It calls true_doevent with a parameter of
> > e_Poll.
> > Event e_Poll is never handled anywhere in doevent, so is very state
> > dependent.
> > Even replacing line 788, the original abort call location with a
> > break;,
> > the program would abort at other unhandled places in doevent.
> That's more understandable, but looking at the code I don't see how it
> got to line 788, since that's the default on a switch(up->type) which
> should only ever be one of t_unknown, t_goes, t_omega, t_tm, or t_tcu
> as they are the only values ever assigned to it, and they all have
> matching cases in the switch. What was the value of up->type when
> it got to line 788? And up->state?
* My recollection is that up->type actually had t_unknown in it, making
it even more puzzling. I don't remember the state.
> What I'd expect is that the state machine starts with t_unknown and
> s_Base then sees e_Init, from true_start() lines 290-292, which takes
> it into ss_InqGOES. If it then gets e_Poll from true_receive(), it
> would abort at line 726. Various other scenarios I have not looked
> at exhaustively, but getting to line 788 is puzzling ...
* It certainly is. I'll fiddle with more gdb tonight, maybe doing
instruction tracing from true_receive.
I can't do much from work, since I can't disconnect the serial data
If I start ntpd with gdb, it just says 'normal completion', meanwhile
the forked process crashes. Is there a way to get gdb to follow into
the forked process? If not, I have to get it running without the data, then
attach to the running process. This will have to wait till tonight.
My feeling is that a refclock driver should _never_ cause ntpd to die.
I think it should just do verbose debugging and continue on as best it can.
The fact that it never gets into a reached status would be a clue that
it's not working right. In this case, however, continuing makes it
This happens because the serial data is actually parsed correctly.
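
To illustrate the "log it and carry on" idea, here is a minimal sketch of an
event handler that reports an unhandled event instead of calling abort().
It is not the code from refclock_true.c. The names demo_unit, demo_doevent,
the DS_* states and the DE_* events are all hypothetical, only loosely
modelled on the state and event names mentioned in this thread, and the
transitions are simplified for illustration.

```c
#include <stdio.h>

/* Hypothetical, simplified states and events (not the driver's real ones). */
enum demo_state { DS_BASE, DS_INQ_GOES, DS_RUNNING };
enum demo_event { DE_INIT, DE_POLL, DE_REPLY };

struct demo_unit {
    enum demo_state state;
    unsigned unhandled;     /* number of events that had no handler */
};

/*
 * Dispatch one event. Returns 1 if the event was handled, 0 if no
 * handler exists for it in the current state. In the second case the
 * event is logged and counted, the state is left unchanged, and the
 * caller keeps running.
 */
static int demo_doevent(struct demo_unit *up, enum demo_event ev)
{
    switch (up->state) {
    case DS_BASE:
        if (ev == DE_INIT) {
            up->state = DS_INQ_GOES;
            return 1;
        }
        break;
    case DS_INQ_GOES:
        if (ev == DE_REPLY) {
            up->state = DS_RUNNING;
            return 1;
        }
        break;
    case DS_RUNNING:
        if (ev == DE_POLL)
            return 1;
        break;
    }

    /* This is the spot where an abort() would otherwise sit. */
    up->unhandled++;
    fprintf(stderr, "demo: unhandled event %d in state %d (ignored)\n",
            (int)ev, (int)up->state);
    return 0;
}
```

With this shape, a poll event that arrives while the unit is still in
DS_INQ_GOES produces a log line and a return value of 0 rather than a core
dump, and the counter gives something to look at when the clock never
becomes reachable.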