[ntp:questions] refclock use causes core dump of ntpd

wa6zvp wa6zvp at gmail.com
Thu Feb 22 17:39:39 UTC 2007


On Feb 22, 7:04 am, Ronan Flood <use... at umbral.org.uk> wrote:
> "wa6zvp" <wa6... at gmail.com> wrote:
> > > > We know it is from the call to abort() at line 788 of refclock_true.c.
>
> > > Yep.  Unfortunately, the code should never get there.  Yea, right.
>
> > OK, I got warm and cuddly with gdb, at least enough to set some
> > breakpoints and look at variables.
>
> > The main culprit looks like line 540 (in refclock_true).  This is in
> > the received data function.  It calls true_doevent with a parameter
> > of e_Poll.  Event e_Poll is never handled anywhere in doevent, so is
> > very state dependent.
>
> > Even replacing line 788, the original abort call location, with a
> > break;, the program would abort at other unhandled places in doevent.
>
> That's more understandable, but looking at the code I don't see how it
> got to line 788, since that's the default on a switch(up->type) which
> should only ever be one of t_unknown, t_goes, t_omega, t_tm, or t_tcu
> as they are the only values ever assigned to it, and they all have
> matching cases in the switch.  What was the value of up->type when
> it got to line 788?  And up->state?

* My recollection is that up->type actually had t_unknown in it, making
it even more puzzling.  I don't remember the state.

> What I'd expect is that the state machine starts with t_unknown and
> s_Base then sees e_Init, from true_start() lines 290-292, which takes
> it into ss_InqGOES.  If it then gets e_Poll from true_receive(), it
> would abort at line 726.  Various other scenarios I have not looked
> at exhaustively, but getting to line 788 is puzzling ...

* It certainly is.  I'll fiddle with gdb some more tonight, maybe doing
some instruction tracing from true_receive.

I can't do much from work, since I can't disconnect the serial data
line.  If I start ntpd under gdb, gdb just reports 'normal completion'
while the forked process crashes.  Is there a way to get gdb to follow
into the forked process?  If not, I have to get ntpd running without the
data and attach to the running process.  This will have to wait till
tonight.

My feeling is that a refclock driver should _never_ cause ntpd to die.
I think it should just log verbose debugging output and continue on as
best it can.  The fact that it never gets into a reached status would be
a clue that it's not working right.  In this case, however, continuing
makes it work, because the serial data is actually parsed correctly.
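
Roughly what I have in mind, as a minimal sketch only.  All the names
here (demo_doevent, DS_*, DE_*, struct demo_unit) are made up for
illustration and are not the real refclock_true.c identifiers or its
real state table.  The point is just that an unhandled (state, event)
pair gets logged and counted, and the state is left alone, instead of
calling abort():

```c
#include <stdio.h>

/* Hypothetical states and events, NOT the ones in refclock_true.c. */
enum demo_state { DS_BASE, DS_INQ, DS_RUNNING };
enum demo_event { DE_INIT, DE_POLL, DE_DATA };

struct demo_unit {
    enum demo_state state;
    unsigned unhandled;     /* how many events were logged and ignored */
};

/*
 * Returns 1 if the event was handled in the current state,
 * 0 if it was unhandled (logged, counted, state left unchanged).
 */
static int demo_doevent(struct demo_unit *up, enum demo_event ev)
{
    switch (up->state) {
    case DS_BASE:
        if (ev == DE_INIT) { up->state = DS_INQ; return 1; }
        break;
    case DS_INQ:
        if (ev == DE_DATA) { up->state = DS_RUNNING; return 1; }
        break;
    case DS_RUNNING:
        if (ev == DE_DATA || ev == DE_POLL) return 1;
        break;
    }

    /* Unhandled combination: complain and carry on rather than abort(). */
    up->unhandled++;
    fprintf(stderr, "demo_doevent: unhandled event %d in state %d\n",
            (int)ev, (int)up->state);
    return 0;
}
```

With this shape, a poll event arriving in a state that does not expect
it only produces a log line, and the counter gives a cheap way to see
that the driver is misbehaving.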

More later.

Roger





