[ntp:hackers] Dull Blade
David L. Mills
mills at udel.edu
Mon Jul 25 11:09:06 PDT 2005
Deacon has two device drivers: a GPS clock using a dull ASCII string on
the first serial port, and the atom driver using a DCD transition captured in
the kernel and PPSAPI interface on the second serial port. The data
leads on the second serial port are not used, but happen to be connected
to a UART chip sometimes used with the CHU radio signal. The modem chip
sometimes goes crazy and burbles garbage to the data leads and
presumably the garbage could include INTR or QUIT characters, but this
has not been a problem in the several years and machines and Solaris
versions this condition has endured. I pulled the chip and deacon is
If this is in fact the problem, others might stumble over it as well,
but all workarounds have hazards. Most folks will use the same port for
both the PPS and data signal; I was too lazy to wire a special
connector. The atom driver can't fiddle the termios bits because the
parent driver might have other ideas. Best to announce in the
documentation that, if the DCD signal is in use, either connect the data
leads to a valid source or leave them disconnected.
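
For readers who do control the serial port setup themselves, the workaround under discussion can be sketched as below. This is a minimal illustration under stated assumptions, not code from ntpd: the helper name disable_isig() is hypothetical, and it uses only the standard POSIX termios calls tcgetattr() and tcsetattr(). It clears only the ISIG flag, so INTR, QUIT and SUSP characters arriving on the data leads are passed through as ordinary bytes instead of raising signals, and it leaves every other termios bit as the parent driver set it.

```c
#include <termios.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of ntpd): stop the tty line discipline
 * from turning received INTR/QUIT/SUSP characters into signals on fd.
 * Only ISIG is cleared; all other termios settings are left untouched.
 * Returns 0 on success, -1 on failure (errno set by the termios call).
 */
int disable_isig(int fd)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) < 0)
        return -1;              /* not a tty, or bad descriptor */

    tio.c_lflag &= ~ISIG;       /* no signal generation from input */

    return tcsetattr(fd, TCSANOW, &tio);
}
```

A caller would invoke disable_isig(fd) once, right after opening the serial device. Whether a driver that shares the port with a parent driver ought to do this is exactly the hazard described above.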
Thanks very much for your sweat. I would never have found the problem
Brian Utterback wrote:
> I have been mucking about on deacon with dtrace, and have discovered
> what is causing the exit. Apparently, one of the two refclocks on
> deacon sends something on the serial port that the ldterm stream module
> interprets as an interrupt character. It then dutifully sends a
> SIGINT signal to ntpd, which likewise dutifully exits. I have no idea
> why one binary would be affected and not the other.
> This suggests that:
> 1. The serial lines should have all such control character processing
> turned off.
> 2. Most signals should log some info to the log before causing ntpd
> to take the long dive.
> David L. Mills wrote:
>> Thanks for the tip. I took a look at the dtrace man page and quickly
>> drowned. Seems as I need to read a lot of stuff. I'm neck deep in the
>> book just now and will have to get back to it sometime after the
>> manuscript deadline in September. Meanwhile, deacon will just have to
>> coast and I'll run ntpdate from time to time.
>> Brian Utterback wrote:
>>> This is just the kind of thing that dtrace was made for. Do you get
>>> a core or anything? If not, it should be fairly simple to make a
>>> dtrace script that stops ntpd just before it exits to get a stack.
>>> David L. Mills wrote:
>>>> I just did a complete rebuild from scratch on the backroom machines
>>>> after finding suspicious behavior possibly due to the Solaris 10
>>>> upgrade. It went well on all the Solaris and FreeBSD machines
>>>> except Solaris Blade 1500 deacon. I didn't change anything in the
>>>> sources and the previous build on Solaris 9 worked fine. However, and
>>>> only on the Blade, the ntpd starts apparently successfully and then
>>>> dies anywhere from a few seconds to several hours later with
>>>> nothing in the log. This behavior happens only when the control
>>>> terminal is detached. It runs forever under gdb and with the debug
>>>> trace turned on.
>>>> This behavior is not new. It has happened on several occasions with
>>>> Linux. On previous occasions the problem went away by itself as
>>>> sources were wiggled in various ways.