[ntp:hackers] Dull Blade
brian.utterback at sun.com
Mon Jul 25 08:41:58 PDT 2005
A bit more dtracing reveals that while the refclock stream has IGNBRK
set, the stream that connects to the PPS signal does not. I will
try to confirm, but my guess is that the break signal comes in on that
line. No idea why or when. Will let you know.
Brian Utterback wrote:
> I just checked the code for ldterm:
> Apparently, you can also get a SIGINT from a BRK indication unless the
> IGNBRK option is set on the serial line. This would occur if there was
> a power cycle or disconnect on the serial line. The location in ldterm
> where the signal was generated does indeed correspond to this line of
> code in the traceback, so we have a smoking gun. Alas, the gun should
> have been loaded with blanks, since it appears that IGNBRK is set by
> Brian Utterback wrote:
>> I have been mucking about on deacon with dtrace, and have discovered
>> what is causing the exit. Apparently, one of the two refclocks on
>> deacon sends something on the serial port that the ldterm stream module
>> interprets as an interrupt character. It then dutifully sends a
>> SIGINT signal to ntpd, which likewise dutifully exits. I have no idea
>> why one binary would be affected and not the other.
>> This suggests that:
>> 1. The serial lines should have all such control-character processing
>> turned off.
>> 2. Most signals should write some info to the log before causing ntpd
>> to take the long dive.
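Item 2 might look something like the following C sketch. It is a hypothetical pattern, not ntpd's actual signal handling: install_fatal_handlers(), check_fatal_signal() and pending_signal are illustrative names. The handler only records the signal number, because syslog() is not async-signal-safe; the main loop is assumed to call check_fatal_signal() periodically, log the signal there, and then shut down.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <syslog.h>

/* Hypothetical: last fatal signal received, 0 if none yet. */
static volatile sig_atomic_t pending_signal = 0;

/* Async-signal-safe handler: only remember which signal arrived. */
static void note_fatal_signal(int sig)
{
    pending_signal = sig;
}

/* Route SIGINT and SIGTERM through note_fatal_signal().
 * Returns 0 on success, -1 if any sigaction() call fails. */
int install_fatal_handlers(void)
{
    static const int sigs[] = { SIGINT, SIGTERM };
    struct sigaction sa;
    size_t i;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_fatal_signal;
    sigemptyset(&sa.sa_mask);
    for (i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (sigaction(sigs[i], &sa, NULL) == -1)
            return -1;
    return 0;
}

/* Called from the main loop (outside signal context): if a fatal signal
 * was recorded, log it and return its number; otherwise return 0. */
int check_fatal_signal(void)
{
    int sig = pending_signal;

    if (sig != 0)
        syslog(LOG_ERR, "exiting on signal %d", sig);
    return sig;
}
```

With something like this in place, the SIGINT from ldterm would at least leave a log line behind instead of ntpd vanishing silently.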
>> David L. Mills wrote:
>>> Thanks for the tip. I took a look at the dtrace man page and quickly
>>> drowned. Seems I need to read a lot of stuff. I'm neck deep in the
>>> book just now and will have to get back to it sometime after the
>>> manuscript deadline in September. Meanwhile, deacon will just have to
>>> coast and I'll run ntpdate from time to time.
>>> Brian Utterback wrote:
>>>> This is just the kind of thing that dtrace was made for. Do you get
>>>> a core or anything? If not, it should be fairly simple to write a
>>>> dtrace script that stops ntpd just before it exits so you can grab a
>>>> stack trace.
>>>> David L. Mills wrote:
>>>>> I just did a complete rebuild from scratch on the backroom machines
>>>>> after finding suspicious behavior, possibly due to the Solaris 10
>>>>> upgrade. It went well on all the Solaris and FreeBSD machines
>>>>> except the Sun Blade 1500, deacon. I didn't change anything in the
>>>>> sources, and the previous build under Solaris 9 worked fine. However,
>>>>> and only on the Blade, ntpd apparently starts successfully and then
>>>>> dies anywhere from a few seconds to several hours later with
>>>>> nothing in the log. This happens only when the controlling
>>>>> terminal is detached. It runs forever under gdb and with the debug
>>>>> trace turned on.
>>>>> This behavior is not new. It has happened on several occasions with
>>>>> Linux. On previous occasions the problem went away by itself as
>>>>> sources were wiggled in various ways.
Remember when SOX compliant meant they were both the same color?
Brian Utterback - OP/N1 RPE, Sun Microsystems, Inc.