[ntp:hackers] Dull Blade

Brian Utterback brian.utterback at sun.com
Mon Jul 25 10:04:51 PDT 2005


I have confirmed that the break is coming from the PPS line. Looking
at the su driver code, it appears that a parity error can also generate
a break signal. That may help explain when it happens.

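The second suggestion in the quoted discussion below, that a signal should leave something in the log before ntpd exits, could look like the following. This is a generic POSIX pattern, not ntpd's own signal handling. `install_exit_logging()` and `check_exit_signal()` are hypothetical names, and the sketch assumes the daemon's main loop calls `check_exit_signal()` on every pass. The handler only records the signal number because syslog() is not async-signal-safe. The logging itself happens later, in normal context.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <syslog.h>

/* Written by the handler; a volatile sig_atomic_t store is
 * async-signal-safe. */
static volatile sig_atomic_t fatal_signal = 0;

static void note_signal(int sig)
{
    fatal_signal = sig;
}

/* Hypothetical helper: catch the signals that would otherwise make the
 * daemon die silently. Returns 0 on success, -1 on failure. */
int install_exit_logging(void)
{
    static const int sigs[] = { SIGINT, SIGTERM, SIGHUP, SIGQUIT };
    struct sigaction sa;
    size_t i;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    for (i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (sigaction(sigs[i], &sa, NULL) < 0)
            return -1;
    return 0;
}

/* Hypothetical helper, called from the main loop: if a signal has
 * arrived, log it and return its number so the caller can shut down
 * cleanly. Returns 0 if no signal is pending. */
int check_exit_signal(void)
{
    int sig = fatal_signal;

    if (sig != 0)
        syslog(LOG_NOTICE, "exiting on signal %d", sig);
    return sig;
}
```

With this in place, a stray SIGINT from ldterm would at least leave a "exiting on signal 2" line (on systems where SIGINT is 2) instead of an empty log.
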
Brian Utterback wrote:
> A bit more dtracing reveals that while the refclock stream has IGNBRK
> set, the stream that connects to the PPS signal does not. I will
> try to confirm, but my guess is that the break signal comes in on that
> line. No idea why or when. Will let you know.
> 
> Brian Utterback wrote:
> 
>> I just checked the code for ldterm:
>>
>> http://cvs.opensolaris.org/source/xref/usr/src/uts/common/io/ldterm.c#1184 
>>
>>
>> Apparently, you can also get a SIGINT from a BRK indication unless the
>> IGNBRK option is set on the serial line. This would occur if there were
>> a power cycle or disconnect on the serial line. The location in ldterm
>> where the signal was generated does indeed correspond to this line of
>> code in the traceback, so we have a smoking gun. Alas, the gun should
>> have been loaded with blanks, since it appears that ignbrk is set by
>> refclock_setup.
>>
>>
>> Brian Utterback wrote:
>>
>>> I have been mucking about on deacon with dtrace, and have discovered
>>> what is causing the exit. Apparently, one of the two refclocks on
>>> deacon sends something on the serial port that the ldterm stream module
>>> interprets as an interrupt character. It then dutifully sends a
>>> SIGINT signal to ntpd, which likewise dutifully exits. I have no idea
>>> why one binary would be affected and not the other.
>>>
>>> This suggests that:
>>>
>>> 1. The serial lines should have all such control-character processing
>>> turned off.
>>>
>>> 2. Most signals should write something to the log before causing ntpd
>>> to take the long dive.
>>>
>>> David L. Mills wrote:
>>>
>>>> Brian,
>>>>
>>>> Thanks for the tip. I took a look at the dtrace man page and quickly
>>>> drowned. Seems I need to read a lot of stuff. I'm neck deep in
>>>> the book just now and will have to get back to it sometime after the 
>>>> manuscript deadline in September. Meanwhile, deacon will just have 
>>>> to coast and I'll run ntpdate from time to time.
>>>>
>>>> Dave
>>>>
>>>> Brian Utterback wrote:
>>>>
>>>>> This is just the kind of thing that dtrace was made for. Do you get 
>>>>> a core or anything? If not, it should
>>>>> be fairly simple to make a dtrace script that stops ntpd just 
>>>>> before it exits to get a stack.
>>>>>
>>>>> David L. Mills wrote:
>>>>>
>>>>>> Guys,
>>>>>>
>>>>>> I just did a complete rebuild from scratch on the backroom
>>>>>> machines after finding suspicious behavior, possibly due to the
>>>>>> Solaris 10 upgrade. It went well on all the Solaris and FreeBSD
>>>>>> machines except the Solaris Blade 1500, deacon. I didn't change
>>>>>> anything in the sources, and the previous build under Solaris 9
>>>>>> worked fine. However, only on the Blade, ntpd starts apparently
>>>>>> successfully and then dies anywhere from a few seconds to several
>>>>>> hours later with nothing in the log. This happens only when the
>>>>>> control terminal is detached. It runs forever under gdb and with
>>>>>> debug tracing turned on.
>>>>>>
>>>>>> This behavior is not new. It has happened on several occasions 
>>>>>> with Linux. On previous occasions the problem went away by itself 
>>>>>> as sources were wiggled in various ways.


-- 
blu

Remember when SOX compliant meant they were both the same color?
----------------------------------------------------------------------
Brian Utterback - OP/N1 RPE, Sun Microsystems, Inc.
Ph:877-259-7345, Em:brian.utterback-at-ess-you-enn-dot-kom