[ntp:hackers] Dull Blade

Brian Utterback brian.utterback at sun.com
Mon Jul 25 12:13:09 PDT 2005


Not sure what you mean by "pulled the modem chip". I don't think that
the issue is with the chip at all. As I said, even parity problems could 
cause this. I stopped testing directly on Deacon when you started up a
ntpd from /usr/local. I figured it's anti-social to step on one's host's
testing.

However, I am trying to cobble together a D script that would verify the
source (either parity or other) of the break signal. I could let that
run and it would not be obtrusive.
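
As a user-space cross-check (this is not the D script, and not anything taken from ntpd or the su driver), POSIX termios can mark bad input in-band. Assuming the line is opened with PARMRK set and IGNBRK, BRKINT, IGNPAR and ISTRIP clear, a break is read as the bytes 0xFF 0x00 0x00, a byte X with a parity or framing error as 0xFF 0x00 X, and a literal 0xFF as 0xFF 0xFF. The names mark_kind and classify_mark below are hypothetical, and whether the Solaris su driver follows the POSIX description exactly is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* What the bytes at the head of a PARMRK-marked input buffer represent. */
enum mark_kind {
    MARK_DATA,          /* ordinary data byte (or an escaped literal 0xFF) */
    MARK_BREAK_OR_NUL,  /* 0xFF 0x00 0x00: a break, or a NUL received in error */
    MARK_BAD_BYTE,      /* 0xFF 0x00 X:    byte X had a parity/framing error */
    MARK_INCOMPLETE     /* need more input before deciding */
};

/*
 * Classify the event at the start of buf and report in *used how many
 * bytes it occupies (0 when more input is needed).
 */
enum mark_kind classify_mark(const unsigned char *buf, size_t len, size_t *used)
{
    assert(buf != NULL && used != NULL);
    *used = 0;
    if (len == 0)
        return MARK_INCOMPLETE;
    if (buf[0] != 0xFF) {            /* unmarked byte */
        *used = 1;
        return MARK_DATA;
    }
    if (len < 2)
        return MARK_INCOMPLETE;
    if (buf[1] == 0xFF) {            /* escaped literal 0xFF */
        *used = 2;
        return MARK_DATA;
    }
    if (buf[1] != 0x00) {            /* not a valid mark; pass 0xFF through */
        *used = 1;
        return MARK_DATA;
    }
    if (len < 3)
        return MARK_INCOMPLETE;
    *used = 3;
    return buf[2] == 0x00 ? MARK_BREAK_OR_NUL : MARK_BAD_BYTE;
}
```

The marking cannot tell a true break from a NUL that arrived with a parity error, so a MARK_BREAK_OR_NUL result only narrows the source down.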

David L. Mills wrote:
> Brian,
> 
> The modem chip shouldn't send a break, but it might send an INTR or QUIT. 
> Indeed, the termio man page says BRKINT is set by default. But, none of 
> this should matter, as I have pulled the modem chip.
> 
> Dave
> 
> Brian Utterback wrote:
> 
>> I have confirmed that the break is coming from the PPS line. Looking
>> at the su driver code, it appears that a parity error can also 
>> generate a break signal. That may explain more about the when.
>>
>> Brian Utterback wrote:
>>
>>> A bit more dtracing reveals that while the refclock stream has IGNBRK 
>>> set, the stream that connects to the PPS signal does not. I will
>>> try to confirm, but my guess is that the break signal comes in on 
>>> that line. No idea why or when. Will let you know.
>>>
>>> Brian Utterback wrote:
>>>
>>>> I just checked the code for ldterm:
>>>>
>>>> http://cvs.opensolaris.org/source/xref/usr/src/uts/common/io/ldterm.c#1184 
>>>>
>>>>
>>>> Apparently, you can also get a SIGINT from a BRK indication, unless 
>>>> the IGNBRK option is set on the serial line. This would occur if there
>>>> was a power cycle or disconnect on the serial line. The location in
>>>> ldterm where the signal was generated does indeed correspond to this
>>>> line of code in the traceback, so we have a smoking gun. Alas, the gun
>>>> should have been loaded with blanks, since it appears that IGNBRK is
>>>> set by refclock_setup.
>>>>
>>>>
>>>> Brian Utterback wrote:
>>>>
>>>>> I have been mucking about on deacon with dtrace, and have discovered
>>>>> what is causing the exit. Apparently, one of the two refclocks on
>>>>> deacon sends something on the serial port that the ldterm stream module
>>>>> interprets as an interrupt character. It then dutifully sends a
>>>>> SIGINT signal to ntpd, which likewise dutifully exits. I have no idea
>>>>> why one binary would be affected and not the other.
>>>>>
>>>>> This suggests that:
>>>>>
>>>>> 1. The serial lines should have all such control character
>>>>> processing turned off.
>>>>>
>>>>> 2. Most signals should log some info to the log before causing ntpd
>>>>> to take the long dive.
>>>>>
>>>>> David L. Mills wrote:
>>>>>
>>>>>> Brian,
>>>>>>
>>>>>> Thanks for the tip. I took a look at the dtrace man page and 
>>>>>> quickly drowned. Seems I need to read a lot of stuff. I'm neck 
>>>>>> deep in the book just now and will have to get back to it sometime 
>>>>>> after the manuscript deadline in September. Meanwhile, deacon will 
>>>>>> just have to coast and I'll run ntpdate from time to time.
>>>>>>
>>>>>> Dave
>>>>>>
>>>>>> Brian Utterback wrote:
>>>>>>
>>>>>>> This is just the kind of thing that dtrace was made for. Do you 
>>>>>>> get a core or anything? If not, it should
>>>>>>> be fairly simple to make a dtrace script that stops ntpd just 
>>>>>>> before it exits to get a stack.
>>>>>>>
>>>>>>> David L. Mills wrote:
>>>>>>>
>>>>>>>> Guys,
>>>>>>>>
>>>>>>>> I just did a complete rebuild from scratch on the backroom 
>>>>>>>> machines after finding suspicious behavior possibly due to the 
>>>>>>>> Solaris 10 upgrade. It went well on all the Solaris and FreeBSD 
>>>>>>>> machines except Solaris Blade 1500 deacon. I didn't change 
>>>>>>>> anything in the sources and the previous build on Solaris 9 worked 
>>>>>>>> fine. However, and only on the Blade, the ntpd starts apparently 
>>>>>>>> successfully and then dies anywhere from a few seconds to 
>>>>>>>> several hours later with nothing in the log. This behavior 
>>>>>>>> happens only when the control terminal is detached. It runs 
>>>>>>>> forever under gdb and with the debug trace turned on.
>>>>>>>>
>>>>>>>> This behavior is not new. It has happened on several occasions 
>>>>>>>> with Linux. On previous occasions the problem went away by 
>>>>>>>> itself as sources were wiggled in various ways.


-- 
blu

Remember when SOX compliant meant they were both the same color?
----------------------------------------------------------------------
Brian Utterback - OP/N1 RPE, Sun Microsystems, Inc.
Ph:877-259-7345, Em:brian.utterback-at-ess-you-enn-dot-kom


