[ntp:hackers] Dull Blade

David L. Mills mills at udel.edu
Mon Jul 25 11:42:42 PDT 2005


Brian,

The modem chip shouldn't send a break, but it might send an INTR or QUIT.
Indeed, the termio man page says BRKINT is set by default. But none of
this should matter, as I have pulled the modem chip.
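For reference, here is a minimal C sketch of what "ignoring" a break means at the termios level. It sets IGNBRK, clears BRKINT, and clears ISIG so that INTR and QUIT bytes pass through as plain data. The idea is to apply it to both the refclock descriptor and the PPS descriptor. The helper names quiet_serial_termios and open_quiet_serial are hypothetical and are not ntpd's refclock_setup. Opening with O_NOCTTY is an added assumption: it is a standard POSIX safeguard that keeps the port from becoming the controlling terminal of a daemon that has already detached.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: turn off break and signal-character processing
 * so line noise or a stray byte cannot become SIGINT/SIGQUIT. */
void quiet_serial_termios(struct termios *t)
{
    t->c_iflag |= IGNBRK;   /* discard break conditions entirely */
    t->c_iflag &= ~BRKINT;  /* never turn a break into SIGINT */
    t->c_lflag &= ~ISIG;    /* INTR/QUIT/SUSP bytes are plain data */
}

/* Hypothetical helper: open a refclock or PPS device and quiet it.
 * O_NOCTTY (an assumption, not from the thread) keeps the port from
 * becoming the controlling terminal of a detached daemon. */
int open_quiet_serial(const char *path)
{
    struct termios t;
    int fd = open(path, O_RDWR | O_NOCTTY);
    if (fd == -1)
        return -1;
    if (tcgetattr(fd, &t) == -1) {
        close(fd);
        return -1;
    }
    quiet_serial_termios(&t);
    if (tcsetattr(fd, TCSANOW, &t) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Only the listed flags change. Canonical mode, parity, and speed settings are left as the caller configured them.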

Dave

Brian Utterback wrote:

> I have confirmed that the break is coming from the PPS line. Looking
> at the su driver code, it appears that a parity error can also 
> generate a break signal. That may explain more about the when.
>
> Brian Utterback wrote:
>
>> A bit more dtracing reveals that while the refclock stream has IGNBRK 
>> set, the stream that connects to the PPS signal does not. I will
>> try to confirm, but my guess is that the break signal comes in on 
>> that line. No idea why or when. Will let you know.
>>
>> Brian Utterback wrote:
>>
>>> I just checked the code for ldterm:
>>>
>>> http://cvs.opensolaris.org/source/xref/usr/src/uts/common/io/ldterm.c#1184 
>>>
>>>
>>> Apparently, you can also get a SIGINT from a BRK indication unless
>>> the IGNBRK option is set on the serial line. This would occur if there
>>> was a power cycle or disconnect on the serial line. The location in
>>> ldterm where the signal was generated does indeed correspond to this
>>> line of code in the traceback, so we have a smoking gun. Alas, the gun
>>> should have been loaded with blanks, since it appears that IGNBRK is
>>> set by refclock_setup.
>>>
>>>
>>> Brian Utterback wrote:
>>>
>>>> I have been mucking about on deacon with dtrace, and have discovered
>>>> what is causing the exit. Apparently, one of the two refclocks on
>>>> deacon sends something on the serial port that the ldterm stream module
>>>> interprets as an interrupt character. It then dutifully sends a
>>>> SIGINT signal to ntpd, which likewise dutifully exits. I have no idea
>>>> why one binary would be affected and not the other.
>>>>
>>>> This suggests that:
>>>>
>>>> 1. The serial lines should have all such control character
>>>> processing turned off.
>>>>
>>>> 2. Most signals should log something before causing ntpd
>>>> to take the long dive.
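Here is a minimal C sketch of point 2. It assumes hypothetical names (install_exit_loggers, check_exit_signal) and does not reproduce ntpd's actual signal handling. The handler only records the signal number, because syslog() is not async-signal-safe. The main loop then does the logging and the exit.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <syslog.h>

/* Set by the handler, read by the main loop. */
static volatile sig_atomic_t fatal_signal = 0;

/* Async-signal-safe: just remember which signal arrived. */
static void note_signal(int sig)
{
    fatal_signal = sig;
}

/* Hypothetical: catch the signals that would otherwise kill ntpd silently. */
int install_exit_loggers(void)
{
    static const int sigs[] = { SIGINT, SIGQUIT, SIGTERM };
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (sigaction(sigs[i], &sa, NULL) == -1)
            return -1;
    return 0;
}

/* Hypothetical: call once per main-loop pass; logs, then exits. */
void check_exit_signal(void)
{
    if (fatal_signal != 0) {
        syslog(LOG_NOTICE, "exiting on signal %d", (int)fatal_signal);
        exit(EXIT_FAILURE);
    }
}
```

Deferring the syslog() call to the main loop trades a short delay before exit for a guarantee that the log line is written safely.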
>>>>
>>>> David L. Mills wrote:
>>>>
>>>>> Brian,
>>>>>
>>>>> Thanks for the tip. I took a look at the dtrace man page and 
>>>>> quickly drowned. Seems I need to read a lot of stuff. I'm neck
>>>>> deep in the book just now and will have to get back to it sometime 
>>>>> after the manuscript deadline in September. Meanwhile, deacon will 
>>>>> just have to coast and I'll run ntpdate from time to time.
>>>>>
>>>>> Dave
>>>>>
>>>>> Brian Utterback wrote:
>>>>>
>>>>>> This is just the kind of thing that dtrace was made for. Do you 
>>>>>> get a core or anything? If not, it should
>>>>>> be fairly simple to make a dtrace script that stops ntpd just 
>>>>>> before it exits to get a stack.
>>>>>>
>>>>>> David L. Mills wrote:
>>>>>>
>>>>>>> Guys,
>>>>>>>
>>>>>>> I just did a complete rebuild from scratch on the backroom
>>>>>>> machines after finding suspicious behavior possibly due to the
>>>>>>> Solaris 10 upgrade. It went well on all the Solaris and FreeBSD
>>>>>>> machines except the Solaris Blade 1500 deacon. I didn't change
>>>>>>> anything in the sources, and the previous build under Solaris 9
>>>>>>> worked fine. However, only on the Blade, ntpd starts apparently
>>>>>>> successfully and then dies anywhere from a few seconds to
>>>>>>> several hours later with nothing in the log. This happens only
>>>>>>> when the controlling terminal is detached. It runs forever under
>>>>>>> gdb and with the debug trace turned on.
>>>>>>>
>>>>>>> This behavior is not new. It has happened on several occasions 
>>>>>>> with Linux. On previous occasions the problem went away by 
>>>>>>> itself as sources were wiggled in various ways.