[ntp:hackers] Dull Blade

David L. Mills mills at udel.edu
Mon Jul 25 22:05:08 PDT 2005


Brian,

What I meant by pulling the modem chip is that I removed the chip that
produced the noisy signal, and perhaps those nasty characters that drove
the serial port to madness. I didn't think then, and don't think now, that
it generated a break signal. That would require genuine ingenuity on a bare
silicon UART.

Dave

Brian Utterback wrote:

> Not sure what you mean by "pulled the modem chip". I don't think that
> the issue is with the chip at all. As I said, even parity problems
> could cause this. I stopped testing directly on deacon when you
> started up an ntpd from /usr/local. I figured it's antisocial to step
> on one's host's testing.
>
> However, I am trying to cobble together a D script that would verify the
> source (either parity or other) of the break signal. I could let that
> run and it would not be obtrusive.
>
> David L. Mills wrote:
>
>> Brian,
>>
>> The modem chip shouldn't send a break, but it might send an INTR or
>> QUIT character. Indeed, the termio man page says BRKINT is set by
>> default. But none of this should matter, as I have pulled the modem chip.
>>
>> Dave
>>
>> Brian Utterback wrote:
>>
>>> I have confirmed that the break is coming from the PPS line. Looking
>>> at the su driver code, it appears that a parity error can also
>>> generate a break signal. That may explain more about when it happens.
>>>
>>> Brian Utterback wrote:
>>>
>>>> A bit more dtracing reveals that while the refclock stream has
>>>> IGNBRK set, the stream that connects to the PPS signal does not. I
>>>> will try to confirm, but my guess is that the break signal comes in
>>>> on that line. No idea why or when. Will let you know.
>>>>
>>>> Brian Utterback wrote:
>>>>
>>>>> I just checked the code for ldterm:
>>>>>
>>>>> http://cvs.opensolaris.org/source/xref/usr/src/uts/common/io/ldterm.c#1184
>>>>>
>>>>>
>>>>> Apparently, you can also get a SIGINT from a BRK indication
>>>>> unless the IGNBRK option is set on the serial line. This would
>>>>> occur if there were a power cycle or disconnect on the serial line.
>>>>> The location in ldterm where the signal was generated does indeed
>>>>> correspond to this line of code in the traceback, so we have a
>>>>> smoking gun. Alas, the gun should have been loaded with blanks,
>>>>> since it appears that IGNBRK is set by refclock_setup.
>>>>>
>>>>>
>>>>> Brian Utterback wrote:
>>>>>
>>>>>> I have been mucking about on deacon with dtrace and have discovered
>>>>>> what is causing the exit. Apparently, one of the two refclocks on
>>>>>> deacon sends something on the serial port that the ldterm STREAMS
>>>>>> module interprets as an interrupt character. ldterm then dutifully
>>>>>> sends a SIGINT to ntpd, which likewise dutifully exits. I have no
>>>>>> idea why one binary would be affected and not the other.
>>>>>>
>>>>>> This suggests that:
>>>>>>
>>>>>> 1. The serial lines should have all such control-character
>>>>>> processing turned off.
>>>>>>
>>>>>> 2. Most signals should log some information before causing ntpd
>>>>>> to take the long dive.
>>>>>>
>>>>>> David L. Mills wrote:
>>>>>>
>>>>>>> Brian,
>>>>>>>
>>>>>>> Thanks for the tip. I took a look at the dtrace man page and
>>>>>>> quickly drowned. Seems I need to read a lot of stuff. I'm neck
>>>>>>> deep in the book just now and will have to get back to it sometime
>>>>>>> after the manuscript deadline in September. Meanwhile, deacon will
>>>>>>> just have to coast, and I'll run ntpdate from time to time.
>>>>>>>
>>>>>>> Dave
>>>>>>>
>>>>>>> Brian Utterback wrote:
>>>>>>>
>>>>>>>> This is just the kind of thing that dtrace was made for. Do you
>>>>>>>> get a core or anything? If not, it should be fairly simple to
>>>>>>>> write a dtrace script that stops ntpd just before it exits so you
>>>>>>>> can get a stack trace.
>>>>>>>>
>>>>>>>> David L. Mills wrote:
>>>>>>>>
>>>>>>>>> Guys,
>>>>>>>>>
>>>>>>>>> I just did a complete rebuild from scratch on the backroom
>>>>>>>>> machines after finding suspicious behavior, possibly due to the
>>>>>>>>> Solaris 10 upgrade. It went well on all the Solaris and FreeBSD
>>>>>>>>> machines except deacon, a Sun Blade 1500. I didn't change
>>>>>>>>> anything in the sources, and the previous build under Solaris 9
>>>>>>>>> worked fine. However, only on the Blade, ntpd starts apparently
>>>>>>>>> successfully and then dies anywhere from a few seconds to several
>>>>>>>>> hours later with nothing in the log. This happens only when the
>>>>>>>>> controlling terminal is detached. It runs forever under gdb and
>>>>>>>>> with the debug trace turned on.
>>>>>>>>>
>>>>>>>>> This behavior is not new. It has happened on several occasions
>>>>>>>>> with Linux. On previous occasions the problem went away by
>>>>>>>>> itself as sources were wiggled in various ways.
>>>>>>>>
>
>
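The first suggestion in the thread, turning off break and control-character processing on every serial line ntpd opens (including the PPS line, which the dtrace results showed lacked IGNBRK), might look roughly like the POSIX termios sketch below. The names `make_quiet_termios` and `quiet_serial_line` are hypothetical, not ntpd functions. Whether these flags also suppress the su driver's parity-error-as-break behavior on Solaris is an assumption that would need testing, and clearing parity checking assumes the attached device does not depend on it.

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>

/*
 * Hypothetical helper (not part of ntpd): adjust termios flags so that
 * line conditions on a serial port cannot be turned into signals.
 */
void make_quiet_termios(struct termios *tio)
{
    assert(tio != NULL);

    /* Ignore break conditions, and never convert a break into SIGINT. */
    tio->c_iflag |= IGNBRK;
    tio->c_iflag &= ~(tcflag_t)BRKINT;

    /* Turn off input parity checking and parity marking; IGNPAR drops
     * any byte that still arrives with a framing or parity error.
     * Assumption: the attached device does not rely on parity. */
    tio->c_iflag &= ~(tcflag_t)(INPCK | PARMRK);
    tio->c_iflag |= IGNPAR;

    /* Do not recognize the INTR, QUIT, or SUSP characters at all. */
    tio->c_lflag &= ~(tcflag_t)ISIG;
}

/* Read the current settings of fd, quiet them, and apply them at once.
 * Returns 0 on success, -1 (with errno set) on failure. */
int quiet_serial_line(int fd)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) < 0)
        return -1;
    make_quiet_termios(&tio);
    return tcsetattr(fd, TCSANOW, &tio);
}
```

In this sketch the flag manipulation is kept separate from the file-descriptor calls so it can be checked without a real serial port; a caller would (hypothetically) invoke `quiet_serial_line()` on the PPS descriptor right after opening it.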

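The second suggestion in the thread, having ntpd leave a trace in the log before a signal kills it, could be approached with a handler that only records the signal number (syslog() is not async-signal-safe, so it must not be called from the handler) plus a check in the main loop that logs and then lets the daemon shut down. This is a hypothetical sketch; `install_fatal_signal_logging` and `check_fatal_signal` are illustrative names, not ntpd functions, and the choice of signals is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <syslog.h>

/* Signal number recorded by the handler; 0 means none seen yet. */
static volatile sig_atomic_t fatal_signal = 0;

/* Handler: only record the signal. syslog() is not async-signal-safe,
 * so logging is deferred to the main loop. */
static void note_signal(int sig)
{
    fatal_signal = sig;
}

/* Hypothetical setup routine: catch signals that would otherwise end the
 * daemon without a trace. Returns 0 on success, -1 on error. */
int install_fatal_signal_logging(void)
{
    static const int sigs[] = { SIGINT, SIGQUIT, SIGTERM };
    struct sigaction sa;
    size_t i;

    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    for (i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (sigaction(sigs[i], &sa, NULL) < 0)
            return -1;
    return 0;
}

/* Hypothetical main-loop hook: if a signal was caught, log it and return
 * its number so the caller can clean up and exit; otherwise return 0. */
int check_fatal_signal(void)
{
    int sig = fatal_signal;

    if (sig != 0)
        syslog(LOG_NOTICE, "exiting on signal %d", sig);
    return sig;
}
```

Deferring the syslog() call keeps the handler safe; the trade-off is that the message appears only when the main loop next runs, which for a daemon that wakes up regularly should be soon.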