[ntp:hackers] Dull Blade

David L. Mills mills at udel.edu
Mon Jul 25 11:33:17 PDT 2005


I did set IGNBRK explicitly on all refclock_open() calls, but the 
problem here is the ordinary Unix open(), which uses the default 
parameters. As the port can be and usually is associated with another 
driver, the atom driver can't fiddle anything without imposing nasty 
order restrictions in the configuration file. The atom driver can be 
used alone or with a parent driver on the same or another port or the 
atom driver can be used on a parallel port with or without a parent 
driver. I still believe a warning label on the bottle is the best solution.
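
For illustration only, here is a minimal sketch of what setting IGNBRK explicitly on a serial port can look like with the POSIX termios interface. It is not taken from the ntpd sources and is not what refclock_open() or the atom driver actually does; the helper names ignore_break() and open_ignoring_break() are hypothetical, and the use of O_NOCTTY and O_NONBLOCK is an assumption about how one might open the device.

```c
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: change the input flags so that a BREAK
 * condition on the line is discarded instead of acted upon. */
void ignore_break(struct termios *t)
{
    t->c_iflag |= IGNBRK;             /* ignore BREAK on input */
    t->c_iflag &= ~(tcflag_t)BRKINT;  /* do not map BREAK to SIGINT */
}

/* Hypothetical helper: open a serial device and apply the flags above.
 * Returns the file descriptor, or -1 on any failure. */
int open_ignoring_break(const char *path)
{
    struct termios t;

    /* O_NOCTTY keeps the port from becoming the controlling terminal. */
    int fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    if (tcgetattr(fd, &t) < 0) {
        close(fd);
        return -1;
    }

    ignore_break(&t);

    if (tcsetattr(fd, TCSANOW, &t) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A plain open() followed by no termios call leaves whatever settings the port already had, which is the situation described above for the atom driver.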


Brian Utterback wrote:

> I just checked the code for ldterm:
> http://cvs.opensolaris.org/source/xref/usr/src/uts/common/io/ldterm.c#1184 
> Apparently, you can also get a SIGINT from a BRK indication, unless 
> the IGNBRK option was set on the serial line. This would occur if there was
> a power cycle or disconnect on the serial line. The location in ldterm
> where the signal was generated does indeed correspond to this line of
> code in the traceback, so we have a smoking gun. Alas, the gun should
> have been loaded with blanks, since it appears that ignbrk is set by
> refclock_setup.
> Brian Utterback wrote:
>> I have been mucking about on deacon with dtrace, and have discovered
>> what is causing the exit. Apparently, one of the two refclocks on
>> deacon sends something on the serial port that the ldterm stream module
>> interprets as an interrupt character. It then dutifully sends a
>> SIGINT signal to ntpd, which likewise dutifully exits. I have no idea
>> why one binary would be affected and not the other.
>> This suggests that:
>> 1. The serial lines should have all such control character processing
>> turned off.
>> 2. Most signals should log some info to the log before causing ntpd
>> to take the long dive.
>> David L. Mills wrote:
>>> Brian,
>>> Thanks for the tip. I took a look at the dtrace man page and quickly 
>>> drowned. Seems I need to read a lot of stuff. I'm neck deep in 
>>> the book just now and will have to get back to it sometime after the 
>>> manuscript deadline in September. Meanwhile, deacon will just have 
>>> to coast and I'll run ntpdate from time to time.
>>> Dave
>>> Brian Utterback wrote:
>>>> This is just the kind of thing that dtrace was made for. Do you get 
>>>> a core or anything? If not, it should
>>>> be fairly simple to make a dtrace script that stops ntpd just 
>>>> before it exits to get a stack.
>>>> David L. Mills wrote:
>>>>> Guys,
>>>>> I just did a complete rebuild from scratch on the backroom 
>>>>> machines after finding suspicious behavior possibly due to Solaris 
>>>>> 10 upgrade. It went well on all the Solaris and FreeBSD machines 
>>>>> except Solaris Blade 1500 deacon. I didn't change anything in the 
>>>>> sources and the previous build on Solaris 9 worked fine. However, and 
>>>>> only on the Blade, the ntpd starts apparently successfully and 
>>>>> then dies anywhere from a few seconds to several hours later with 
>>>>> nothing in the log. This behavior happens only when the control 
>>>>> terminal is detached. It runs forever under gdb and with the debug 
>>>>> trace turned on.
>>>>> This behavior is not new. It has happened on several occasions 
>>>>> with Linux. On previous occasions the problem went away by itself 
>>>>> as sources were wiggled in various ways.
