[ntp:questions] Ntpd in uninterruptible sleep?

Dave Hart hart at ntp.org
Thu Nov 17 08:42:00 UTC 2011


On Sat, Nov 12, 2011 at 05:52, unruh <unruh at invalid.ca> wrote:
> On 2011-11-12, Dave Hart <hart at ntp.org> wrote:
>> On Fri, Nov 11, 2011 at 20:23, A C <agcarver+ntp at acarver.net> wrote:
>>> First attempt with gdb and a back trace after attaching gdb to the hung
>>> process (note this particular running of ntpd was not using the debug
>>> command line options):
>>>
>>>> #0  0x103d1458 in .umul () from /usr/lib/libc.so.12
>>>> #1  0x103c38d4 in __pow5mult_D2A () from /usr/lib/libc.so.12
>>>> #2  0x103c3ac4 in __muldi3 () from /usr/lib/libc.so.12
>>>> #3  0x103c34dc in __mult_D2A () from /usr/lib/libc.so.12
>>>> #4  0x103c3728 in __pow5mult_D2A () from /usr/lib/libc.so.12
>>>> #5  0x103b61d4 in __dtoa () from /usr/lib/libc.so.12
>>>> #6  0x103b315c in __vfprintf_unlocked () from /usr/lib/libc.so.12
>>>> #7  0x103230c4 in snprintf () from /usr/lib/libc.so.12
>>>> #8  0x00023afc in ctl_putarray (tag=<value optimized out>, arr=0xa8fe0, start=1)
>>>>     at ntp_control.c:1307
>>>> #9  0x00024a7c in ctl_putpeer (varid=30, peer=0xa8e70)
>>>>     at ntp_control.c:1777
>>>> #10 0x0002744c in read_variables (rbufp=0x1050d000, restrict_mask=0)
>>>>     at ntp_control.c:2334
>>>> #11 0x0002664c in process_control (rbufp=0x1050d000, restrict_mask=0)
>>>>     at ntp_control.c:809
>>>> #12 0x00035594 in receive (rbufp=0x1050d000) at ntp_proto.c:370
>>>> #13 0x00022c00 in ntpdmain (argc=<value optimized out>, argv=<value optimized out>)
>>>>     at ntpd.c:1150
>>>> #14 0x0001381c in ___start ()
>>>> #15 0x00013754 in _start ()
>>
>> Excellent.  I assume the stack trace is from ntpd 4.2.6p3.  I think
>> you've found a bug in your system's libc dtoa() exposed by its
>> snprintf(s, " %.2f", ...).  I believe you will not be able to
>> reproduce the bug using 4.2.7, as that version of ntpd uses
>> C99-snprintf [1] if the system snprintf() is not C99-compliant.
>> C99-snprintf's rpl_vsnprintf() does not use dtoa(), it hand-rolls the
>
> That seems like a step backwards. Having a program write its own code
> for such elementary routines is a recipe for chaos. Rather, if there
> is a bug in libc it should be fixed.

For the original poster, this step backwards worked around the
problem.  For the benefit of threaded archives, acarver reported in
another thread [1][2][3] that using ntpd 4.2.7 configured with
--enable-c99-snprintf to force the use of C99-snprintf's replacement
code has made the problem disappear, apparently confirming the problem
is in the NetBSD sparc32 dtoa().

Agreed, the bug in the C runtime should be fixed.  It turned out,
though, that having an alternative snprintf() implementation readily
available made it easy to confirm that the bug is in the C runtime.

[1] http://lists.ntp.org/pipermail/questions/2011-November/030983.html
-or-
[2] http://groups.google.com/group/comp.protocols.time.ntp/msg/8e18eba15c70db7f
-or-
[3] comp.protocols.time.ntp Message-ID: <4EC4B838.5000907 at acarver.net>

Cheers,
Dave Hart

