[ntp:bugs] [Bug 1851] New: ntpd 4.2.7 segfaults at startup under FreeBSD
bugzilla-daemon at ntp.org
Tue Mar 15 16:12:02 UTC 2011
https://bugs.ntp.org/show_bug.cgi?id=1851
Summary: ntpd 4.2.7 segfaults at startup under FreeBSD
Product: ntp
Version: 4.2.7
Platform: IBM
OS/Version: FreeBSD
Status: NEW
Severity: critical
Priority: P5
Component: ntpd
AssignedTo: stenn at ntp.org
ReportedBy: burnicki at ntp.org
CC: bugs at ntp.org
Estimated Hours: 0.0
I've recently started to play with FreeBSD for the first time and ported the
Linux driver for Meinberg PCI cards to FreeBSD.
During my tests I found that some ntpd v4.2.7* versions segfault when starting,
whereas ntpd 4.2.6p3 works properly. I've tried to use gdb to find the reason
for the segfault, but the results are strange and I'm really stumped.
Initially I played with ntpd 4.2.7p128 and found the daemon was up and running
every time after the machine had finished booting. However, when I killed the
running ntpd daemon and tried to start it again, it never started up
successfully. Instead, each attempt left a new message in the syslog saying
"ntpd exited on signal 11 (core dumped)".
The backtrace from the core dump is always the same:
----------------------------------------------------
Core was generated by `ntpd'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000800fa8b54 in memset () from /lib/libc.so.7
(gdb) bt
#0 0x0000000800fa8b54 in memset () from /lib/libc.so.7
#1 0x0000000000413b95 in ntpdmain (argc=0, argv=0x7fffffffe9a0) at ntpd.c:980
#2 0x0000000000405e5e in _start ()
#3 0x00000008005b6000 in ?? ()
#4 0x0000000000000000 in ?? ()
#5 0x0000000000000000 in ?? ()
#6 0x0000000000000001 in ?? ()
(up to #50 displayed, even more to follow)
----------------------------------------------------
Interestingly, there are more than 50 stack frames here, which IMO are invalid
and make me suspect that the stack has been corrupted.
Next I tried ntpd 4.2.6p3 and found it starts up properly both at boot time and
when killed and re-started manually.
Finally I tried the most recent ntp-dev version 4.2.7p138. This version
segfaults both when starting at boot time and when started manually. However,
the backtraces are different from the backtrace from 4.2.7p128, and even the
backtraces from the boot-time start and from a manual start differ.
After booting the stack trace is always this one, and always the same:
-----------------------------------------------------
Core was generated by `ntpd'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000801088440 in freeaddrinfo () from /lib/libc.so.7
[New Thread 80140ae40 (LWP 100145)]
[New Thread 8014041c0 (LWP 100070)]
(gdb) bt
#0 0x0000000801088440 in freeaddrinfo () from /lib/libc.so.7
#1 0x0000000000451971 in blocking_getaddrinfo (c=0x801422180, req=0x801458300) at ntp_intres.c:446
#2 0x0000000000451ec3 in blocking_child_common (c=0x801422180) at ntp_worker.c:288
#3 0x00000000004541c9 in blocking_thread (ThreadArg=Variable "ThreadArg" is not available.) at work_thread.c:597
#4 0x0000000800ebe511 in pthread_getprio () from /lib/libthr.so.3
#5 0x0000000000000000 in ?? ()
Cannot access memory at address 0x7fffffbff000
-----------------------------------------------------
However, when starting the same ntpd from the command line the stack trace is
always as below, and always the same, but different from the stack trace above:
-----------------------------------------------------
Core was generated by `ntpd'.
Program terminated with signal 11, Segmentation fault.
#0 initialize_action () at ./../lib/isc/unix/net.c:205
205 ipv4_result = try_proto(PF_INET);
[New Thread 8014041c0 (LWP 100258)]
(gdb) bt
#0 initialize_action () at ./../lib/isc/unix/net.c:205
#1 0x0000000800ec58f8 in pthread_once () from /lib/libthr.so.3
#2 0x0000000000455083 in initialize () at ./../lib/isc/unix/net.c:220
#3 0x00000000004551b9 in isc_net_probeipv4 () at ./../lib/isc/unix/net.c:225
#4 0x000000000044eb83 in init_lib () at lib_strbuf.c:31
#5 0x0000000000453129 in ssl_init () at ssl_init.c:26
#6 0x0000000000413f14 in ntpdmain (argc=0, argv=0x7fffffffe9a0) at ntpd.c:781
#7 0x000000000040649e in _start ()
#8 0x00000008005b7000 in ?? ()
#9 0x0000000000000000 in ?? ()
#10 0x0000000000000000 in ?? ()
#11 0x0000000000000001 in ?? ()
#12 0x00007fffffffece8 in ?? ()
(up to #50 displayed, even more to follow)
------------------------------------------
This happens under FreeBSD 8.1/amd64 on an Intel i7 system with a quad-core CPU,
and it also happens if only a single core is enabled and hyperthreading is
disabled.
For these tests the ntp.conf file contains just a single server line, plus some
lines to enable statistics files. The new PCI card driver mentioned above is
neither used, nor even loaded.
All ntpd versions have been built from the tarballs downloaded from ntp.org,
simply using the commands "./configure; make". The resulting ntpd binary has
been copied to /usr/sbin/ to make sure the expected version is started at boot
time.
Also, none of the ntpd versions segfaults when I start it with the debug option,
i.e. typing "ntpd -d", so maybe the fork causes the problem, or the different
timing with debug output unintentionally masks the cause of the problem.
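To make the fork hypothesis concrete: without "-d" the daemon detaches by
forking, so the code that crashes runs in a child process rather than in the
process started from the shell. The sketch below is not taken from the ntpd
sources. It is a minimal, hypothetical C helper (run_forked, child_ok and
child_segv are names made up for illustration) that runs a function in a forked
child and reports whether the child exited normally or was killed by a signal,
which is the same information the "exited on signal 11" syslog line carries.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of ntpd: run fn() in a forked child.
 * Returns the child's exit status (0..255) if it exited normally,
 * the negated signal number if it was killed by a signal,
 * or -1000 if fork() or waitpid() itself failed.
 */
static int run_forked(int (*fn)(void))
{
    int status;
    pid_t pid;

    assert(fn != NULL);

    pid = fork();
    if (pid < 0)
        return -1000;
    if (pid == 0)
        _exit(fn() & 0xff);     /* child: run the code under test */

    /* parent: collect the child's fate */
    if (waitpid(pid, &status, 0) != pid)
        return -1000;
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1000;
}

/* Illustrative child that exits cleanly. */
static int child_ok(void)
{
    return 0;
}

/* Illustrative child that dies the way the syslog message describes. */
static int child_segv(void)
{
    raise(SIGSEGV);
    return 1;                   /* not reached with default signal handling */
}
```

Calling run_forked(child_segv) from a test program should yield the negated
value of SIGSEGV, while run_forked(child_ok) should yield 0. Moving suspect
initialization code into such a callback is one way to check whether it only
fails after a fork.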
What is really strange here is that the backtraces are different, so I doubt
they point to the real cause of the segfault. Instead I think the segfault
happens accidentally here because some other piece of code corrupts the stack
or some variables.
Any hints on how I could find out more are welcome. I'm not very familiar with
gdb/ddd, e.g. I don't yet know how to set a breakpoint in a way that the
*forked* process is stopped when the breakpoint is hit (if that's possible at
all).
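One common idiom for this situation, independent of whether the debugger can
follow a fork, is to let the child wait until a debugger has been attached to
it. The sketch below is a generic C idiom and not ntpd code; the names
debugger_attached and wait_for_debugger are hypothetical. The child would call
wait_for_debugger() right after the fork. One then attaches gdb to the child's
PID, sets the breakpoints of interest, types "set var debugger_attached = 1" and
continues. gdb also documents a "set follow-fork-mode child" setting, but
whether the gdb shipped with FreeBSD 8.1 supports it is an open assumption.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical flag: set from within the debugger, never by the program. */
static volatile int debugger_attached = 0;

/*
 * Poll once per second until the flag has been set or max_seconds
 * have elapsed. Returns 1 if the flag was set, 0 on timeout.
 */
static int wait_for_debugger(unsigned int max_seconds)
{
    unsigned int waited = 0;

    while (!debugger_attached) {
        if (waited >= max_seconds)
            return 0;           /* give up and let the child run on */
        sleep(1);
        waited++;
    }

    /* The loop only ends here once the flag has been flipped. */
    assert(debugger_attached);
    return 1;
}
```

The timeout keeps a forgotten call from hanging the daemon forever; a value of
30 to 60 seconds usually leaves enough time to find the PID and attach.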
Martin