[ntp:bugs] [Bug 1851] ntpd 4.2.7 segfaults at startup under FreeBSD
bugzilla-daemon at ntp.org
Wed Mar 16 14:28:35 UTC 2011
http://bugs.ntp.org/show_bug.cgi?id=1851
--- Comment #5 from Martin Burnicki <burnicki at ntp.org> 2011-03-16 14:28:35 UTC ---
(In reply to comment #3)
> Thanks for the followup, Martin. I should have been clearer in my
> suggestion to use CFLAGS='-g -O0'.
>
> CFLAGS="-g -O0" ./configure --disable-thread-support
>
> is mostly equivalent, but better is:
>
> rm -f config.cache ; ./configure -C --disable-thread-support CFLAGS="-g -O0"
> Even with an emptied cache, the -C saves time thanks to nesting of packages.
In fact, I had completely removed the previous directory tree and unpacked the
tarball again to be sure of a clean starting point without any cached
settings.
> Putting CFLAGS overrides in the configure command line rather than in the
> environment exposes the fact of the override to configure, which then
> enforces it by warning if an attempt is made to reconfigure with different
> CFLAGS in effect and a leftover config.cache.
./configure --help says environment variables are honoured, so I simply tried
that first. I also ran "make V=1" to verify the compiler is indeed called with
"-g -O0".
Anyway, I retried once more with the configure command you suggested and as
expected, the backtraces are exactly the same.
> Getting back to the problem at hand, that's twice your stack traces have
> implicated the same pieces of code. For freeaddrinfo() from
> libntp/ntp_intres.c blocking_getaddrinfo(), the code seems correct to my
> eye. Perhaps something is corrupting the heap before the freeaddrinfo().
> A couple of things you could try here to better understand the failure:
>
> 1) Valgrind is in FreeBSD ports, if it works anywhere near as well as on
> Linux it should catch heap corruption while barely working up a sweat.
I've never used valgrind before, but just gave it a short try:
[root at pc-martin6 /]# valgrind -v --leak-check=full ntpd
==1248== Memcheck, a memory error detector
==1248== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==1248== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==1248== Command: ntpd
==1248==
--1248-- Valgrind options:
--1248-- -v
--1248-- --leak-check=full
--1248-- Contents of /proc/version:
--1248-- Arch and hwcaps: AMD64, amd64-sse3-cx16
--1248-- Page sizes: currently 4096, max supported 4096
--1248-- Valgrind library directory: /usr/local/lib/valgrind
--1248-- Reading syms from /libexec/ld-elf.so.1 (0x10000)
--1248-- object doesn't have a symbol table
--1248-- Reading syms from /usr/sbin/ntpd (0x400000)
--1248-- Reading syms from /usr/local/lib/valgrind/memcheck-amd64-freebsd (0x38000000)
--1248-- object doesn't have a symbol table
--1248-- object doesn't have a dynamic symbol table
--1248-- Reading suppressions file: /usr/local/lib/valgrind/default.supp
--1248-- Reading syms from /usr/local/lib/valgrind/vgpreload_core-amd64-freebsd.so (0x156000)
--1248-- object doesn't have a symbol table
--1248-- Reading syms from /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so (0x257000)
--1248-- object doesn't have a symbol table
--1248-- Reading syms from /lib/libmd.so.5 (0xe10000)
--1248-- object doesn't have a symbol table
--1248-- Reading syms from /lib/libm.so.5 (0xf1d000)
--1248-- object doesn't have a symbol table
--1248-- Reading syms from /lib/libcrypto.so.6 (0x103c000)
--1248-- object doesn't have a symbol table
--1248-- Reading syms from /lib/libkvm.so.5 (0x12d6000)
--1248-- object doesn't have a symbol table
--1248-- Reading syms from /usr/lib/libelf.so.1 (0x13de000)
--1248-- object doesn't have a symbol table
--1248-- Reading syms from /usr/lib/librt.so.1 (0x14f6000)
--1248-- object doesn't have a symbol table
--1248-- Reading syms from /lib/libc.so.7 (0x15fb000)
--1248-- object doesn't have a symbol table
--1248-- REDIR: 0x16e9f20 (strlen) redirected to 0x25b650 (strlen)
--1248-- REDIR: 0x1679980 (realloc) redirected to 0x25a6b0 (realloc)
--1248-- REDIR: 0x16eeb10 (memset) redirected to 0x25b9f0 (memset)
--1248-- REDIR: 0x16eeb70 (memcpy) redirected to 0x25ca90 (memcpy)
--1248-- REDIR: 0x169d050 (strrchr) redirected to 0x25b3c0 (strrchr)
--1248-- REDIR: 0x16e9ee0 (strncmp) redirected to 0x25b6b0 (strncmp)
--1248-- REDIR: 0x16e9670 (strcpy) redirected to 0x25d2b0 (strcpy)
--1248-- REDIR: 0x167a650 (free) redirected to 0x25a1b0 (free)
--1248-- REDIR: 0x1678080 (malloc) redirected to 0x25a5f0 (malloc)
--1248-- REDIR: 0x16d63f0 (strcat) redirected to 0x25c160 (strcat)
--1248-- REDIR: 0x16ea020 (strchr) redirected to 0x25b480 (strchr)
--1248-- REDIR: 0x16d6370 (strcmp) redirected to 0x25b800 (strcmp)
--1248-- REDIR: 0x16d5670 (strncpy) redirected to 0x25d020 (strncpy)
--1248-- REDIR: 0x16e6da0 (strnlen) redirected to 0x25b620 (strnlen)
==1248==
--1248-- WARNING: unhandled syscall: 235
==1248== at 0x1650EBC: __sys_ktimer_create (in /lib/libc.so.7)
==1248== by 0x14F8DBF: timer_create (in /usr/lib/librt.so.1)
==1248== by 0x440A64: init_timer (ntp_timer.c:185)
==1248== by 0x418C8A: ntpdmain (ntpd.c:786)
==1248== by 0x41849A: main (ntpd.c:290)
--1248-- You may be able to write your own handler.
--1248-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--1248-- Nevertheless we consider this a bug. Please report
--1248-- it at http://valgrind.org/support/bug_reports.html.
==1248== HEAP SUMMARY:
==1248== in use at exit: 5 bytes in 1 blocks
==1248== total heap usage: 3 allocs, 2 frees, 1,129 bytes allocated
==1248==
==1248== Searching for pointers to 1 not-freed blocks
==1248== Checked 1,139,976 bytes
==1248==
==1248== LEAK SUMMARY:
==1248== definitely lost: 0 bytes in 0 blocks
==1248== indirectly lost: 0 bytes in 0 blocks
==1248== possibly lost: 0 bytes in 0 blocks
==1248== still reachable: 5 bytes in 1 blocks
==1248== suppressed: 0 bytes in 0 blocks
==1248==
==1248== Reachable blocks (those to which a pointer was found) are not shown.
==1248== To see them, rerun with: --leak-check=full --show-reachable=yes
==1248==
==1248== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==1248== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
[root at pc-martin6 /]#
==1248== HEAP SUMMARY:
==1248== in use at exit: 474 bytes in 11 blocks
==1248== total heap usage: 2,357 allocs, 2,346 frees, 83,676 bytes allocated
==1248==
==1248== Searching for pointers to 11 not-freed blocks
==1248== Checked 1,140,416 bytes
==1248==
==1248== LEAK SUMMARY:
==1248== definitely lost: 0 bytes in 0 blocks
==1248== indirectly lost: 0 bytes in 0 blocks
==1248== possibly lost: 0 bytes in 0 blocks
==1248== still reachable: 474 bytes in 11 blocks
==1248== suppressed: 0 bytes in 0 blocks
==1248== Reachable blocks (those to which a pointer was found) are not shown.
==1248== To see them, rerun with: --leak-check=full --show-reachable=yes
==1248==
==1248== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==1248== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
> 2) Comment out the call to freeaddrinfo(), or use copy_addrinfo_list() and
> free the original immediately after copying it, to probe for code in that
> routine corrupting heap regardless of whether allocated as bits and pieces
> by getaddrinfo() or one big chunk by copy_addrinfo_list().
I can give this a try, too. However, I'll have to queue it for later, since a
few other tasks are still waiting today.
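For reference, the second half of suggestion 2 (copy the list, then free the
original right away) could look roughly like the sketch below. This is only an
illustration under assumptions: dup_addrinfo_list(), free_addrinfo_copy() and
resolve_numeric_copy() are hypothetical names, not ntpd functions, and unlike
the copy_addrinfo_list() described above this version allocates every node
separately instead of in one big chunk.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical helper: frees a list built by dup_addrinfo_list() below.
 * Must not be used on lists returned by getaddrinfo() itself. */
static void free_addrinfo_copy(struct addrinfo *ai)
{
    while (ai != NULL) {
        struct addrinfo *next = ai->ai_next;
        free(ai->ai_addr);
        free(ai->ai_canonname);
        free(ai);
        ai = next;
    }
}

/* Hypothetical helper (NOT ntpd's copy_addrinfo_list()): deep-copies a
 * getaddrinfo() result. Every node, sockaddr and canonical name is a
 * separate allocation. Returns NULL on empty input or allocation failure. */
static struct addrinfo *dup_addrinfo_list(const struct addrinfo *src)
{
    struct addrinfo *head = NULL;
    struct addrinfo **tail = &head;

    for (; src != NULL; src = src->ai_next) {
        struct addrinfo *node = malloc(sizeof(*node));
        if (node == NULL)
            goto fail;
        *node = *src;               /* copy the scalar fields */
        node->ai_addr = NULL;       /* pointers are re-created below */
        node->ai_canonname = NULL;
        node->ai_next = NULL;
        *tail = node;               /* link first so "fail" frees it too */
        tail = &node->ai_next;

        if (src->ai_addr != NULL) {
            node->ai_addr = malloc(src->ai_addrlen);
            if (node->ai_addr == NULL)
                goto fail;
            memcpy(node->ai_addr, src->ai_addr, src->ai_addrlen);
        }
        if (src->ai_canonname != NULL) {
            size_t len = strlen(src->ai_canonname) + 1;
            node->ai_canonname = malloc(len);
            if (node->ai_canonname == NULL)
                goto fail;
            memcpy(node->ai_canonname, src->ai_canonname, len);
        }
    }
    return head;

fail:
    free_addrinfo_copy(head);
    return NULL;
}

/* Resolve a numeric address (no DNS lookup), copy the result and free the
 * original immediately, so later code only ever touches the copy. */
static struct addrinfo *resolve_numeric_copy(const char *host, const char *port)
{
    struct addrinfo hints, *res = NULL, *copy;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_DGRAM;
    hints.ai_flags = AI_NUMERICHOST;

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return NULL;
    copy = dup_addrinfo_list(res);
    freeaddrinfo(res);              /* original is released right here */
    return copy;
}
```

If the crash moves or disappears once the getaddrinfo() result is released
this early, that would point at the lifetime of the original list rather than
at freeaddrinfo() itself. A caller would use resolve_numeric_copy("127.0.0.1",
"123") and release the result with free_addrinfo_copy().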
> The crash in lib/isc/unix/net.c is also puzzling to me. The lib/isc code is
> largely unchanged from BIND 9 where it gets much wider exposure. Given the
> recent changes and the way a mutex is used to ensure the ipv4/ipv6 probing
> is done exactly once, my hunch is there's mutex abuse afoot thanks to my
> inadequate attention to threading detail.
I agree this is a very strange thing.
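To make the "probe exactly once" requirement concrete, here is a minimal
sketch of the pattern using plain POSIX pthread_once(). It is not the lib/isc
code and does not use its mutex; all names in it (probe_ipv6, have_ipv6,
probe_from_threads) are hypothetical and only show what a correct run-once
guard has to guarantee when several threads ask at the same time.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_THREADS 16

static pthread_once_t probe_once = PTHREAD_ONCE_INIT;
static int probe_runs;   /* how often the probe body actually ran */
static int ipv6_works;   /* result cached by the probe */

/* Probe body: try to open an IPv6 UDP socket and remember the outcome. */
static void probe_ipv6(void)
{
    int s = socket(AF_INET6, SOCK_DGRAM, 0);

    probe_runs++;
    ipv6_works = (s >= 0);
    if (s >= 0)
        close(s);
}

/* Any thread may call this; pthread_once() runs probe_ipv6() a single
 * time and makes concurrent callers wait until it has finished. */
static int have_ipv6(void)
{
    int rc = pthread_once(&probe_once, probe_ipv6);

    assert(rc == 0);
    (void)rc;
    return ipv6_works;
}

static void *worker(void *arg)
{
    (void)arg;
    (void)have_ipv6();
    return NULL;
}

/* Start up to MAX_THREADS workers that all request the probe result,
 * wait for them, and report how many times the probe body ran. */
static int probe_from_threads(int nthreads)
{
    pthread_t tid[MAX_THREADS];
    int i, started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[started], NULL, worker, NULL) == 0)
            started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    (void)have_ipv6();   /* make sure the probe ran even with no workers */
    return probe_runs;
}
```

With a correct guard, probe_from_threads() returns 1 no matter how many
threads race for the result. A hand-rolled mutex version has to give the same
guarantee, including for callers that arrive while the probe is still running.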
Martin