[ntp:questions] Ntpd in uninterruptible sleep?

A C agcarver+ntp at acarver.net
Wed Nov 9 20:44:10 UTC 2011


On 11/4/2011 18:40, Dave Hart wrote:

> From your netbsd.org mailing list traffic, I believe you're using
> NetBSD 5.x.  Looking at ntpd/ntp_io.c, recvfrom() is not the call I'd
> expect to see happen, as NetBSD 5.x supports SO_TIMESTAMP, so #ifdef
> HAVE_TIMESTAMP code is active, and ntpd would typically use recvmsg()
> rather than recvfrom().  See read_network_packet() in ntpd/ntp_io.c.  I
> say typically because if either
>
> 1.  the particular local address ("interface") to which the socket is
> bound is ignoring input (as ntpd's wildcard sockets do, and others can
> be configured to do via "interface ___ drop" in ntp.conf), or
> 2.  ntpd has no receive buffers available
>
> then ntpd will use recvfrom() to a stack-based buffer (0xefffcc74
> here) and discard the data so read.  My hunch is ntpd is somehow
> getting wedged during your cron jobs so that all receive buffers are
> consumed and more cannot be allocated.  You can monitor the situation
> using ntpq -c iostats on 4.2.7, or ntpdc -c iostats on earlier
> versions.  Pay attention to free receive buffers and dropped packets
> (due to no buffer) in particular.
>
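For anyone following along in the archive, here's a minimal sketch of the
recvmsg()/SO_TIMESTAMP path described above, i.e. roughly what the
HAVE_TIMESTAMP branch of read_network_packet() boils down to.  This is
not ntpd's actual code; the port number is made up and error checks are
omitted for brevity:

/* minimal sketch (not ntpd code): read one UDP datagram with
 * recvmsg() and pull the kernel receive timestamp out of the
 * SO_TIMESTAMP control message */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <sys/time.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int on = 1;
	struct sockaddr_in sin;
	char data[512];
	char ctrl[CMSG_SPACE(sizeof(struct timeval))];
	struct iovec iov = { data, sizeof(data) };
	struct msghdr msg;
	struct cmsghdr *cm;
	ssize_t n;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(12345);	/* made-up port */
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	bind(fd, (struct sockaddr *)&sin, sizeof(sin));

	/* ask the kernel to attach a struct timeval to each datagram */
	setsockopt(fd, SOL_SOCKET, SO_TIMESTAMP, &on, sizeof(on));

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl;
	msg.msg_controllen = sizeof(ctrl);

	n = recvmsg(fd, &msg, 0);
	for (cm = CMSG_FIRSTHDR(&msg); cm != NULL; cm = CMSG_NXTHDR(&msg, cm))
		if (cm->cmsg_level == SOL_SOCKET &&
		    cm->cmsg_type == SCM_TIMESTAMP) {
			struct timeval tv;

			memcpy(&tv, CMSG_DATA(cm), sizeof(tv));
			printf("%zd bytes received at %ld.%06ld\n",
			       n, (long)tv.tv_sec, (long)tv.tv_usec);
		}
	return 0;
}
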
> ntpd can't allocate more receive buffers safely when handling SIGIO.
> That is done later, after the signal handler has returned, as a side
> effect of pulling a "full" receive buffer from a list of previously
> received packets for processing, if a packet had been dropped
> previously due to lack of receive buffers.  To debug if you've found a
> corner case where that allocation code never gets called, I suggest
> you try changing this code in libntp/recvbuff.c from:
>
> isc_boolean_t has_full_recv_buffer(void)
> {
> 	if (HEAD_FIFO(full_recv_fifo) != NULL)
> 		return (ISC_TRUE);
> 	else
> 		return (ISC_FALSE);
> }
>
> to
>
> isc_boolean_t has_full_recv_buffer(void)
> {
> 	if (HEAD_FIFO(full_recv_fifo) != NULL)
> 		return (ISC_TRUE);
> 	else {
> 		/* allocate more buffers if needed as side effect
> 		 * in get_full_recv_buffer() (which will return NULL) */
> 		get_full_recv_buffer();
> 		return (ISC_FALSE);
> 	}
> }
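
The "can't allocate while handling SIGIO" constraint above is the usual
async-signal-safety rule: malloc() is not safe inside a signal handler,
so the handler can only pop buffers from a preallocated free list, and
the refill has to happen back in the main loop.  Here's a rough
standalone sketch of that pattern; all names and sizes are invented for
illustration, this is not ntpd's implementation:

/* sketch of the deferred-refill pattern: the SIGIO handler may only
 * pop preallocated buffers; when the pool runs dry it sets a flag,
 * and the main loop, where malloc() is safe, replenishes the free
 * list with SIGIO blocked */
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

#define NBUFS	10
#define BUFSZ	1024

struct rbuf {
	struct rbuf	*next;
	char		data[BUFSZ];
};

static struct rbuf *free_list;			/* popped in the handler */
static volatile sig_atomic_t need_refill;	/* handler -> main loop */

static void
sigio_handler(int sig)
{
	struct rbuf *rb = free_list;

	(void)sig;
	if (rb == NULL) {
		/* can't malloc() here: note the shortage and defer;
		 * real code would recvfrom() and drop the packet */
		need_refill = 1;
		return;
	}
	free_list = rb->next;
	/* ... recvmsg() into rb->data, queue rb on the "full" list ... */
}

static void
refill_free_list(void)
{
	sigset_t set, old;
	int i;

	sigemptyset(&set);
	sigaddset(&set, SIGIO);
	for (i = 0; i < NBUFS; i++) {
		struct rbuf *rb = malloc(sizeof(*rb));

		if (rb == NULL)
			break;
		/* block SIGIO while touching the handler-shared list */
		sigprocmask(SIG_BLOCK, &set, &old);
		rb->next = free_list;
		free_list = rb;
		sigprocmask(SIG_SETMASK, &old, NULL);
	}
	need_refill = 0;
}

int main(void)
{
	signal(SIGIO, sigio_handler);
	refill_free_list();		/* initial pool */
	for (;;) {
		pause();		/* wait for SIGIO */
		if (need_refill)
			refill_free_list();
		/* ... process buffers queued as "full" ... */
	}
}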

I will make these changes later today, but for reference, here's the 
output of ntpdc -c iostats captured within a couple of seconds of ntpd 
locking up.  The data was collected by a second machine polling the ntpd 
server every two seconds and recording the output.  The same machine was 
also polling the ntpd server's peer list every ten seconds.

time since reset:     332
receive buffers:      10
free receive buffers: 9
used receive buffers: 0
low water refills:    1
dropped packets:      0
ignored packets:      2
received packets:     705
packets sent:         1159
packets not sent:     0
interrupts handled:   706
received by int:      704


It didn't take long to crash, but I'm still not able to catch the 
buffers in the act of filling up.  The increased poll rate does shorten 
ntpd's run time, though: note this instance crashed after only 332 
seconds.  When I was running ntpdc at a slower polling rate (every 5 
seconds), it took about 17,000 seconds to crash.

