[ntp:hackers] Need help in old ntp_timer.c regarding peer->action functions

Brian Utterback brian.utterback at oracle.com
Thu Sep 5 13:20:28 UTC 2013


The function assignments are made in the refclock driver.
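
To make that concrete, here is a minimal sketch of the pattern to look for, not code from any real driver. The trimmed-down struct peer, the names mydriver_start() and mydriver_event(), and the 60-second re-arm interval are all hypothetical; only the 'action' and 'nextaction' fields come from the snippet you posted.

```c
#include <stddef.h>

/* Trimmed-down stand-in for struct peer; only action/nextaction
 * mirror the posted timer() snippet, the rest is illustrative. */
struct peer {
    struct peer *next;
    void (*action)(struct peer *);  /* called by timer() when due */
    unsigned long nextaction;       /* time at which action is due */
    int events_handled;             /* hypothetical bookkeeping */
};

unsigned long current_time = 0;

/* Hypothetical driver callback: this is the kind of function that
 * ends up behind peer->action. It re-arms itself for a later time. */
void mydriver_event(struct peer *peer)
{
    peer->events_handled++;
    peer->nextaction = current_time + 60;  /* assumed interval */
}

/* Hypothetical driver start routine: the assignment happens here,
 * in the driver, not in the generic peer code. */
int mydriver_start(struct peer *peer)
{
    peer->action = mydriver_event;
    peer->nextaction = current_time + 1;
    return 1;
}
```

Searching the refclock sources for assignments to "->action" should turn up the real equivalents of mydriver_start(), and those are the entry points where debug prints would go.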

What version of NTP are you using? The code you posted predates version 
1.1 of ntp_timer.c in BitKeeper, which is over 15 years old. (Have we 
really been using BitKeeper for 15 years? I feel old.)

I note that the for loop was later changed to remove the intermediate 
"action" variable. I don't think the action function itself can be 
releasing the peer and producing what you are seeing: the condition on 
the for loop is that peer is not equal to zero, yet in the very first 
instruction after the test, peer *is* equal to zero. This implies that 
it changed between the test and that instruction. Since older NTP 
versions are single-threaded, the only way this could happen is in a 
signal handler. That suggests the peer structure can be de-allocated in 
async context, which I think is what the comment implies. I would 
therefore instrument the unpeer function.
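
As a rough illustration of that kind of instrumentation, here is a self-contained sketch under stated assumptions, not the actual ntpd code. The simplified struct peer, the single linked list peer_list (the real code walks peer_hash[]), and the names unpeer_traced(), expire_self() and dispatch() are all made up for the example. The idea is to set a flag while the dispatch loop is running and have the unpeer path record whenever it fires inside that window.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

/* Simplified stand-in for struct peer (illustrative fields only). */
struct peer {
    struct peer *next;
    void (*action)(struct peer *);
    unsigned long nextaction;
    int id;
};

struct peer *peer_list = NULL;

/* sig_atomic_t so the bookkeeping stays safe if the unpeer path
 * is ever entered from a signal handler. */
volatile sig_atomic_t in_dispatch = 0;        /* loop is running */
volatile sig_atomic_t unpeer_in_dispatch = 0; /* frees seen meanwhile */
volatile sig_atomic_t last_unpeered_id = -1;

/* Hypothetical traced version of unpeer(): record the event,
 * then unlink and free the peer. */
void unpeer_traced(struct peer *victim)
{
    struct peer **pp;

    assert(victim != NULL);
    if (in_dispatch) {
        unpeer_in_dispatch++;
        last_unpeered_id = victim->id;
    }
    for (pp = &peer_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == victim) {
            *pp = victim->next;
            break;
        }
    }
    free(victim);
}

/* Hypothetical action that removes its own peer. */
void expire_self(struct peer *peer)
{
    unpeer_traced(peer);
}

/* Same shape as the posted loop: save the next pointer before the
 * call, because the current peer may be gone afterwards. */
void dispatch(unsigned long now)
{
    struct peer *peer, *next_peer;

    in_dispatch = 1;
    for (peer = peer_list; peer != NULL; peer = next_peer) {
        next_peer = peer->next;
        if (peer->action != NULL && peer->nextaction <= now)
            peer->action(peer);
    }
    in_dispatch = 0;
}
```

Checking unpeer_in_dispatch and last_unpeered_id after the loop (or from a core file) would show whether a peer was released while timer() was walking the list. Saving next_peer only protects against the current peer going away; if something frees the peer that next_peer points to, the loop will still walk into freed memory.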

Shyamkant, perhaps you can contact me directly since we both work for 
Oracle and we can see what we can figure out.


On 09/04/13 16:53, Shyamkant Bhavsar wrote:
> Hi,
>
> My name is Shyam, and I work as a Legacy Sustaining engineer for a 
> legacy product, where we used an early version of the NTP code, from 
> around 2003. I do not have NTP domain knowledge or much familiarity 
> with the NTP code base.
>
> I am hoping to connect with someone who is deeply familiar with the 
> 'peer' data structures and 'action' functions inside the timer() 
> function of 'ntp_timer.c' from 2003 or so. I hope this intrigues some 
> of you enough to read on.
>
> PROBLEM:
> --------------
> There is a SEGV occurring randomly in the old NTP code base, in the 
> timer() function of ntp_timer.c, where the 'peer' structure pointer 
> unexpectedly becomes NULL. The code comment states that this may 
> happen as a result of an earlier call to peer->action in the for() 
> loop. To trace which 'action' function was called, I want to put debug 
> prints at the entry point of each potential action function. I need 
> help finding where these functions are located in the code. I did some 
> digging, and below is a summary of what I could and could not do.
>
> My Code Search & Background:
> -------------------------------------------
> Browsing the code, I am not able to figure out where the peer->action 
> functions are, nor can I find which part of the NTP code actually 
> assigns the function pointers for these timeouts. They seem to depend 
> on AM_MODES defined in ntp/ntpd/ntp_peer.c, but I could not locate 
> where the actual assignments are done. I researched init_peer() and 
> findpeer(), but to no avail. The AM modes matrix is a 2-dimensional 
> array with 54 possible matching associations. I have had only a brief 
> look at the code, and I did not read deeply into any RFCs or 
> architecture docs of NTP.
>
> Based on an initial brainstorm with my manager (Sandip, Cc:ed here), 
> it seems that a random network synchronization failure might be 
> causing the issue. That in turn may be causing an NTP packet 
> association mismatch among peers in our system. This might result in 
> a certain 'action' function being called and executed in a separate 
> NTP thread, in which the memory for the 'peer' structure got freed (?). 
> Then, in the next iteration within the timer() code's thread, there is 
> a SEGV from the right-hand side of the 'peer->action' assignment, 
> marked with an arrow in the snippet below.
>
>
> Snippet: timer() from ntp_timer.c
>
> ....
> ....
> ....
>         /*
>          * Now dispatch any peers whose event timer has expired. Be careful
>          * here, since the peer structure might go away as the result of
>          * the call.
>          */
>         for (n = 0; n < HASH_SIZE; n++) {
>                 for (peer = peer_hash[n]; peer != 0; peer = next_peer) {
>                         // <--- During the current iteration, SEGV happens here....
>                         action = peer->action;
>                         next_peer = peer->next;
>                         if (action && peer->nextaction <= current_time)
>                                 // <--- The call in prior iteration suspected of
>                                 //      freeing the 'peer' memory....
>                                 action(peer);
> ....
> ....
> ....
>
> Any help will be greatly appreciated, and thanks again in advance for 
> your precious time.
>
> I am looking for :
>
> Q. 1: Where are the function assignments to 'peer->action' made?
> Q. 2: Where should I put debug instrumentation to track down the 
> function corresponding to the last 'peer->action()' call, and which 
> NTP thread executed it? I am asking because the suspicion is that 
> another thread may have freed or corrupted the 'peer' data 
> structure's memory, triggering the SEGV.
>
> Q. 3: Where can I find similar issues reported on old NTP, where 
> 'peer->action' results in a SEGV? We are seeing more than a few 
> instances of this.
> Q. 4: Any other debug ideas?
>
> Again, sorry for so many Q's on this mailing list.
> Thanks very much to all you NTP hackers reading this email; perhaps 
> someone can help shed light or point me in the right direction.
>
> Cheers & Warm Regards,
> Shyam
>
> With Warm Regards,
> Shyamkant R Bhavsar
> Software Engineer, Sustaining Products; Oracle, USA.
> Phone 408.276.0614
> Cell: 650.229.4795


-- 
blu

Always code as if the guy who ends up maintaining your code will be a
violent psychopath who knows where you live. - Martin Golding
-----------------------------------------------------------------------|
Brian Utterback - Solaris RPE, Oracle Corporation.
Ph:603-262-3916, Em:brian.utterback at oracle.com


