[ntp:questions] Seeing large jitter and delay in recent ntpd and recent Solaris
cswiger at mac.com
Tue Nov 8 23:26:20 UTC 2011
On Nov 8, 2011, at 12:29 PM, Fran wrote:
> My ‘high performance’ configuration runs ntpd bound to processor 0 on an SMP system, at the highest priority. My poll interval is 16 s, and I’m running ntpq locally, so ntpq communicates with ntpd over the loopback interface. Most of these conditions are required to reproduce the problem: if I run in the time-sharing class, run ntpq remotely, or don't bind to processor 0, there is no problem. I haven’t tried other poll intervals, but it's unlikely that matters.
I'm not as familiar with the SPARC T3s as I was with the SPARCv8 and v9 processors, but as a generalization, processor 0 is usually the system's bootstrap processor, which brings the system up far enough to start the other processors. It also tends to play a special role in handling interrupts, so if you want to use CPU affinity, choose any processor other than 0.
> A SIGPOLL signal arrives soon after, as expected, when the return packet comes in. What then catches my eye is that the select() call in the signal handler returns 0, with no file descriptors set. These should be nonzero, which is why I think there is a problem in Solaris. Because select() reports nothing ready, the signal handler does not read the packet out of the socket and simply returns. The packet stays in the socket and goes stale.
That does sound like a problem with select(), however:
If you've raised ntpd's priority above 100 using the real-time (RT) scheduling class, then it has priority over the kernel threads that service network interrupts. select() might legitimately be returning zero because ntpd runs before the network stack has finished processing the incoming NTP packet.
If you continue to run ntpd bound to processor 0 but change its priority to 59 in the fixed-priority (FX) class, does it see the packets then?
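To make the select() point concrete, here is a small, portable C sketch (not ntpd's actual code; the function names `readable` and `loopback_select_demo` are hypothetical). It binds a UDP socket on loopback, checks readiness with a zero timeout before anything has been sent, then sends a 48-byte datagram to itself and waits up to 100 ms. A zero-timeout select() only reports datagrams the network stack has already queued, so if the thread doing that queuing is starved by a higher-priority process, a check made right after SIGPOLL can see nothing. The sketch does not reproduce the Solaris scheduling race itself; it only shows what select() can and cannot report.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Return select()'s result for a single fd, waiting at most usec microseconds. */
static int readable(int fd, long usec)
{
    fd_set rfds;
    struct timeval tv;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = 0;
    tv.tv_usec = usec;
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}

/* Hypothetical demo: record readiness before and after a datagram is queued.
 * Returns 0 on success, -1 on a socket error. */
int loopback_select_demo(int *before, int *after)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    char buf[48] = {0};  /* 48 bytes: size of a basic NTP header */
    int fd;

    assert(before != NULL && after != NULL);

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  /* let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }

    /* Nothing has been sent yet, so a zero-timeout select() reports 0. */
    *before = readable(fd, 0);

    if (sendto(fd, buf, sizeof buf, 0,
               (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }

    /* Allow the stack up to 100 ms to queue the datagram, then drain it. */
    *after = readable(fd, 100000);
    if (*after == 1)
        (void)recv(fd, buf, sizeof buf, 0);

    close(fd);
    return 0;
}
```

A handler written this way would treat a zero return as "not queued yet" rather than "nothing coming", which is consistent with the priority explanation above.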