[ntp:questions] Re: OS recommendations for stratum 2 clocks

Joseph Gwinn JoeGwinn at comcast.net
Fri Sep 16 23:03:56 UTC 2005


In article <5M-dnfDyh5qKubreRVn-rA at comcast.com>,
 "Richard B. Gilbert" <rgilbert88 at comcast.net> wrote:

> Joseph Gwinn wrote:
> 
> >In article <8c-dnZ2dnZ39fdy0nZ2dnb_puN6dnZ2dRVn-yZ2dnZ0 at comcast.com>,
> > "Richard B. Gilbert" <rgilbert88 at comcast.net> wrote:
> >
> >>Joseph Gwinn wrote:
> >>
> >>>In article <p06200715bf49e0aff7e0@[10.0.1.210]>,
> >>>brad at stop.mail-abuse.org (Brad Knowles) wrote:
> >>>
> >[snip]
> >
> >>>Probability of necessity varies with application. 
> >>>
> >>>Right now I have a problem with a closed network where the computer 
> >>>clocks sometimes get ten or twenty milliseconds out of synch, even 
> >>>though they usually stay within a millisecond or so.  The LANs are very 
> >>>lightly loaded, and the whole system would fit into a sphere 35 meters 
> >>>in diameter, so transport delay isn't the issue.
> >>>
> >>>The problem is that other realtime activities (application code) in the 
> >>>various servers are kicking the NTP daemons sidewise during heavy system 
> >>>load.   The daemons are at default priority.  NTP cannot tell this from 
> >>>real transport delay, randomly asymmetrical delay at that, so a lot of 
> >>>really bad samples eventually leak through the median filter and corrupt 
> >>>NTP's notion of the time offset to the master clocks.   NTP is actually 
> >>>fairly resistant to this kind of abuse, but the application code is 
> >>>sufficiently overloaded that the necessary abuse is often arranged.
> >>>
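To put numbers on the quoted paragraph above, here is a minimal sketch of the standard NTP on-wire offset and delay calculation. It assumes (this is an assumption, not something established in this thread) that the client's receive timestamp is only taken once the daemon actually gets to run. All timing values are made up for illustration; they are not measurements from the system described.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation (all times in milliseconds).

    t1: client transmit, t2: server receive,
    t3: server transmit, t4: client receive.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Hypothetical quiet-LAN exchange: clocks in agreement, 0.2 ms each way,
# 0.1 ms turnaround in the server.
one_way_ms = 0.2
turnaround_ms = 0.1
t1 = 0.0
t2 = t1 + one_way_ms
t3 = t2 + turnaround_ms
t4 = t3 + one_way_ms

clean = ntp_offset_and_delay(t1, t2, t3, t4)

# Same exchange, but the client daemon is held off the CPU for 20 ms
# before it can timestamp the reply (assumed stall, for illustration).
stall_ms = 20.0
stalled = ntp_offset_and_delay(t1, t2, t3, t4 + stall_ms)

print("clean   offset=%.3f ms delay=%.3f ms" % clean)
print("stalled offset=%.3f ms delay=%.3f ms" % stalled)
```

In this model the stall lands entirely on the return leg, so half of it appears as clock offset (20 ms of stall reads as about 10 ms of offset error) and all of it appears as extra round-trip delay. From the numbers alone, that looks the same as a one-sided network delay.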
> >>>The immediate solution will have to be to promote the daemons to higher 
> >>>realtime priority than that of those interfering other activities, but 
> >>>the people responsible for those activities are likely to object (more 
> >>>from fear than from thought, but ... the pressure is on).  Or, just live 
> >>>with it.
> >>>
> >>>Joe Gwinn
> >>
> >>If these servers are running Windows, there's little hope!
> >
> >True enough.  
> >
> >But no; they are running SGI IRIX on the servers where the sideways 
> >kicking happens.  No Windows in this drama.
> >
> >>If they are running some flavor of Linux and the clock tick rate is set 
> >>to 1000 Hz, it can be changed to 100 Hz and the kernel rebuilt.   This 
> >>cuts the opportunity to lose interrupts by a factor of ten.
> >
> >Which helps, but isn't close to a solution.  There should be *no* lost 
> >timer interrupts.
> >
> >
> >Joe Gwinn
> >
> Then I think you need to talk to Silicon Graphics about it.  If it's a 
> bug they may be able to patch it.  If, as seems likely, it's an O/S 
> design issue, the fix may require a lot of time and resources.

The lost interrupts were in Linux, not IRIX (which appears to have a 
1024-Hz RT timer interrupt).  I was reacting to the proposal that one 
drop the timer interrupt rate in Linux: while this will certainly reduce 
the number of lost interrupts, the root problem remains uncorrected.
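
A rough sketch of why a lower tick rate reduces the losses without removing them. This is a toy model, not a description of how any particular kernel behaves: it assumes that while interrupts are blocked, one pending timer interrupt is latched and any further ticks in that window are simply lost. The function name and the blackout durations are hypothetical.

```python
def ticks_lost(blackout_us, hz):
    """Toy model: ticks lost while interrupts are blocked for blackout_us.

    Assumes one pending timer interrupt is latched and delivered late;
    every additional tick that falls inside the blackout is dropped.
    """
    period_us = 1_000_000 // hz          # tick period in microseconds
    ticks_during = blackout_us // period_us
    return max(0, ticks_during - 1)


# Hypothetical blackout lengths, in microseconds.
for blackout_us in (5_000, 25_000):
    for hz in (1000, 100):
        lost = ticks_lost(blackout_us, hz)
        lost_time_ms = lost * (1_000_000 // hz) / 1000.0
        print("blackout %2d ms at %4d Hz: %2d ticks lost (%.0f ms of clock time)"
              % (blackout_us // 1000, hz, lost, lost_time_ms))
```

Under these assumptions a 5 ms blackout loses four ticks at 1000 Hz and none at 100 Hz, but a 25 ms blackout still loses a tick at 100 Hz. Whatever blocks interrupts for that long is still there.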

Joe Gwinn

Joe