[ntp:questions] ntpdate.c unsafe buffer write

David L. Mills mills at udel.edu
Tue Feb 12 03:03:37 UTC 2008


Serge,

The behavior after a step is deliberate. The iburst volley after a step 
is delayed a random fraction of the poll interval to avoid implosion 
at a busy server. An additional delay may be enforced to avoid violating 
the headway restrictions. This is not to protect your applications; it 
is to protect the server.

Dave

Serge Bets wrote:

> Hello Harlan,
> 
>  On Monday, February 11, 2008 at 0:33:36 +0000, Harlan Stenn wrote:
> 
> 
>>1) what are you trying to accomplish by the sequence:
>>
>> ntpd -gq ; wait a bit; ntpd
>>
>>that you do not get with:
>>
>> ntpd -g ; ntp-wait
> 
> 
> Let's compare. I used a few-weeks-old ntp-dev 4.2.5p95, because the
> latest p113 seems to behave strangely (clearing STA_UNSYNC long before
> the clock is really synced). The driftfile exists and has a correct
> value. ntp.conf declares one reachable LAN server with iburst. There are
> 4 main cases: initial phase offset bigger than 128 ms, or below, and
> your startup method, or my method.
> 
>  -1) Initial phase offset over 128 ms, ntp-wait method:
> 
> | 0:00 # ntpd -g; ntp-wait; time_critical_apps
> | 0:07 time step ==> the clock is very near 0 offset (less than a ms),
> |      stratum 16, refid .STEP., state 4
> | 0:12 ntp-wait terminates ==> time critical apps can be started
> | 1:20 *synchronized, stratum x ==> ntpd starts serving good time
> 
> Timings are in minutes:seconds, relative to startup. Note this last
> *sync stage, when ntpd takes a non-16 stratum, comes at a seemingly
> random moment, sometimes as early as 0:40.
> 
> 
>  -2) Initial phase offset over 128 ms, my slew_sleeping script:
> 
> | 0:00 # ntpd -gq | slew_sleeping; ntpd
> | 0:07 time step, no sleep ==> near 0 offset (time critical apps can be
> |      started)
> | 0:14 *synchronized ==> ntpd starts serving good time
> 
> 
>  -3) Initial phase offset below 128 ms, ntp-wait method (worst case):
> 
> | 0:00 # ntpd -g; ntp-wait; time_critical_apps
> | 0:07 *synchronized ==> ntpd starts serving time, a still "bad" time,
> |      because the initial offset (under 128 ms) is not yet slewed
> | 0:12 ntp-wait terminates ==> time critical apps are started
> | 7:30 offset crosses the zero line for the first time, and begins an
> |      excursion on the other side (up to maybe 40 ms). The initial good
> |      frequency has been modified to slew the phase offset, and is now
> |      wildly bad (by perhaps 50 or 70 ppm). The chaos begins, and will
> |      stabilize some hours later.
> 
> 
>  -4) Initial phase offset below 128 ms, slew_sleeping script:
> 
> | 0:00 # ntpd -gq | slew_sleeping; ntpd
> | 0:07 begin max rate slew, sleeping all the necessary time (max 256
> |      seconds)
> | 4:23 wake-up ==> near 0 offset, time critical apps can be started
> | 4:30 *synchronized ==> ntpd starts serving good time
> 
> 
> Summary: The ntp-wait method is good at protecting apps against steps,
> but not against "large" offsets (tens to a hundred ms). The daemon
> itself can start serving such less-than-good time. Startup takes more
> time to reach a near 0 offset, and can wreck the frequency.
> 
> The ntpd -gq method also shields applications from steps, if all works
> well, but that protection is not 100% and is not its goal. It also
> protects apps against large offsets, never serves bad time, and never
> squashes the driftfile. It makes for a much saner daemon startup, more
> stable, where the "chaos" situation described above (case #3) doesn't
> happen. It starts up faster, except in the cases where ntp-wait cheats
> by tolerating offsets that are not yet good.
> 
> 
> If necessary, slew_sleeping and ntp-wait can be combined, for a better
> level of protection. What about the following, which should survive
> even a server that is temporarily unavailable during startup, at the
> cost of further delaying time-critical apps:
> 
> | # ntpd -gq | slew_sleeping; ntpd -g; ntp-wait; time_critical_apps
> 
> One could also imagine looping ntpd -gq until it works, then sleep, then
> ntpd and time_critical_apps (the slew_sleeping script has to be
> modified to return a success code):
> 
> | # until ntpd -gq | slew_sleeping; do :; done; ntpd; time_critical_apps
> 
> 
> Serge.
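
For reference, the 256-second ceiling in case 4 follows from slewing the
largest sub-step offset (128 ms) at the maximum slew rate, which works
out to 500 ppm (0.5 ms per second). Serge's slew_sleeping script is not
shown in this thread, so the helper below is only a hypothetical sketch
of the arithmetic such a script would need. The name slew_seconds and
the hard-coded 500 ppm rate are assumptions, not taken from his script.

```shell
#!/bin/sh
# Hypothetical helper, NOT the slew_sleeping script from the thread.
# Prints how many whole seconds it takes to slew away a given offset,
# assuming a maximum slew rate of 500 ppm (500 microseconds per second).
slew_seconds() {
    # $1: signed offset in seconds, e.g. -0.128 or +0.050
    awk -v off="$1" 'BEGIN {
        if (off < 0) off = -off            # only the magnitude matters
        us = sprintf("%.0f", off * 1e6)    # round to whole microseconds
        print int((us + 499) / 500)        # divide by 500 us/s, rounding up
    }'
}
```

With this, "slew_seconds -0.128" gives the 256 seconds quoted in case 4,
and a script in the style of slew_sleeping could then run
sleep "$(slew_seconds "$offset")" before starting the daemon.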



