[ntp:questions] ntpdate.c unsafe buffer write

Serge Bets serge.bets at NOSPAM.laposte.invalid
Mon Feb 11 17:47:59 UTC 2008


Hello Harlan,

 On Monday, February 11, 2008 at 0:33:36 +0000, Harlan Stenn wrote:

> 1) what are you trying to accomplish by the sequence:
>
>  ntpd -gq ; wait a bit; ntpd
>
> that you do not get with:
>
>  ntpd -g ; ntp-wait

Let's compare. I used a few-weeks-old ntp-dev 4.2.5p95, because the
latest p113 seems to behave strangely (clearing STA_UNSYNC long before
the clock is really synced). The driftfile exists and has a correct
value. ntp.conf declares one reachable LAN server with iburst. There are
4 main cases: initial phase offset above 128 ms or below it, each with
your startup method or with mine.

 -1) Initial phase offset over 128 ms, ntp-wait method:

| 0:00 # ntpd -g; ntp-wait; time_critical_apps
| 0:07 time step ==> the clock is very near 0 offset (less than a ms),
|      stratum 16, refid .STEP., state 4
| 0:12 ntp-wait terminates ==> time critical apps can be started
| 1:20 *synchronized, stratum x ==> ntpd starts serving good time

Timings are in minutes:seconds, relative to startup. Note this last
*sync stage, when ntpd takes a non-16 stratum, comes at a seemingly
random moment, sometimes as early as 0:40.


 -2) Initial phase offset over 128 ms, my slew_sleeping script:

| 0:00 # ntpd -gq | slew_sleeping; ntpd
| 0:07 time step, no sleep ==> near 0 offset (time critical apps can be
|      started)
| 0:14 *synchronized ==> ntpd starts serving good time


 -3) Initial phase offset below 128 ms, ntp-wait method (worst case):

| 0:00 # ntpd -g; ntp-wait; time_critical_apps
| 0:07 *synchronized ==> ntpd starts serving time, a still "bad" time,
|      because the offset (up to 128 ms) has not yet been slewed away
| 0:12 ntp-wait terminates ==> time critical apps are started
| 7:30 offset crosses the zero line for the first time, and begins an
|      excursion on the other side (up to maybe 40 ms). The initial good
|      frequency has been modified to slew the phase offset, and is now
|      wildly bad (by perhaps 50 or 70 ppm). The chaos begins, and will
|      stabilize some hours later.


 -4) Initial phase offset below 128 ms, slew_sleeping script:

| 0:00 # ntpd -gq | slew_sleeping; ntpd
| 0:07 begin max rate slew, sleeping all the necessary time (max 256
|      seconds)
| 4:23 wake-up ==> near 0 offset, time critical apps can be started
| 4:30 *synchronized ==> ntpd starts serving good time
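
The 256 second ceiling in case 4 follows from the slew rate: assuming
the usual maximum of 500 ppm (0.5 ms of correction per second), a 128 ms
offset takes 128 / 0.5 = 256 s to remove. A minimal sketch of that
arithmetic follows. slew_seconds is a hypothetical helper, not the
actual slew_sleeping script; it takes the offset in seconds as an
argument instead of parsing the ntpd -gq output.

```shell
#!/bin/sh
# Hypothetical helper: print how many whole seconds a slew of the given
# offset (in seconds, signed) would take, ASSUMING a 500 ppm maximum
# slew rate, i.e. 500 microseconds of correction per elapsed second.
slew_seconds() {
    awk -v off="$1" 'BEGIN {
        off = off + 0                      # force a numeric value
        if (off < 0) off = -off            # direction does not matter
        us = int(off * 1000000 + 0.5)      # offset in whole microseconds
        print int((us + 499) / 500)        # divide by 500 us/s, rounding up
    }'
}
```

For example, "slew_seconds 0.128" prints 256, which matches the maximum
sleep quoted above, and a script could then run sleep "$(slew_seconds
"$offset")" before starting the daemon.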


Summary: The ntp-wait method is good at protecting apps against steps,
but not against "large" offsets (tens of ms, up to a hundred or so). The
daemon itself can start serving such less-than-good time. Startup takes
longer to reach a near 0 offset, and can wreck the frequency.

The ntpd -gq method also spares applications from steps, if all goes
well, though that is not its goal and it is not a 100% protection. It
also protects apps against large offsets, never serves bad time, and
never squashes the driftfile. It makes for a much saner and more stable
daemon startup, where the "chaos" situation described above (case #3)
doesn't happen. It also starts up faster, except in the cases where
ntp-wait cheats by tolerating offsets that are not yet good.


If necessary, slew_sleeping and ntp-wait can be combined, for a better
level of protection. What about the following, which should survive even
a server being temporarily unavailable during startup, by further
delaying time critical apps:

| # ntpd -gq | slew_sleeping; ntpd -g; ntp-wait; time_critical_apps

One could also imagine looping ntpd -gq until it works, then sleeping,
then starting ntpd and time_critical_apps (the slew_sleeping script has
to be modified to return a success code only once ntpd -gq has actually
set the clock):

| # until ntpd -gq | slew_sleeping; do :; done; ntpd; time_critical_apps
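
A loop like this spins forever if the server never answers. One
possible refinement, sketched here under assumptions, is to bound the
number of attempts and pause between them. retry_until_ok is a
hypothetical helper name, and the attempt count and delay are arbitrary
illustrative values.

```shell
#!/bin/sh
# Hypothetical helper: retry_until_ok MAX DELAY CMD [ARGS...]
# Runs CMD until it exits 0, at most MAX times, sleeping DELAY seconds
# between attempts. Returns 0 on success, 1 if every attempt failed.
retry_until_ok() {
    max=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$max" ]; then
            return 1                       # give up after MAX attempts
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}
```

Usage could look like the following, still assuming slew_sleeping
reports success only when the clock was set:

| # retry_until_ok 10 5 sh -c 'ntpd -gq | slew_sleeping' && ntpd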


Serge.
-- 
Serge point Bets arobase laposte point net



