[ntp:questions] ntpdate.c unsafe buffer write
Serge Bets
serge.bets at NOSPAM.laposte.invalid
Mon Feb 11 17:47:59 UTC 2008
Hello Harlan,
On Monday, February 11, 2008 at 0:33:36 +0000, Harlan Stenn wrote:
> 1) what are you trying to accomplish by the sequence:
>
> ntpd -gq ; wait a bit; ntpd
>
> that you do not get with:
>
> ntpd -g ; ntp-wait
Let's compare. I used a few-weeks-old ntp-dev 4.2.5p95, because the
latest p113 seems to behave strangely (clearing STA_UNSYNC long before
the clock is really synced). The driftfile exists and has a correct
value. ntp.conf declares one reachable LAN server with iburst. There are
4 main cases: initial phase offset above or below 128 ms, each with
your startup method or with mine.
-1) Initial phase offset over 128 ms, ntp-wait method:
| 0:00 # ntpd -g; ntp-wait; time_critical_apps
| 0:07 time step ==> the clock is very near 0 offset (less than a ms),
| stratum 16, refid .STEP., state 4
| 0:12 ntp-wait terminates ==> time critical apps can be started
| 1:20 *synchronized, stratum x ==> ntpd starts serving good time
Timings are in minutes:seconds, relative to startup. Note that this last
*sync stage, when ntpd takes a non-16 stratum, comes at a seemingly
random moment, sometimes as early as 0:40.
-2) Initial phase offset over 128 ms, my slew_sleeping script:
| 0:00 # ntpd -gq | slew_sleeping; ntpd
| 0:07 time step, no sleep ==> near 0 offset (time critical apps can be
| started)
| 0:14 *synchronized ==> ntpd starts serving good time
-3) Initial phase offset below 128 ms, ntp-wait method (worst case):
| 0:00 # ntpd -g; ntp-wait; time_critical_apps
| 0:07 *synchronized ==> ntpd starts serving time, still a "bad" time,
| because the initial offset (up to 128 ms) is not yet slewed away
| 0:12 ntp-wait terminates ==> time critical apps are started
| 7:30 offset crosses the zero line for the first time, and begins an
| excursion on the other side (up to maybe 40 ms). The initial good
| frequency has been modified to slew the phase offset, and is now
| wildly bad (by perhaps 50 or 70 ppm). The chaos begins, and will
| stabilize some hours later.
-4) Initial phase offset below 128 ms, slew_sleeping script:
| 0:00 # ntpd -gq | slew_sleeping; ntpd
| 0:07 begin max rate slew, sleeping all the necessary time (max 256
| seconds)
| 4:23 wake-up ==> near 0 offset, time critical apps can be started
| 4:30 *synchronized ==> ntpd starts serving good time
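The 256-second ceiling in case 4 is consistent with slewing at a maximum rate of 500 ppm: 128 ms / 500 ppm = 256 s, or 2 seconds of wall time per millisecond of offset. A minimal sketch of that arithmetic follows. The 500 ppm rate is an assumption here (it is not stated above), and slew_seconds is a hypothetical helper for illustration, not part of the slew_sleeping script.

```shell
# Hypothetical helper: print the whole seconds needed to slew away an
# offset at the maximum rate. Assumes a 500 ppm maximum slew rate,
# i.e. 2 seconds of wall time per millisecond of offset.
slew_seconds() {
    # $1: offset in milliseconds (the sign is ignored)
    awk -v ms="$1" 'BEGIN {
        if (ms < 0) ms = -ms
        s = ms * 2              # seconds at 500 ppm
        c = int(s)
        if (c < s) c++          # round up to a whole second
        print c
    }'
}
```

For example, `slew_seconds 128` gives the 256-second maximum quoted above, and a 40 ms offset would need 80 seconds.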
Summary: The ntp-wait method is good at protecting apps against steps,
but not against "large" offsets (tens of ms, up to a hundred or so). The
daemon itself can start serving such less-than-good time. Startup takes
longer to reach a near 0 offset, and can wreck the frequency.
The ntpd -gq method also keeps steps away from applications, if all goes
well. That is not a 100% protection, though, and not its goal. It also
protects apps against large offsets, never serves bad time, and never
squashes the driftfile. It makes for a much saner, more stable daemon
startup, where the "chaos" situation described above (case #3) doesn't
happen. It also starts up faster, except in the cases where ntp-wait
cheats by tolerating offsets that are not yet good.
If necessary, slew_sleeping and ntp-wait can be combined, for a better
level of protection. What about the following, which should survive even
a server being temporarily unavailable during startup, by delaying time
critical apps further:
| # ntpd -gq | slew_sleeping; ntpd -g; ntp-wait; time_critical_apps
One could also imagine looping ntpd -gq until it works, then sleeping,
then starting ntpd and time_critical_apps (the slew_sleeping script has
to be modified to return a success code):
| # until ntpd -gq | slew_sleeping; do :; done; ntpd; time_critical_apps
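Such a loop retries immediately, and forever if no server ever answers. One possible bounded variant is sketched below. retry_until is a hypothetical helper, the attempt count and pause are arbitrary illustration values, and the sketch assumes a slew_sleeping modified to exit with status 0 only when ntpd -gq set or slewed the clock.

```shell
# Hypothetical helper: run a command until it succeeds, at most $1 times,
# pausing $2 seconds between attempts. Returns 0 on success, 1 if every
# attempt failed.
retry_until() {
    max=$1
    pause=$2
    shift 2
    n=1
    until "$@"; do
        # Give up once the allowed number of attempts has been used.
        [ "$n" -ge "$max" ] && return 1
        n=$((n + 1))
        sleep "$pause"
    done
    return 0
}
```

Usage could look like this (the exit status of the pipeline is that of slew_sleeping, its last command):

| # retry_until 10 30 sh -c 'ntpd -gq | slew_sleeping' && ntpd && time_critical_apps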
Serge.
--
Serge point Bets arobase laposte point net