[ntp:hackers] Daughter of RFC-2030

David L. Mills mills at udel.edu
Mon Sep 29 08:37:33 PDT 2003


Mark,

I'll do what I can, bearing in mind the document has already been
reformatted by the RFC Editor and published as
http://www.ietf.org/internet-drafts/draft-mills-sntp-v4-00.txt.

I really appreciate the time and effort you put in on this. The issues
you raise can be dealt with during the normal ID review cycle. I'm not
sure who/what the Editor has in mind as responsible for review; I know
of no task force assigned. Maybe it's just these mailing lists.

Dave

Mark Martinec wrote:
> 
> Dave,
> 
> | I did an extensive review and markup of the proposed RFC-2030 update,
> | mainly to crispy-fry the prose and suppress ambiguity. See the PDF at
> | www.eecis.udel.edu/~mills/reports.html or the NTP project page related
> | publications section.
> 
> I finally managed to transcribe my sidenotes from the printed copy
> of son-of-RFC 2030. Sorry for the delay. So here they are.
> 
> -----------
> p.3
> | configurations where no NTP or SNTP client is dependent on
> | another SNTP client for synchronization.
> 
> The meaning of this statement escapes me. A (S)NTP client never
> depends on another (S)NTP _client_ for synchronization.
> It depends on servers, or some applications running on this host
> may depend on having the machine time synchronized with other hosts.
> 

This is a vitally important consideration. Say an upstream SNTP server
which adjusts its clock in one-second or even ten-millisecond increments
has a downstream NTP or SNTP client with conventional clock discipline
loop. The client would bump and grind something like a pinball machine.
Even if the client were another SNTP implementation with correspondingly
coarse granularity, the pinball amplitude would get larger and more
unstable as the stratum increases.

The restriction that a SNTP server be stratum 1 is pragmatic and based
on several current products that include a GPS receiver and generally
good clock discipline. What I really don't want to see is some j-random
knockoff SNTP implementation pinballing like crazy and causing general
chaos with a large community of pinball clients. In general, the subnet
could become unstable and result in vast spasms of hate mail to the news
wire.

> -----------
> p.5
> | It is advisable to fill the non-significant low order bits of the timestamp
> | with a random, unbiased bitstring, both to avoid systematic roundoff errors
> | and as a means of loop detection and replay detection (see below).
> 
> ==>
>   and as a means of loop detection, replay detection, and rudimentary
>   protection against spoofed server replies from malicious third party
>   (see below).
>
This point is made later in the prose. There is some wishwash in the
model, which is intended to cover cases like vanilla UDP/TIME, where the
pseudo-nonce is not used, and cases where it is. In other words, the
intent is not to require the test, but to highly encourage it.
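
A minimal sketch of the bit-filling idea, in Python and purely
illustrative: the function name, the precision_bits parameter and the
use of the secrets module are assumptions, not anything specified in
the draft. It builds a 64-bit NTP timestamp (32 bits of seconds, 32
bits of fraction) and replaces the fraction bits below the clock's
resolution with random bits.

```python
import math
import secrets

NTP_UNIX_EPOCH_DELTA = 2208988800  # seconds from 1900-01-01 to 1970-01-01


def ntp_timestamp(unix_seconds: float, precision_bits: int) -> int:
    """Return a 64-bit NTP timestamp with random non-significant bits.

    precision_bits (hypothetical parameter, 0..32) is the number of
    fraction bits the local clock can actually resolve.
    """
    if not 0 <= precision_bits <= 32:
        raise ValueError("precision_bits must be in 0..32")
    whole = math.floor(unix_seconds)
    seconds = (whole + NTP_UNIX_EPOCH_DELTA) & 0xFFFFFFFF
    fraction = min(int((unix_seconds - whole) * (1 << 32)), 0xFFFFFFFF)
    junk_bits = 32 - precision_bits
    if junk_bits:
        # Clear the bits the clock cannot resolve, then fill them randomly.
        fraction &= ~((1 << junk_bits) - 1) & 0xFFFFFFFF
        fraction |= secrets.randbits(junk_bits)
    return (seconds << 32) | fraction
```

A client that remembers the value it sent can later compare it with the
Originate Timestamp of the reply, which is what lets the random bits
act as a pseudo-nonce.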
 
> -----------
> p.9, figure 2
> | Code External Reference Source
> | ------------------------------------------------------------------------
> | -
> | LOCL uncalibrated system clock
> 
> looks like a wrapped '-'
>

I'll have to check this in proof.
 
> -----------
> p.9, figure 2
> | CESM calibrated Cesium clock
> | RBDM calibrated Rubidium clock
> 
> Why invent new labels, when Cesium and Rubidium have their
> established chemical symbols:
> 
>   CS   calibrated Cesium clock
>   RB   calibrated Rubidium clock
>

Frankly, I'm a victim of an anal-retentive fondness for neat
four-character labels. I'll consider it in proof.
 
> -----------
> p.9, figure 2
> 
> A general comment: the semantics of this table is poorly defined
> 
> - does it list a frequency generation method
>     (Cs, Rb, quartz, hydrogen, quasar, ...)
> - does it list transmission medium/encoding (IRIG, modem,
>     radio (loran, GPS, DCFp, ...))
> - does it list radio call signs (DCF or DCF77)
> - does it list national metrology laboratory name (PTB, USNO)
> 
> Take for example a NTP server at PTB laboratory. What should it announce
> as its reference source? PPS, CS, H, PTB, DCF, IRIG ???
> 
> The list of radio clocks is probably hard to keep up to date.
> There are several radio clock sources missing, listed e.g. in:
>   http://www.npl.co.uk/time/time_trans.html
>   http://www.cl.cam.ac.uk/~mgk25/lf-clocks.html
> Why list some, and not the others?
> 
> Omega is long gone (terminated in 1997).
> 
> Why was GOES removed from the list?
> 
> Perhaps the way out of this ambiguity would be to provide
> some guidelines on how to select a new label if none of
> the listed ones are adequate.
> 
> How about IANA providing a clearing house for registering new names?
>

My only mission in the table is to attempt some sort of common taxonomy
so folks have some idea what the source is. While the table may have
some dust (LORAN comes to mind, even though there are at least two NTP
implementations on the planet that do in fact synchronize to LORAN), I'm not
sure it deserves advice and consent from the Numbers Czar. The OMEGA and
GOES labels should be fixed.

> -----------
> p.9
> | Reference Timestamp: This field is significant only in server messages,
> | where the value is the time at which the system clock was last set or
> | corrected, in 64-bit timestamp format.
>   ^^^^^^^^^
> 
> What does 'corrected' really mean? Last time the clock was 'stepped'?
> Last time a frequency correction was recalculated?
> Last time adjtime was called? Time of last PPS pulse?
>

Corrected is a good word and is intended to cover both incremental
adjustments (adjtime()) and steps (settimeofday()). The intent, perhaps
not precisely articulated, is the time when a correction is computed;
i.e., clock update computed from timestamps.
 
> -----------
> p.10/p.11
> | the request is set to the time of day according to the client clock in NTP
> | timestamp format.
> 
> ... with non-significant bits randomized.
> 

I'll buy that. Gotta be careful here; it is not required by spec to
randomize the bits.

This brings up the thorny issues of implementation advice and spec
requirement, and the ID reeks of ambiguities in both. Compare the TCP
RFC of a few dozen pages and the SDSC "TCP formal standard" of bookish
proportions. I was a consultant on the SDSC project and was alternately
delighted and horrified by the strict rigor of boring prose. Has anybody
ever read, or even known about the existence of, the SDSC document?
Obviously, I believe the best communication with wackish crowds like
Microsoft and Netgear engineers should be amply stocked with useful,
even if ambiguous, advice.

> -----------
> p.11
> | Note that in general both delay and offset are signed quantities and can in
> | general be less than zero; however, a delay less than zero is possible only
> | in symmetric modes,                 =======================================
>   ==================
> 
> This is not true, as I have explained long time ago.
> 
> Obtaining negative delay is routinely observed in case of a fast
> client with fine clock resolution, and a server on the same LAN
> with coarse clock resolution (e.g. a 10 ms clock).
> This happens when a query/response occurs around server clock increment.
> 
> For example:
> - T2 and T3 are separated by a server clock tick (10 ms),
>   but the actual time interval may be very small.  T3-T2 = 10 ms
> - the whole transaction (query/response) on a LAN may take 1 ms: T4-T1 = 1 ms
> - delay calculated is (T4-T1)-(T3-T2) = -9 ms
> 
> The same thing on a less pronounced scale happens with 1 ms clock resolutions
> and sub-millisecond transactions on LAN.
> 
> As a sidetrack: it is interesting to know that in situations just described,
> the good estimate of the true delay can only be calculated
> by taking the average of delays, including the negative ones.
>

The Awful Truth is exposed in the technical report, book chapter and NTP
project page. It has to do with the uncertainty of measurement, called
(Greek) rho in those documents. A full revelation is in the documents
and, to put it mildly, way too complex for ordinary folks likely to read
the ID. An example even simpler than yours is on a fast net when the
server does not take a tick and the client does.

The correct answer, as developed in the documents, depends on what you
need the roundtrip measurement for. If an upper bound, the right way is
to compute an upper bound on the measurement where the server adds one
tick to T3 and the client subtracts one tick from T4 bounded on zero.
Note the tick values for server and client may not be the same.

Without drowning in these issues, the wording should be adjusted.
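
To make the negative-delay case concrete, here is a small worked sketch
in Python. It reproduces the numbers in the example quoted above using
the usual roundtrip formula (T4 - T1) - (T3 - T2). The helper names and
the choice of integer microseconds are assumptions for illustration
only; the sketch does not attempt the bounded correction described in
the technical report.

```python
def quantize(true_time_us: int, tick_us: int) -> int:
    """Reading of a clock that only advances in steps of tick_us."""
    return (true_time_us // tick_us) * tick_us


def roundtrip_delay(t1: int, t2: int, t3: int, t4: int) -> int:
    """Roundtrip delay computed from the four timestamps."""
    return (t4 - t1) - (t3 - t2)


# Client has a fine-grained clock; server clock ticks every 10 ms.
SERVER_TICK_US = 10_000

t1 = 9_200                             # client sends request
t2 = quantize(9_700, SERVER_TICK_US)   # server receives, clock reads 0
t3 = quantize(10_100, SERVER_TICK_US)  # server replies just after a tick
t4 = 10_200                            # client receives reply

delay_us = roundtrip_delay(t1, t2, t3, t4)
print(delay_us)  # -9000, i.e. -9 ms for a 1 ms exchange
```

The true exchange took 1 ms, but the server tick falling between
receive and transmit makes T3 - T2 appear to be 10 ms, so the computed
delay comes out negative in plain client/server mode.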
 
> -----------
> p.12
> | While not required in a conforming SNTP client implementation, it is wise
> | to consider a suite of sanity checks designed to avoid various kinds of
> | 1..5
> 
> - ignore packets with broadcast destination IP address and mode not 5
>   (even better: check for ethernet broadcast packets, and ignore them)
> 
> - ignore KoD if destination IP address is a broadcast address
> 

My expectation is that a SNTP broadcast client doesn't send anything,
not even KoD, but you have a point.
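
The two checks suggested above could be sketched roughly as follows, in
Python. This is an illustration only: the function name and its
parameters are hypothetical, and it assumes a kiss-o'-death packet is
recognizable by a stratum field of zero and that the receiver can tell
whether the packet arrived on a broadcast destination address.

```python
MODE_BROADCAST = 5  # NTP mode value for broadcast packets


def should_ignore(dst_is_broadcast: bool, mode: int, stratum: int) -> bool:
    """Return True if a received packet should be silently dropped."""
    if not dst_is_broadcast:
        return False  # unicast packets go through the other checks
    if mode != MODE_BROADCAST:
        return True   # only mode 5 belongs on a broadcast address
    if stratum == 0:
        return True   # assumed KoD marker; never honor KoD via broadcast
    return False
```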

> -----------
> p.13
> | Note that SNTP servers normally operate as primary (stratum 1) servers.
> | While operating at higher strata (up to 15) and at the same time
> | synchronizing to an external source such as a GPS receiver is not
> | forbidden, this is strongly discouraged.
> 
> The last two lines here are hard to understand
> (what is not forbidden and what is then discouraged?). Please rephrase.
>

Roger that.

 
> -----------
> p.14
> | If the server is synchronized, the Reference Timestamp is set to the time
>                                                                    ^^^^^^^^
> (which time?) (local) system time
> 
> | the last update was received from the reference source. The Originate
> | Timestamp field is set as in the unsynchronized case above. The Transmit
> | Timestamp field are set to the time of day when the message is sent. In
>                                  ^^^^^^^^^^^
> (local) system time?
> 
> The term 'time of day' is never defined. Defining and using a term
> to refer to the local system's idea of the current time would be useful.
> 
> | broadcast messages the Receive Timestamp field is set to zero and copied
>                                                                 ^^^^^^^^^^
> | from the Transmit Timestamp field in other messages.
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Please rephrase / split in two sentences.
> It doesn't make sense (on first reading) to be both zeroed and copied.
> 
> | from the Transmit Timestamp field in other messages.
>                               ^^^^^^^^^^^^^^^^^^^^^^^
> 
> ...field of the client request in other messages
>

The SNTP local clock runs in UTC, so there should be no ambiguity in
these statements. The terms "local system time" and "external reference
time" would be better.
 
> -----------
> p.14
> | Initial setup for SNTP servers and clients can be done using a web client,
> | if available, or a serial port if not.
> 
> It may be worth mentioning DHCP/bootp request.
> Btw, is there a standard/recommended/any-at-all bootp field
> suitable for the purpose of obtaining a list of NTP servers?
>

Absolutely, the DHCP/bootp issue should be explored. This issue came up
after the document was submitted to the RFC Editor. There should be
opportunities in the ID review cycle to put this in.
 
> -----------
> p.15 and elsewhere:
> 
> | address as well. The configuration data for cryptographic authentication is
> | beyond the scope of this memo.
>                       ^^^^^^^^^
> 
> memo -> document  (throughout)
>

Nope; this is apparently a Postel tradition.
 
> -----------
> p.15
> | selection for SNTP clients with no pre-specified server configuration. For
> | instance a role server with CNAME such as pool.ntp.org returns a randomized
> 
> Is pool.ntp.org really intended for direct use by end-user SNTP clients
> on each PC/desktop? If this is so, I guess most participants will bail out
> of the pool. What about rules of engagement, NTP hierarchy, etc.
> 
> | instance a role server with CNAME such as pool.ntp.org returns a randomized
>                               ^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Also, pool.ntp.org is not CNAME. It is a set of genuine A records.
>

What can I do? The public lists are widely abused, and I mean really
awfully abused. It may well be that, assuming somebody implements a fair
DNS resolver (not like Netgear!), resolving pool.ntp.org is just as
good/bad as any other name. The pool.ntp.org things should be clarified
in a footnote. As you may note, footnotes are set apart by indenting.

> -----------
> p.15
> | protocol, the client binds to the server from which the first reply is
> | received and continues operation in unicast mode.       ^^^^^^^^^^^^^^
> 
> (but only if the server reply matches the sanity checks against
> the request packet)
>

As I said above, I don't want to prescribe as necessary the matching
rule. I did want to leave the impression that the UDP/TIME model is
perfectly acceptable.
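
For illustration, a minimal Python sketch of the matching rule as an
encouraged but optional test. The function and its require_match flag
are hypothetical; with the flag off it behaves like the UDP/TIME model
and accepts whatever reply arrives first.

```python
def accept_reply(sent_transmit_ts: int, reply_originate_ts: int,
                 require_match: bool = True) -> bool:
    """Decide whether a server reply belongs to our outstanding request.

    sent_transmit_ts is the Transmit Timestamp the client put in its
    request; reply_originate_ts is the Originate Timestamp of the reply.
    """
    if not require_match:
        return True  # UDP/TIME style: no pseudo-nonce check at all
    # A zero timestamp carries no information, so treat it as no match.
    return sent_transmit_ts != 0 and reply_originate_ts == sent_transmit_ts
```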
 
> -----------
> p.15
> | control. The conclusion is that clients must respect the means available to
> | targeted servers to stop them from sending packets.
>                            ^^^^
> stop whom: servers or clients? Rephrase.
> 
> -----------
> p.16, Figure 3
> | DENY Access denied by remote server
> | RSTR Access denied due to local policy
> | NKEY No key found. Either the key was never installed or is not trusted
> | RMOT Somebody is tinkering with the association from a remote host
> |      running ntpdc. Not to worry unless some rascal has stolen you keys
> 
> It is not clear from whose point of view terms 'local' and 'remote'
> are considered.
> 
> E.g. a server sends RSTR - does it mean the server has a local policy
> against this client, or the policy is local to the client and the RSTR
> is not found in an actual packet, but is a pseudo status invented by client?
> 
> Suggestion:
>   separate codes which indicate permanent condition unlikely to change,
>   from the temporary ones, which may be worth retrying after some backoff
>   time. Along the lines of 4xx and 5xx codes in SMTP, where a client need not
>   understand exactly what the problem is, but can still behave reasonably
>   based on the general 'class' of error.
> 
>   Perhaps devoting one of four characters for the purpose, or perhaps having
>   permanent errors in all-capitals, and temporary ones in all-lowercase.
> 
>   INIT, RATE, STEP (and DROP?) are probably examples of temporary failures.
>

These condition codes were not intended to be cast in concrete and, in
fact, were shamelessly stolen from the current NTP implementation.
However, the intent is that these codes appear only in server replies
and that should be made clear, as should the temporary/transient
interpretation. 
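
One way the temporary/permanent distinction might look to a client is
sketched below in Python. The transient set follows the suggestion
quoted above (INIT, RATE, STEP); treating every other code, including
unknown ones, as permanent is an assumption made here for the sake of
the example, not something the draft states.

```python
# Codes suggested above as probably temporary; DROP was left as a question.
TRANSIENT_CODES = {"INIT", "RATE", "STEP"}


def kiss_action(code: str) -> str:
    """Map a four-character kiss code from a server reply to an action."""
    if code.upper() in TRANSIENT_CODES:
        return "backoff"      # retry this server later, at a longer interval
    return "stop"             # assumed: cease using this server


for code in ("RATE", "DENY", "RSTR"):
    print(code, kiss_action(code))
```

The point of the split is the one made for SMTP 4xx and 5xx replies: a
client can behave reasonably from the class of the code without
understanding the exact condition.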

> -----------
> p.16
> | few years, mirroring the growth of the Internet. Just about every Internet
> | appliance has some kind of NTP support, including Windows XP, Cisco
>                                                     ^^^^^^^^^^^ ^^^^^
> It is likely this RFC will be useful/valid in 10 years time,
> whereas Windows XP will be a memory of the past, probably looking
> ridiculous on reading this document; and Cisco may no longer exist.
> It is best to avoid mentioning names unnecessarily.
>

True in ten years, although I doubt Windows will go away anytime soon.
The statement applies to recent history and current times and is on that
basis accurate.

> -----------
> p.17
> | and private NTP and SNTP servers. Recent experience strongly suggests that
> | device designers pay particular attention to minimizing resource impacts,
>                   ^MUST
> 
> | 2. A client SHOULD increase the poll interval using exponential backoff as
> |    performance permits and especially if the server does not respond within
> |    a reasonable time.                                        ^^^^^^^^^^^^^^
> 
> ...respond with a valid reply within...

The algorithm at the end of the document should be cited as reference
here.

> 
> | 4. A client MUST allow the operator to configure the primary and/or
> |    alternate server names or addresses in addition to or in place of a
> |    firmware default IP address.
> 
> It should be able to disable/replace the built-in address.
> It is not good enough to be able to provide an additional address!

Upon review, I think the statement covers that case. The expectation is
that you can't change a firmware address, just enable/disable "in place
of" it.

> 
> | 5. If a firmware default server IP address is provided, it MUST be a server
> |    operated by the manufacturer or seller of the device or another server,
> |    but only with the operator's permission.
>                    ^^^^^^^^^^^^
> ...the server operator's...

Well, there is some room here, but the statement clearly implies the
operator in question is the server operator.

> 
> | 7. A client SHOULD re-resolve the server IP address on a periodic intervals,
>                                                         ^^^                 ^
> singular/plural

Roger. I expected you to say "how often?"

> 
> | 8. A client SHOULD support the NTP access-refusal mechanism, so that a
> |    server kiss-o'-death reply in response to a client request causes the
> |    client to cease sending requests to that server and to switch to an
> |    alternate, if available.
>                             ^^^
> , or back off exponentially.
> 
Roger; I think this point was made previously, but probably should be
more explicit.

> -----------
> p.18
> | If the firmware or documentation includes specific server names, the names
> | should be those the manufacturer or seller operates as a customer
> | convenience or those for which specific permission has been obtained from
> | the operator.
>               ^^^
> 
> , or else the DNS names and IP address ranges provided for the
> illustration purposes (RFC 2606, rfc 3330 (IANA TEST-NET))
> 
> | has been obtained from the operator.
>                          ^^^^^^^^^^^^
> ...the service operator.

We'll see what happens on ID review.

> 
> | the operator. A DNS request for a generic server name such as
> | ntp.mytimeserver.com results should result in a random selection of server
>   ^^^^^^^^^^^^^^^^^^^^
> 
> Don't invent names, stick to RFC 2606, e.g. ntp.mytimeserver.example
> 
> | received, a new randomized list is returned. The client ordinarily uses the
> | first address on the list.
>                            ^^^
> ... or randomly selects one from the list.
> 
> | 2. When first coming up or after reset, randomize the timeout from one to
> |    five minutes. ...                    ^^^^^^^^^
> 
> randomize:
> - randomizing in units of the same make should produce different results;
> - the randomized timeout should be from a reasonably continuous interval,
>   not large discrete steps like 1, 2, 3, 4 or 5 minutes exactly.

It would of course be wise to avoid exactly synchronous values, but I
didn't intend to put such a fine point on it. By "exponential backoff" I
assume a lazy implementor will read the system clock, compute the
residue mod four minutes and add one minute as the initial value, then
double it for every missed poll. There may be many alternative
possibilities.
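
The lazy implementor's version described above can be sketched in a few
lines of Python. The function names are hypothetical, and the one-day
cap on the timeout is an assumption added so the doubling stops
somewhere; the text above does not name a maximum.

```python
import time

MAX_TIMEOUT_S = 86_400  # assumed cap of one day, not from the draft


def initial_timeout(now_s: int) -> int:
    """System clock residue mod four minutes, plus one minute."""
    return now_s % 240 + 60  # 60..299 seconds


def next_timeout(current_s: int) -> int:
    """Double the timeout after a missed poll, up to the assumed cap."""
    return min(current_s * 2, MAX_TIMEOUT_S)


timeout = initial_timeout(int(time.time()))
for _ in range(3):  # three missed polls in a row
    timeout = next_timeout(timeout)
print(timeout)  # eight times the initial value, between 480 and 2392
```

Because the initial value depends on when each unit happens to read its
clock, units that come up at different moments start from different
timeouts. Units that restart in the same second after a power failure
would still collide, which is where a real random source would do
better.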

> 
> A general comment on the algorithm suggested:
> I think it:
> - does not solve the situation of having a huge number of clients
>   scattered over the net;
> - does not solve well the case of reasonable number of clients
>   in a controlled (company) environment, requiring fast acquisition
>   of time after reboot from local NTP servers on LAN
> 
> It might be better to just describe the common pitfalls to be avoided,
> and some DOs and DON'Ts guidelines.
> 

Geeze, I thought the algorithm does mitigate the issues faced in the
UWisc incident, which it is expressly designed to do. It does not
"solve" the acquisition latency issue when thousands of clients come
back up after a widespread power failure. Your PC may have to wait a
minute or two before setting the clock, and that's just the way life
sucks.

> -----------
> p.17
> | Security issues are not discussed in this memo.
> 
> memo -> document
>
Op cit.
 
> Regards
>    Mark
> 
> --
>   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>   !!  Mark Martinec (system manager)     tel  +386 1 4773-575 !!
>   !!  J. Stefan Institute, Jamova 39     fax  +386 1 2519-385 !!
>   !!  SI-1000 Ljubljana, Slovenia        mark.martinec at ijs.si !!
>   !!!!!!!!!!!!!!!!!!!!!!!!!! http://www.ijs.si/people/mark/ !!!!


