[ntpwg] NTPv4 SNMP MIB draft
David L. Mills
mills at udel.edu
Tue Jan 24 22:42:29 UTC 2006
Guys,
I have a few nitpicks, which are inline below. I would, however, like to
take a closer look at what is going on here and at how the MIB is
expected to be used. There are two customers for these data: humans and
some sort of AI program that can distill reports and log events.
Think about it this way. The standardized monitoring programs and
statistics recording features have been found very useful over the
years. Let's say one of the goals of the MIB is to allow remote
reconstruction of the data produced by ntpq and recorded by the filegen
facility. So one litmus test is to ask: what MIB queries do I need to
produce a local copy of the loopstats, clockstats, cryptostats,
peerstats, sysstats and rawstats records? Can I reconstruct all or most
of the ntpq billboards from MIB queries?
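As a rough illustration of the loopstats litmus test, the sketch below rebuilds one loopstats-style line from values a manager might have polled. It assumes the ntpd loopstats layout: MJD, seconds past UTC midnight, clock offset (s), frequency offset (PPM), RMS jitter (s), wander (PPM) and loop time constant. The keys of the `polled` dictionary are hypothetical. Wander and time constant have no object in this draft, and that is exactly the kind of gap the test is meant to expose.

```python
# Sketch: rebuild a loopstats-style record from values polled over SNMP.
# Assumed ntpd loopstats field order: MJD, seconds past midnight UTC,
# clock offset (s), frequency offset (PPM), RMS jitter (s), wander (PPM),
# loop time constant. The keys of `polled` are hypothetical names.

MJD_UNIX_EPOCH = 40587  # Modified Julian Day number of 1970-01-01
LOOPSTATS_FIELDS = ["offset_s", "frequency_ppm", "jitter_s", "wander_ppm", "time_constant"]


def missing_fields(polled: dict) -> list:
    """Loopstats fields that the MIB queries did not supply."""
    return [name for name in LOOPSTATS_FIELDS if name not in polled]


def loopstats_line(unix_time: float, polled: dict) -> str:
    """Format one loopstats-style line from a Unix timestamp and polled values."""
    gaps = missing_fields(polled)
    if gaps:
        raise KeyError("not available from the MIB: " + ", ".join(gaps))
    day, secs = divmod(unix_time, 86400)
    return "%d %.3f %.9f %.6f %.9f %.6f %d" % (
        int(day) + MJD_UNIX_EPOCH, secs,
        polled["offset_s"], polled["frequency_ppm"], polled["jitter_s"],
        polled["wander_ppm"], polled["time_constant"],
    )
```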
Another litmus test is what kind of script could be written, possibly a
cron job and/or driven by traps, that could collect the filegen data and
look for spikes and extreme frequency wobbles. Use the rawstats data to
generate a wedge scattergram once per day. Experience here suggests that
this is the most useful summary statistic of them all.
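For the rawstats litmus test, each record carries the four timestamps of one exchange: origin, receive, transmit and destination. The sketch below assumes those timestamps have already been parsed into seconds. It computes offset and delay with the standard NTP on-wire formulas. Plotting delay on the x axis against offset on the y axis gives the wedge scattergram. The spike test is one simple heuristic, and its threshold is a hypothetical parameter.

```python
from typing import Iterable, List, Tuple

# (t1 origin, t2 receive, t3 transmit, t4 destination), all in seconds
Exchange = Tuple[float, float, float, float]


def offset_delay(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """Standard NTP on-wire offset and round-trip delay for one exchange."""
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def wedge_points(exchanges: Iterable[Exchange]) -> List[Tuple[float, float]]:
    """(delay, offset) pairs: x = delay, y = offset for the scattergram."""
    points = []
    for t1, t2, t3, t4 in exchanges:
        offset, delay = offset_delay(t1, t2, t3, t4)
        points.append((delay, offset))
    return points


def spikes(points: List[Tuple[float, float]],
           max_delay_excess: float = 0.010) -> List[Tuple[float, float]]:
    """Points whose delay exceeds the minimum observed delay by more than the
    (hypothetical) threshold; these lie far out along the arms of the wedge."""
    if not points:
        return []
    min_delay = min(d for d, _ in points)
    return [p for p in points if p[0] - min_delay > max_delay_excess]
```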
I still have trouble with the data types. A REAL is, I assume, an IEEE
double-precision float; single precision is not very useful. But if a
display string is included, either a human reads it directly or an AI
program simply parses it, with due regard for precision and headroom. So
why do we need the binary representation in the first place? If the
binary representation is needed for simple programs in a PIC, for
example, then what is the PIC going to do with the actual value, if it
is no more precise than the display string?
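This sketch makes the point that a manager can recover the value from the display string itself. It parses strings such as "0.032 ms" or "100ns" into seconds as a Python float, which is an IEEE double. The accepted unit suffixes are an assumption, because the draft does not pin down the string format.

```python
import re

# Assumed unit suffixes; the draft MIB does not define the string format.
_UNITS = {"s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}
_PATTERN = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*(s|ms|us|ns)\s*$")


def display_to_seconds(text: str) -> float:
    """Convert a display string like "0.032 ms" to seconds (IEEE double)."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError("unrecognized display string: %r" % text)
    value, unit = match.groups()
    return float(value) * _UNITS[unit]
```

The integer form proposed for ntpSrvTimeResolutionVal ("1ms" translates to 1000) could then be computed on the manager side as `round(1 / display_to_seconds("1ms"))`.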
Dave
ntpSrvTimeResolution OBJECT-TYPE
SYNTAX DisplayString
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"string describing the time resolution of the running NTP
implementation"
-- e.g. "100ns"
-- depends on the NTP implementation and the underlying OS. The
-- current resolution should be used, so if the OS only supports 10ms
-- and ntpd is capable of 1ns, the 10ms should be advertised
::= { ntpSrvInfo 5 }
xxx You need to use these terms in their technical sense. Resolution is
the number of significant bits in a clock reading, which could be one
nanosecond, while precision is the minimum increment that can be
distinguished, which could be as much as a clock tick of 10 ms.
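Here is a small sketch of the distinction, under stated assumptions. Resolution is counted as the number of fraction-of-second bits needed to represent the clock's finest reading. Precision is expressed as a log2-seconds exponent, the form NTP carries in the packet header. Rounding the exponent up, so that 2**precision is never smaller than the observed increment, is an assumption of this sketch and not necessarily what ntpd does.

```python
import math


def resolution_bits(finest_reading_s: float) -> int:
    """Fraction-of-second bits needed to represent readings of this granularity."""
    return math.ceil(math.log2(1.0 / finest_reading_s))


def precision_exponent(min_increment_s: float) -> int:
    """log2-seconds exponent p such that 2**p >= the smallest distinguishable increment."""
    return math.ceil(math.log2(min_increment_s))
```

Take a clock read to 1 ns but ticking every 10 ms. `resolution_bits(1e-9)` is 30, while `precision_exponent(0.01)` is -6, or about 15.6 ms. The reading has plenty of bits but coarse precision.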
ntpSrvTimeResolutionVal OBJECT-TYPE
SYNTAX Integer32
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"time resolution in integer format"
-- ntpSrvTimeResolution in Integer format
-- shows the resolution as the number of increments per second, e.g. "1ms" translates to 1000
::= { ntpSrvInfo 6 }
xxx I don't understand why you need this, as the exact value can be
computed from the string. Why should the managed object do this
conversion, which might differ from agent to agent?
--
-- Section 2: Current NTP status (dynamic information)
--
ntpSrvStatus OBJECT IDENTIFIER ::= { ntpSnmp 1 }
ntpSrvStatusCurrentState OBJECT-TYPE
SYNTAX DisplayString
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"current status of NTP as a string"
--- possible strings:
--- "not running" : NTP is not running
--- "not synchronized" : NTP is not synchronized to any time source (stratum = 16)
--- "sync to local" : NTP is synchronized to its own local clock (degraded reliability)
--- "sync to refclock" : NTP is synchronized to a local hardware refclock (e.g. GPS)
--- "sync to remote server" : NTP is synchronized to a remote NTP server ("upstream" server)
::= { ntpSrvStatus 1 }
xxx I don't know what you have in mind here. Is there an agent in the
monitored machine that knows when ntpd is not running and when it is?
ntpd itself doesn't know when it is not running. The "not
synchronized" state can be determined from the system peer association
ID, but not from the stratum. What you want to know is whether the
synchronization distance is above or below the distance threshold (1 s,
configurable). Until the threshold is crossed the client is by
definition synchronized. The machine is synchronized to a reference
clock if running at stratum one. The kind of source is indicated by the
reference identifier.
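Here is one way to code that distinction, as a sketch only. The state is derived from the synchronization distance and the stratum, not from a running/not-running flag. The parameter names are hypothetical, and the 1 s default follows the threshold mentioned above. The returned labels reuse the enumeration names from the draft. "syncToLocal" is not handled, because telling the local clock apart would also need the reference identifier.

```python
def sync_state(stratum: int, sync_distance_s: float,
               distance_threshold_s: float = 1.0) -> str:
    """Classify synchronization state using the draft's enumeration labels."""
    # Until the distance threshold is crossed the client is, by definition,
    # synchronized, regardless of whether any source is currently active.
    if sync_distance_s >= distance_threshold_s:
        return "notSynchronized"
    # Running at stratum one means a reference clock controls the discipline.
    if stratum == 1:
        return "syncToRefclock"
    return "syncToRemoteServer"
```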
ntpSrvStatusCurrentStateVal OBJECT-TYPE
SYNTAX INTEGER {
notRunning(0),
notSynchronized(1),
syncToLocal(2),
syncToRefclock(3),
syncToRemoteServer(4),
unknown(99)
}
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"current state of NTP as an integer value"
-- see ntpSrvStatusCurrentState
DEFVAL { unknown }
::= { ntpSrvStatus 2 }
xxx I don't know how to code this.
ntpSrvStatusStratum OBJECT-TYPE
SYNTAX INTEGER (1..16)
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"own stratum value"
-- should be stratum of syspeer + 1 (or 16 if no syspeer)
DEFVAL { 16 }
::= { ntpSrvStatus 3 }
xxx I don't understand the + 1. The stratum displayed should be the
stratum assigned by the algorithms. The stratum does not go to 16 (0 on
the wire) when all sources go away. It is not intended as a
synchronization status indicator.
ntpSrvStatusActiveRefclockId OBJECT-TYPE
SYNTAX INTEGER ( 0..99999 )
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"the association ID of the current syspeer"
DEFVAL { 99 }
::= { ntpSrvStatus 4 }
xxx If this displays as zero, the machine has no active sources, but
continues to be valid as a server until the synchronization distance has
crossed the threshold.
ntpSrvStatusActiveRefclockName OBJECT-TYPE
SYNTAX DisplayString
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"The hostname/descriptive name of the current refclock selected
as syspeer"
-- e.g. "ntp1.ptb.de" or "GPS" or "DCF" ...
-- maybe something like "RefClk(8)" = "hardware clock using driver 8"
-- would be nice
::= { ntpSrvStatus 5 }
xxx The reference identifier is intended to serve this purpose. The
reference implementation purposely does not try to resolve a host name,
as for IPv6 sources only a hash is available.
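This sketch shows why the reference identifier cannot be turned back into a host name for IPv6 sources. Under RFC 5905, a stratum-1 server carries a four-character ASCII code for its clock. A server fed by an IPv4 source carries that source's address. A server fed by an IPv6 source carries the first four octets of the MD5 hash of the address. The function names are illustrative, and rendering a non-ASCII identifier as a dotted quad is a choice of this sketch.

```python
import hashlib
import ipaddress


def ipv6_refid(address: str) -> bytes:
    """First four octets of the MD5 hash of the IPv6 address (RFC 5905)."""
    return hashlib.md5(ipaddress.IPv6Address(address).packed).digest()[:4]


def refid_text(stratum: int, refid: bytes) -> str:
    """Render a 4-octet reference identifier for display."""
    if len(refid) != 4:
        raise ValueError("reference identifier must be 4 octets")
    if stratum <= 1:
        # Stratum 0/1: ASCII code such as "GPS", padded with NULs.
        return refid.rstrip(b"\x00").decode("ascii")
    # Stratum >= 2: the IPv4 address of the source, or an IPv6 hash.
    # The hash is one-way, so no host name can be recovered from it.
    return ".".join(str(octet) for octet in refid)
```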
ntpSrvStatusActiveOffset OBJECT-TYPE
SYNTAX DisplayString
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Time offset to the currently selected refclock as string"
-- including unit, e.g. "0.032 ms" or "1.232 s"
::= { ntpSrvStatus 6 }
xxx This is the system clock offset available in the mode-6 protocol.
ntpSrvStatusActiveRefclockOffsetVal OBJECT-TYPE
SYNTAX REAL
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Time offset between the currently selected refclock and NTP
system time in milliseconds"
DEFVAL { 0 }
::= { ntpSrvStatus 7 }
xxx This value is of course in IEEE floating double format, but could just
as readily be converted from the display string by the agent.
ntpSrvStatusFrequency OBJECT-TYPE
SYNTAX REAL
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Frequency Drift of the NTP server"
DEFVAL { 0 }
::= { ntpSrvStatus 8 }
xxx I suggest using the term frequency offset, not drift, as that
ordinarily speaks to what we call wander.
ntpSrvStatusNumberOfRefclocks OBJECT-TYPE
SYNTAX INTEGER (0..99)
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Number of refclocks configured in the NTP server"
DEFVAL { 0 }
::= { ntpSrvStatus 9 }
xxx I don't see this as useful. What you probably want is the number of
configured associations and the number of survivors of the mitigation
algorithms. It doesn't matter how many refclocks there are, just whether
one of them has control of the clock discipline and that is evident from
the stratum.
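Configured associations and survivors can be counted from the per-association selection status. The sketch below works from the one-character tally codes that ntpq prints in its peers billboard. Treating '+', '#', '*' and 'o' as survivors of the mitigation algorithms is an assumption based on the ntpq documentation and should be checked against the version in use.

```python
from typing import Iterable, Tuple

# Assumed ntpq tally codes for associations that survive the selection
# and clustering algorithms; verify against the ntpq docs for your version.
SURVIVOR_TALLIES = {"+", "#", "*", "o"}


def association_summary(tallies: Iterable[str]) -> Tuple[int, int]:
    """Return (configured associations, survivors) from ntpq tally codes."""
    codes = list(tallies)
    survivors = sum(1 for code in codes if code in SURVIVOR_TALLIES)
    return len(codes), survivors
```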
ntpSrvStatusAuthKeyId OBJECT-TYPE
SYNTAX INTEGER ( 0..1024 )
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Authentication key ID of the active refclock, if authentication is active"
-- xxxTODOxxx Check docs: "How many keys are allowed?"
DEFVAL { 0 }
::= { ntpSrvStatus 10 }
xxx Refclocks are not normally authenticated; remote servers and peers
can be, but each can have a different key. Ordinarily, you don't care
about the maximum number of keys; the reference implementation can
probably allocate many thousands before running out of memory.
ntpSrvStatusServiceUptime OBJECT-TYPE
SYNTAX TimeTicks
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Uptime of NTP service"
-- time since ntpd was (re-)started
DEFVAL { 0 }
::= { ntpSrvStatus 11 }
xxx This is available in the sysstats monitoring data and could be
available to SNMP as well. Take a look at the data recorded; it is
similar to the data now used by NIST. That's the stuff I want for
logging purposes.
xxx Missing: the basic performance data I watch are the offset, root
distance, frequency and jitter.
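Root distance is one of the four figures above. The RFC 5905 reference code computes it per source: half the total round-trip delay to the primary reference, plus all accumulated dispersion, the dispersion growth since the last update, and jitter. Below is a sketch of that formula. The constants are as given in RFC 5905 (MINDISP = 0.01 s, PHI = 15 PPM), and the parameter names are illustrative.

```python
MINDISP = 0.01  # minimum dispersion increment (s), RFC 5905
PHI = 15e-6     # frequency tolerance (s/s), RFC 5905


def root_distance(rootdelay: float, delay: float, rootdisp: float,
                  disp: float, jitter: float, elapsed: float) -> float:
    """Root synchronization distance of one source, in seconds.

    elapsed is the time since the source was last updated (s); the
    dispersion grows at PHI seconds per second over that interval.
    """
    return (max(MINDISP, rootdelay + delay) / 2.0
            + rootdisp + disp + PHI * elapsed + jitter)
```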
--
-- Section 3: Status of all currently mobilized associations
--
ntpSrvAssociations OBJECT IDENTIFIER ::= { ntpSnmp 3 }
ntpSrvAssocTable OBJECT-TYPE
SYNTAX SEQUENCE OF ntpSrvAssociation
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Table of currently mobilized associations"
::= { ntpSrvAssociations 1 }
xxx Easy.
ntpSrvAssociation SEQUENCE {
ntpSrvAssocId Integer32,
ntpSrvAssocName DisplayString,
ntpSrvAssocAddress DisplayString,
ntpSrvAssocOffset DisplayString,
ntpSrvAssocStratum INTEGER,
ntpSrvAssocPollInterval INTEGER,
ntpSrvAssocTimeToNextPoll INTEGER,
ntpSrvAssocReachability INTEGER,
ntpSrvStatusAssocOffsetVal REAL,
ntpSrvStatusAssocJitterVal REAL,
ntpSrvStatusAssocDelayVal REAL
}
xxx Associations don't have names; the only reliable handle is the
association ID. They are distinguished by source IP address and source
port (only). The time to the next poll is not available, as it can be
changed on the fly and cannot be predicted. The performance variables
are offset, delay, dispersion and jitter. The status variables are
stratum, time since last update, reachability register and poll
interval. However, the first diagnostics I look at are the flash bits.
ntpSrvAssocId OBJECT-TYPE
SYNTAX Integer32 ( 0..99999 )
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Association ID"
::= { ntpSrvAssociation 1 }
ntpSrvAssocName OBJECT-TYPE
SYNTAX DisplayString
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Hostname or other descriptive name for association"
::= { ntpSrvAssociation 2 }
ntpSrvAssocAddress OBJECT-TYPE
SYNTAX DisplayString
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"IP address (IPv4 or IPv6) of association OR refclock driver ID"
-- contains IP address of uni/multi/broadcast associations or
-- a refclock driver ID like "127.127.1.0" for other associations
::= { ntpSrvAssociation 3 }
ntpSrvAssocOffset OBJECT-TYPE
SYNTAX DisplayString
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Time offset to the association as string"
-- including unit, e.g. "0.032 ms" or "1.232 s"
::= { ntpSrvAssociation 4 }
ntpSrvAssocStratum OBJECT-TYPE
SYNTAX INTEGER (1..16)
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"stratum level of the association"
-- should be stratum of the association's syspeer + 1 (or 16 if no
-- syspeer)
DEFVAL { 16 }
::= { ntpSrvAssociation 5 }
ntpSrvAssocPollInterval OBJECT-TYPE
SYNTAX INTEGER
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"polling interval for the association in seconds"
-- reflects the number of seconds between two consecutive polls
-- can be typically one of the following:
-- 64, 128, 256, 512 or 1024
DEFVAL { 99 }
::= { ntpSrvAssociation 6 }
ntpSrvAssocTimeToNextPoll OBJECT-TYPE
SYNTAX INTEGER
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"number of seconds until next poll"
-- reflects the number of seconds remaining until the next scheduled poll
DEFVAL { 99 }
::= { ntpSrvAssociation 7 }
ntpSrvAssocReachability OBJECT-TYPE
SYNTAX INTEGER (0..255)
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"results of the last 8 polls in decimal notation"
-- reflects the results of the last 8 polls as a decimal value where
-- each of the 8 bits is set to 1 if the corresponding poll was
-- successful (i.e. the host was reached and replied) or to 0 if the
-- host did not reply.
-- The most recent result is in the least significant (rightmost) bit;
-- the register shifts left at each poll.
-- Examples:
-- Decimal 239 = Binary 11101111 = the last four polls were
-- successful, before that there was one failed attempt and before
-- that three successful tries
-- Decimal 7 = Binary 00000111 = the last three polls were successful,
-- the five before that failed (or had not yet taken place)
-- Decimal 252 = Binary 11111100 = the last two polls failed, the six
-- before that were successful
DEFVAL { 0 }
::= { ntpSrvAssociation 8 }
xxx Why isn't this a bit string? Or, do you expect the agent to convert
to eye candy?
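On the bit-string question: the reachability register is an 8-bit shift register. It shifts left at each poll and sets the low-order bit when a reply arrives, so the most recent result is the rightmost bit. The sketch below models that behavior. It renders the same value as decimal, as octal (the form ntpq displays, e.g. 377) and as a bit string. The function names are illustrative.

```python
def update_reach(reach: int, replied: bool) -> int:
    """Shift in one poll result; the newest result lands in the low-order bit."""
    return ((reach << 1) & 0xFF) | (1 if replied else 0)


def reach_views(reach: int) -> dict:
    """Different renderings of the same 8-bit register."""
    return {
        "decimal": str(reach),
        "octal": format(reach, "o"),
        "bits": format(reach, "08b"),  # leftmost = oldest, rightmost = newest
    }
```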
ntpSrvStatusAssocOffsetVal OBJECT-TYPE
SYNTAX REAL
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Time offset to the association in milliseconds"
DEFVAL { 0 }
::= { ntpSrvAssociation 9 }
xxx The display units for ntpq are in milliseconds for time offsets and
PPM for frequency offsets. There is no need to do this for SNMP reals;
seconds and seconds/second would be more appropriate.
ntpSrvStatusAssocJitterVal OBJECT-TYPE
SYNTAX REAL
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Jitter in milliseconds"
DEFVAL { 0 }
::= { ntpSrvAssociation 10 }
ntpSrvStatusAssocDelayVal OBJECT-TYPE
SYNTAX REAL
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Network delay in milliseconds"
DEFVAL { 0 }
::= { ntpSrvAssociation 11 }
ntpSrvStatusAssocFilterEntries OBJECT-TYPE
SYNTAX INTEGER
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Number of available entries in the Filter Table for this association"
-- should be at least 6
::= { ntpSrvAssociations 12 }
xxx I don't know what this means. The reachability register and time
since last update are more revealing. The peer dispersion statistic
reflects indirectly the number of filter samples and the rank in the
mitigation algorithms.
ntpSrvStatusAssocFilterTable OBJECT-TYPE
SYNTAX SEQUENCE OF ntpSrvAssocFilterEntry
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Table of the filter values of currently mobilized associations"
::= { ntpSrvAssociations 13 }
ntpSrvAssocFilterEntry SEQUENCE {
ntpSrvAssocId Integer32,
ntpSrvAssocFilterIndex INTEGER,
ntpSrvAssocFilterOffset REAL,
ntpSrvAssocFilterDisp REAL,
ntpSrvAssocFilterDelay REAL
}
xxx You need jitter here, too.
ntpSrvAssocFilterIndex OBJECT-TYPE
SYNTAX INTEGER
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Index for the Filter Table"
-- the table row representing the filter values for the latest poll
-- will have FilterIndex = 0, the oldest row has FilterIndex = FilterEntries
DEFVAL { 0 }
::= { ntpSrvAssociation 14 }
xxx This is misleading. The table returned should be in the order of
arrival. The filter order is not normally useful; what matters is the
sample actually selected, whose statistics are shown in the peer
variables.
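This simplified clock-filter sketch shows what "the one actually selected" and the missing jitter mean. Samples are kept in arrival order, and the one with the lowest delay is selected. The peer dispersion is a weighted sum of the sample dispersions taken in delay order. The jitter is the RMS difference between the other sample offsets and the selected one. The sketch follows the shape of the RFC 5905 clock filter but omits dispersion aging, empty-stage handling and the precision floor on jitter.

```python
import math
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    offset: float      # s
    delay: float       # s
    dispersion: float  # s


def clock_filter(arrivals: List[Sample]) -> Tuple[Sample, float, float]:
    """Return (selected sample, peer dispersion, jitter) from samples in arrival order."""
    if not arrivals:
        raise ValueError("no samples")
    by_delay = sorted(arrivals, key=lambda s: s.delay)
    best = by_delay[0]  # minimum-delay sample is selected
    # Weighted sum: the i-th sample in delay order contributes disp / 2**(i+1).
    disp = sum(s.dispersion / 2 ** (i + 1) for i, s in enumerate(by_delay))
    # RMS offset difference relative to the selected sample.
    n = len(by_delay)
    squares = sum((s.offset - best.offset) ** 2 for s in by_delay[1:])
    jitter = math.sqrt(squares / max(n - 1, 1))
    return best, disp, jitter
```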
ntpSrvAssocFilterOffset OBJECT-TYPE
SYNTAX REAL
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Filter offset"
-- "Offset" column of the filter table
DEFVAL { 0 }
::= { ntpSrvAssociation 15 }
ntpSrvAssocFilterDisp OBJECT-TYPE
SYNTAX REAL
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Filter dispersion"
-- "Dispersion" column of the filter table
DEFVAL { 0 }
::= { ntpSrvAssociation 16 }
ntpSrvAssocFilterDelay OBJECT-TYPE
SYNTAX REAL
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Filter delay"
-- "Delay" column of the filter table
DEFVAL { 0 }
::= { ntpSrvAssociation 17 }
--
-- Section 4: Server SNMP trap definitions
--
-- xxxTODOxxx : define Payload
ntpSrvTraps OBJECT IDENTIFIER ::= { ntpSnmp 4 }
ntpSrvTrapNotSync NOTIFICATION-TYPE
STATUS current
DESCRIPTION
"trap to be sent when NTP is not synchronised "
::= { ntpSrvTraps 1 }
xxx The original intent of the NTP traps was as a notification that
some component changed state or some value went out of tolerance. The
idea was to provide a handle so that a human or AI program would know
what MIB queries to send for further information. So, at a minimum, the
trap should include the association ID (or zero) and the name of the
variable that has changed state or gone out of tolerance. It is likely
that the human or AI program will have a menu that says: if this trap
with this name is received, then issue one or more queries and format
the results.
The most crucial traps are when the object first comes up or voluntarily
exits. At present this happens only in ntpdate mode, which does not seem
of interest trap-wise, and when exceeding the panic threshold. Other
obvious events are when first synchronized, when all sources have become
unreachable and when all distances have exceeded the distance threshold.
I don't think you need more than that for state-change events.
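Here is a sketch of the handle idea. The trap carries only the association ID (zero for system-wide events) and the name of the variable that changed state or went out of tolerance. The manager keeps a menu that maps each name to the follow-up queries to issue. All variable and query names here are hypothetical placeholders, not objects defined in the draft. Appending the association ID as an instance suffix is also an assumption.

```python
from typing import Dict, List, NamedTuple


class NtpTrap(NamedTuple):
    assoc_id: int   # 0 for system-wide events
    variable: str   # variable that changed state or went out of tolerance


# Hypothetical manager-side menu: trap variable name -> follow-up queries.
FOLLOW_UP: Dict[str, List[str]] = {
    "sysPeer": ["stratum", "refid", "offset", "rootDistance"],
    "reach": ["reach", "offset", "delay", "jitter"],
    "panic": ["offset", "frequency"],
}


def queries_for(trap: NtpTrap) -> List[str]:
    """Queries to send after a trap, qualified by association ID.
    Unknown names fall back to querying the named variable itself."""
    names = FOLLOW_UP.get(trap.variable, [trap.variable])
    return ["%s.%d" % (name, trap.assoc_id) for name in names]
```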
For out-of-tolerance traps, consider the step, stepout and panic
thresholds and the distance threshold for each source. These can all be
one trap carrying the name of the associated MIB variable and the
association ID. You also need the clock state number in the MIB and a
trap when it changes value. You might need another one for the
frequency if it hits the limit.
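The out-of-tolerance checks can collapse into one function that emits (association ID, variable name) pairs, and each pair would become a single trap. The step (0.128 s) and panic (1000 s) defaults are the usual ntpd values. The 1 s distance threshold follows the text above, and the 500 PPM limit is the NTP maximum frequency tolerance. The variable names are hypothetical. The stepout check is omitted, because it depends on how long an excursion lasts rather than on a single reading.

```python
from typing import List, Tuple


def tolerance_events(assoc_id: int, offset_s: float, root_distance_s: float,
                     frequency_ppm: float, step_s: float = 0.128,
                     panic_s: float = 1000.0, distance_s: float = 1.0,
                     max_freq_ppm: float = 500.0) -> List[Tuple[int, str]]:
    """Return one (assoc_id, variable) pair per threshold that is exceeded."""
    events = []
    if abs(offset_s) > panic_s:
        events.append((assoc_id, "panic"))
    elif abs(offset_s) > step_s:
        events.append((assoc_id, "step"))
    if root_distance_s > distance_s:
        events.append((assoc_id, "rootDistance"))
    if abs(frequency_ppm) >= max_freq_ppm:
        events.append((assoc_id, "frequency"))
    return events
```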