[ntp:questions] Clarification of protostats decoding - status word and event message code

Brian Inglis Brian.Inglis at SystematicSw.ab.ca
Tue Apr 5 21:29:19 UTC 2016


On 2016-04-05 14:26, Frank Wayne wrote:
> On 2016-04-03 19:32, Frank Wayne wrote:
> ________________________________________
> From: questions on behalf of Brian Inglis
> Sent: Tuesday, April 05, 2016 01:48
> To: questions at lists.ntp.org
> Subject: Re: [ntp:questions] Clarification of protostats decoding
> - status word and event message code
>> I'm trying to understand what the fields in the protostats file are
>> and there are a couple of items for which the documentation (the
>> "Event Messages and Status Words" page) is unhelpful. Below is the
>> example event given on the "Monitoring Commands and Options" page
>> of the documentation, for the sake of discussion.
>>
>> 49213 525.624 128.4.1.1 963a 8a message
>>
>> 963a is the "status word". I assume this is the "peer status word",
>> the same as in peerstats, and not the "system status word", which
>> doesn't seem to show up in any stats files as far as I can tell.
>> Can anyone confirm that this can always be decoded as a "peer
>> status word" (status:select:count:code)?
>
> It's a peer status word unless the address is 0.0.0.0, when it's a
> system status word, or the event nibble is 0xb (clock event), when the
> message is a clock event message. You will only see these messages
> in the log if you have enabled high logging levels e.g. "logconfig
> =allall" in ntp.conf, or in protostats, and usually only at startup.
> If you see them at other times, it is likely to be a serious network
> or refclock problem, or other exceptional event, such as a leap
> second.
>
>> 8a is the "event message code" and is not described anywhere as far
>> as I can tell. The least significant nybble always seems to be the
>> same as the least significant nybble of the imprecisely-named
>> "status word", as does the most significant bit, but I don't have
>> enough samples to make any conclusions. Does anyone know what this
>> is?
>
> All peer event message codes are ORed with 0x80 and all crypto event
> message codes are ORed with 0x100 to distinguish them from system
> and clock events. See the source: egrep 'EVNT_|_EVENT'
> **/include/ntp{,_crypto}.h.
>
> Most of the peer messages have event message codes:
>   81 mobilize
>   83 unreachable
>   84 reachable
>   8a sys_peer
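
The flag scheme described above can be sketched in a few lines of Python.
This is only an illustration: the constants 0x80 and 0x100 and the four
code names come from the reply above, while the function name, the
dictionary, and the "system_or_clock" label are hypothetical. The name
table is deliberately incomplete; the header files mentioned above hold
the full list.

```python
PEER_EVENT = 0x080   # flag ORed into peer event codes (from the reply above)
CRPT_EVENT = 0x100   # flag ORed into crypto event codes (from the reply above)

# Only the peer codes named in this thread; not a complete table.
PEER_EVENT_NAMES = {
    0x1: "mobilize",
    0x3: "unreachable",
    0x4: "reachable",
    0xA: "sys_peer",
}


def decode_event_code(hex_code):
    """Split an event message code into (kind, number, name_or_None)."""
    value = int(hex_code, 16)
    if value & CRPT_EVENT:
        kind = "crypto"
    elif value & PEER_EVENT:
        kind = "peer"
    else:
        kind = "system_or_clock"   # hypothetical label for unflagged codes
    number = value & ~(PEER_EVENT | CRPT_EVENT)   # strip both flag bits
    name = PEER_EVENT_NAMES.get(number) if kind == "peer" else None
    return kind, number, name


print(decode_event_code("8a"))   # ('peer', 10, 'sys_peer')
```

Applied to the documentation example, "8a" is the peer flag 0x80 plus
code 0xa, which is why its low nibble matches the low nibble of the
status word 963a.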
  
> Thank you, Brian! Decoding the words as system status words makes
> more sense in that case.
>
>> It's a peer status word unless the address is 0.0.0.0, when it's a
>> system status word, or the event nibble is 0xb (clock event), when
>> the message is a clock event message. You will only see these
>> messages in the log if you have enabled high logging levels e.g.
>> "logconfig =allall" in ntp.conf, or in protostats, and usually
>> only at startup. If you see them at other times, it is likely to be
>> a serious network or refclock problem, or other exceptional event,
>> such as a leap second.
>
> Now I’m just confused about the clock events. If the address is
> 0.0.0.0, then the status word is a system status word. If the
> address is NOT 0.0.0.0 AND the event nybble is NOT 0xb, the status
> word is a peer status word. If the address is NOT 0.0.0.0 AND the
> event nybble is 0xb, what is the status word? It isn’t a clock status
> word.
>
> Examples from my protostats log:
>
> <timestamp> GPS_NMEA(0) 802b 8b clock_event clk_bad_format
> <timestamp> PPS(0) 974b 8b clock_event clk_no_reply
> <timestamp> GPS_NMEA(0) 964b 8b clock_event clk_bad_format
>
> The last nybble in these is 0xb, as it should be. The documentation
> says that the clock status word is “Unused (0-7), Count (8-11), and
> Code (12-15)”. The “Code” field would refer to the “Event Field”,
> which has valid values between 0 and 6 (corresponding to CEVNT_* in
> ntp.h). However, this decoding is incorrect, since the Code/Event
> field is always 0xb.
>
> The binary representations of the status words above:
>
> 802b: 1000 0000 0010 1011 (clk_bad_format)
> 974b: 1001 0111 0100 1011 (clk_no_reply)
> 964b: 1001 0110 0100 1011 (clk_bad_format)
>
> clk_no_reply is 0001b; clk_bad_format is 0010b. I don’t see where a
> clock event code could fit.
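
For reference, the three status words above can be split as peer status
words (status:select:count:code). The sketch below assumes the field
widths are 5, 3, 4 and 4 bits, most significant first, which is my
reading of the "Event Messages and Status Words" page; the thread itself
does not state the widths. The function name is hypothetical.

```python
def split_peer_status(hex_word):
    """Split a 16-bit word into peer status fields (assumed widths 5/3/4/4)."""
    word = int(hex_word, 16)
    return {
        "status": (word >> 11) & 0x1F,  # top 5 bits
        "select": (word >> 8) & 0x07,   # next 3 bits
        "count": (word >> 4) & 0x0F,    # next 4 bits
        "code": word & 0x0F,            # low nibble: last event code
    }


for sample in ("802b", "974b", "964b"):
    print(sample, split_peer_status(sample))
```

Under that assumption every sample has code 0xb, the same low nibble as
the event message code 8b, and none of the fields holds 0001b or 0010b
as a clock event code. The clk_* detail appears only in the message text.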

Clock event codes rarely (never?) appear alone.
That status is still a *peer* status word, but it says there's a
clock event, and the *message* displayed is a *clock* event message,
prefixed by the "clock_event" peer status.
Most (all?) ref clock events are reported against the ref clock peer,
not as standalone events.
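
Putting the rule together as a rough Python sketch: address 0.0.0.0
means a system status word, anything else means a peer status word, and
a peer word whose low nibble is 0xb carries a clock event message. The
function name and return shape are hypothetical, and the field order is
taken from the documentation example quoted earlier in the thread.

```python
def classify_protostats_line(line):
    """Return (word_type, is_clock_message, message) for one protostats line."""
    # Assumed field order, as in the documentation example:
    # day, seconds, address, status word, event code, message text.
    fields = line.split(None, 5)
    address, status_hex = fields[2], fields[3]
    message = fields[5] if len(fields) > 5 else ""
    status = int(status_hex, 16)

    # Address 0.0.0.0 marks a system status word; anything else is a peer.
    word_type = "system" if address == "0.0.0.0" else "peer"
    # A peer word with event nibble 0xb carries a clock event message.
    is_clock_message = word_type == "peer" and (status & 0x0F) == 0xB
    return word_type, is_clock_message, message


print(classify_protostats_line("49213 525.624 128.4.1.1 963a 8a message"))
# ('peer', False, 'message')
```

A refclock line such as the GPS_NMEA(0) examples would come back as a
peer word with the clock flag set, with "clock_event clk_bad_format" as
the message.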

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

