[ntp:hackers] [Gpsd-dev] ntpd shm changes

Håkan Johansson f96hajo at chalmers.se
Sun Mar 20 22:10:58 UTC 2011


Hi,

I think there are a few simplifications in the NTPSHM usage compared to 
normal resource sharing, namely:

- There is only one writer.

- We do not require the reader to always succeed.

The first point is that the entire construction implicitly assumes that 
data only comes from one source.  NTP would be time-keeping-wise confused 
by time-stamps from two devices occurring in the same segment.  This means 
that no semaphores etc. are needed control write access to the resource. 
The only 'locking' required is to ensure that the reader does not use 
incomplete data.  This, with exception of the missing memory barriers, is 
presently ensured.  And the present implementation can keep this property 
while being redesigned to hold both valid and count in the same variable, 
if the reader also stops playing with the equivalent of valid.  The main 
reason for doing that would be to allow multiple readers.

Writer:

count |= 1;
write-memory-barrier;
update data fields...;
write-memory-barrier;
count++;

Reader:

get count;
if (count even) {
   read-memory-barrier;
   get data fields...;
   read-memory-barrier;
   get count again;
   if (count unchanged)
     use data...;
}

There is also no need for atomic operations.  If the reader should happen 
to get in and do its job while the writer is underway doing any of the 
updates of count, the reader may depending on the incomplete count it gets 
decide to use the data or not.  But the data as such will be complete, 
thanks to the barriers.  There is no way for the reader to do the two 
reads of count and find it both even and unchanged unless both reads are 
either before or after the write-barriers.

For barriers, see smp_wmb() and smp_rmb() in linux.  Short inline 
assembler per architecture.  include/asm-xxx/system.h


The second point in not requiring the reader to succeed is something that 
has to be bought together with the simplicity (and easy 'crash-recovery') 
of the shared memory segment.  The 'only' thing that can happen is that 
the reader sees that the values are bad and does not use them.  This of 
course can happen many times in a row, if we have the utter bad luck of 
always doing the read process during the writes.  Attempting a full 
re-read does not guarantee success, as the writer may be (indefinately) on 
hold.  Without additional (real) locking between the reader and writer, it 
also cannot be solved by using a two-entry (ping-pong) data section, as it 
with a pure 'one writer-one reader' stands the chance to go into infinite 
retries.  If this kind of guarantee is wanted, I belive that one should 
completely abandon the shared memory segment and attempts to implement 
full locking in that and rather use a unix socket (pipe), as recently done 
by chrony, and thus let the operating system worry about all deadlock 
issues.  Also, if one tries a reader-writer locking approach, one needs to 
deal with a possibly crashed party, and it also becomes impossible (again) 
to have several readers.


With an assumption that the reader is much faster than the writer update 
cycle, a ping-pong data buffer could suffice to deal with the potential 
streak of missed reads every time, together with a retry reads two times. 
Two times retry as count is updated twice by the writer.

Let the second lowest bit of count tell which buffer is valid when count 
is even.  Another variable is needed to hold the (previous) valid buffer 
in case the reader finds the data during count is odd (but stable!).

Writer:

valid_buffer = (count >> 1) & 1;
write-memory-barrier;
count |= 1;
write-memory-barrier;
update buffer (count >> 1) & 1;
write-memory-barrier;
count++;

And then a bunch of cases in the reader...  Re-read in case count changed.

As this becomes much more complicated, and still has no guarantee, I'd 
rather suggest to use one data buffer and for the reader to shift its 
polling interval by a (random) amount instead if it finds the count 
odd/changed.

Cheers,
Håkan


More information about the hackers mailing list