[ntp:hackers] [Gpsd-dev] A lightweight synchronization mechanism for shared memory

Terje Mathisen terje at tmsw.no
Fri Mar 25 08:56:32 UTC 2011


Håkan Johansson wrote:
>
> On Fri, 25 Mar 2011, Terje Mathisen wrote:
>> I've been reading the source code for this shm interface, and to me it
>> seems almost like a no-brainer to go to a circular buffer of 2^N
>> entries, along with a single (write-update only) counter at the front
>> end which indicates where the latest valid entry is located.
>
> Going to a more-than-1 entry buffer does not immediately give any
> further guarantees, but it does open another loophole:
>
> If the writer crashes mid-update of, say, entry 3, then until the next
> live writer has overwritten that entry (say, when it claims to be
> writing the next entry, 4), the reader may get the idea that the
> previous entry (number 3) is now good and usable.

I did state up front that my approach is specifically designed for a 
single writer with one or more readers: If the writer crashes then you 
obviously get no more valid updates! :-)
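The single-writer ring described above might look like the following in C. This is only a sketch: `NSAMPLES`, `shm_ring`, `shm_publish` and the `entry` array are names invented here, not the actual gpsd/ntpd SHM layout; the sample field names mirror the ones used later in this thread. A C11 release store on the counter doubles as the write barrier Terje asks for in point (b) below.

```c
#include <stdatomic.h>
#include <time.h>

#define NSAMPLES   8                  /* must be a power of two */
#define COUNT_MASK (NSAMPLES - 1)

/* One timestamp pair, fields named after the NTP SHM samples. */
struct shm_sample {
    time_t receiveTimeStampSec;
    long   receiveTimeStampUSec;
    time_t clockTimeStampSec;
    long   clockTimeStampUSec;
};

/* Hypothetical ring layout: a free-running counter in front of
 * 2^N entries.  The counter only ever increases (mod 2^32). */
struct shm_ring {
    _Atomic unsigned  count;
    struct shm_sample entry[NSAMPLES];
};

/* Single writer: fill the next slot, then publish it by bumping
 * count.  The release store orders the slot writes before the
 * counter update, so a reader that sees the new count also sees
 * a fully written entry. */
void shm_publish(struct shm_ring *shm, const struct shm_sample *s)
{
    unsigned next = atomic_load_explicit(&shm->count,
                                         memory_order_relaxed) + 1;
    shm->entry[next & COUNT_MASK] = *s;
    atomic_store_explicit(&shm->count, next, memory_order_release);
}
```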

> Perhaps fixable by having part of the count (or another variable) also
> hold a bitmask of the valid entries, but such an approach requires
> analysing many more contrived edge cases to make guarantees (if that is
> possible at all), as opposed to the 1-entry method, where we know it
> cannot always allow a successful correct read within finite time, but we
> know how to make any claimed successful read actually be one.

See above.
>
>> The readers (one or more) can then always read the last valid entry; it
>> might be a re-read of what they read on the previous iteration, but that
>> is something which is easy for the reader to guard against:
>>
>> unsigned cnt = shm->count;
>> if (cnt == prev_cnt) return;
>>
>> When the counter is valid you mask it with the buffer count mask to
>> bring it into range:
>
> If I'm only allowed to mask the counter when it is valid, the time
> during any writer update makes all entries unusable anyhow, so 2^N is no
> better than 1.
>
>> prev_cnt = cnt;
>> cnt &= shm->count_mask; // Bring it into [0..7] range?
>>
>> Copy the current entry into local vars:
>>
>> tvr.tv_sec  = shm->entry[cnt].receiveTimeStampSec;
>> tvr.tv_usec = shm->entry[cnt].receiveTimeStampUSec;
>> tvt.tv_sec  = shm->entry[cnt].clockTimeStampSec;
>> tvt.tv_usec = shm->entry[cnt].clockTimeStampUSec;
>>
>> This is of course still not absolutely bulletproof, in that a reader
>> process which starts to read an entry, then gets put to sleep for N(=8?)
>> seconds, can wake up again and read the last half of an updated record.
>>
>> This is easy to guard against by re-reading the shm->count value: Since
>> this counter is explicitly allowed to wrap around at 2^32 or 2^64, you
>> cannot sleep for that many seconds.
>>
>> a) You can always read the last valid entry.
>> b) The only sync barrier needed is in the producer (gpsd) which should
>> have a write barrier between the last write to the new record, and the
>> increment of the shm->count variable.
>
> Read barriers too. Please see the linux kernel documentation:
>
> http://www.kernel.org/doc/Documentation/memory-barriers.txt
>
> Also note how the word 'volatile' is absent.

I know, it doesn't really help.
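The reader side, with the read barriers Håkan points out, amounts to a seqlock-style retry loop: load the counter, copy the slot, then re-check the counter to detect a torn copy. A sketch, repeating the assumed layout from the sketch earlier in this message for completeness (again, the names are invented, not the real SHM interface); the acquire loads and fence stand in for the explicit barriers of memory-barriers.txt:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define NSAMPLES   8                  /* must be a power of two */
#define COUNT_MASK (NSAMPLES - 1)

struct shm_sample {
    time_t receiveTimeStampSec;
    long   receiveTimeStampUSec;
    time_t clockTimeStampSec;
    long   clockTimeStampUSec;
};

struct shm_ring {
    _Atomic unsigned  count;
    struct shm_sample entry[NSAMPLES];
};

/* Returns false if nothing new has been published since *prev_cnt.
 * Otherwise copies the latest entry into *out and updates *prev_cnt.
 * If the counter moved while we were copying, the slot may contain
 * intermixed old and new data, so retry. */
bool shm_read_latest(struct shm_ring *shm, unsigned *prev_cnt,
                     struct shm_sample *out)
{
    for (;;) {
        unsigned cnt = atomic_load_explicit(&shm->count,
                                            memory_order_acquire);
        if (cnt == *prev_cnt)
            return false;             /* no new entry since last call */
        *out = shm->entry[cnt & COUNT_MASK];
        /* Order the copy above before the re-check below. */
        atomic_thread_fence(memory_order_acquire);
        if (atomic_load_explicit(&shm->count,
                                 memory_order_relaxed) == cnt) {
            *prev_cnt = cnt;
            return true;              /* copy was not torn */
        }
        /* Writer overtook us mid-copy; loop and try again. */
    }
}
```

Because the counter is free-running rather than reduced modulo the buffer size, a reader that sleeps across a whole wrap of the buffer still sees a changed count and retries, which is the guard against the sleeping-reader case above.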
>
> Another point to note is that the way in which the memory copying is
> implemented (inlined, function call, for loop, even with local CPU
> interrupts disabled) does not help against the interplay of the memory
> subsystems of the two CPUs involved, which, without write/read barriers,
> may deliver the read data as a mix of new and old.

Yes indeed.

Terje
-- 
- <Terje at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
