[ntp:hackers] [Gpsd-dev] Single-writer/many-reader consistency

Dave Hart davehart_gmail_exchange_tee at davehart.net
Sat Mar 26 20:55:28 UTC 2011


On Sat, Mar 26, 2011 at 7:36 PM, Eric Raymond <esr at thyrsus.com> wrote:
> It's the same painful tradeoff that tz and I and other discussants
> have been grappling with for the last week in one way or another.  The
> methods that provably prevent corrupt data have nasty blocking failure
> modes; the methods that don't have scary worst-case behavior may
> produce intermittent data corruption in the *normal* case if the underlying
> hardware messes with operation order enough.
>
>
> I am by no means sure I know where to land.  I have working seqlocks
> now and am oscillating back and forth about whether to scrap that and go
> with semaphores.  I was almost ready to do it until two things happened:
>
> (a) tz pointed out correctly that in a semaphore-based approach, readers
> have to assert as well as writers - otherwise writer can start an update while
> reader is still reading.
>
> (b) I then realized that under a correct semaphore implementation,
> anybody dying while semaphore is asserted starves everybody else
> (which we should have remembered from Operating Systems 101).  This is
> a very practical problem because while we haven't had a crash bug in
> gpsd since something like 2005, I can't say the same thing about all
> the clients out there!

Thanks for your clear summary of the dilemma.

> This is pushing me back towards sticking with the working seqlock.
> But *that* means I have to either deal with the pile of nonportable
> hair that is memory barriers or expensively checksum the data.

If you do checksum, ISTR CRC32 isn't terribly strong for 8k.  MD5 is
widely-available and reasonably fast, and for this purpose, I beieve
its cryptographic weakness is any reason to avoid it.

> Yes, I've been listening to Terje too.  The problem is that his proposal
> is complicated enough to make my head hurt - and I plain don't trust
> complicated algorithms in cases like this, they tend to have holes
> where nobody is looking and be difficult to *maintain* correctness of
> even if they started out correct.

The cost of 4 payload copies is trivial for refclock_shm.c, and CRC32
is more than sufficient to detect corruption and could be layered in
for belt-and-suspenders detection of problems with the lock-free ring
buffer approach.  The tradeoffs for gpsd are different.

Personally, my head isn't hurting thinking about the approach Terje
describes for refclock_shm.  I have used similar approaches for
producer/consumer lockless queueing repeatedly in the past.  That was
usually solving a slightly different problem, with a single or several
consumers that dequeue items, but it's easier with a more
broadcast-like arrangement of single writer, multiple readers who
never change shared memory, only examine.

Cheers,
Dave Hart


More information about the hackers mailing list