The role of CAS in a durability poll?

tl;dr: Two CAS stores in a row for the same key, each followed by a durability poll on replication. Will the first poll _always_ succeed?

I recall reading somewhere in the docs that multiple successive writes to a single key won't cause each one to be persisted to disk. This optimization makes sense, but that kind of behaviour is what's making me wonder what the role of CAS in a durability poll is. Note that even though I just mentioned persistence, I really am only interested in replication (in case there's a difference).

Ex (pseudo):

Process 1:
resp = storeWithCas("keyX", value); //succeeds
durabilityPoll(resp, replications=2);

Process 2:
resp = storeWithCas("keyX", value); //succeeds
durabilityPoll(resp, replications=2);

If both of those processes are run at the same time (and they retry with exponential backoff should one affect the other), will both of those durability polls succeed?

Going back to the persistence analogy, I know it to be true that whichever value is stored first 'probably' won't be written to disk. So since replication and persistence are handled similarly in Couchbase ("eventual"), does that also mean that the first value set 'probably' won't be replicated (causing the durability poll to fail)???

I would try to test this, but it's a race condition so eh...

role of pole ;-P

1 Answer


Understanding how durability_poll works requires understanding the lcb_observe command. Basically, durability_poll will _only_ be able to count items which contain the same CAS as exists in the master. The role of the CAS is twofold:

(1) If the item exists in the _master_ with a different CAS, durability_poll returns an error immediately saying LCB_KEY_EEXISTS, which in this case means something else has overwritten this item.

(2) If the item exists in the replicas and does _not_ have the same CAS, the respective nodes are omitted from the check, as it is assumed those nodes contain a potentially older copy of the item.
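
The two rules above can be modelled in a few lines. This is only a sketch in Python, not libcouchbase code: `poll_once`, `PollResult`, and the plain integers standing in for CAS values are hypothetical names made up for illustration, and a real poll would repeat the check until it succeeds or times out.

```python
from enum import Enum


class PollResult(Enum):
    DURABLE = "durable"        # enough replicas hold our CAS
    PENDING = "pending"        # not enough replicas yet; poll again later
    KEY_EXISTS = "key_exists"  # master holds a different CAS (rule 1)


def poll_once(expected_cas, master_cas, replica_cas_values, replicate_to):
    """One round of a simplified durability check (illustrative model only)."""
    # Rule (1): a different CAS on the master means the item was
    # overwritten, so the poll fails immediately.
    if master_cas != expected_cas:
        return PollResult.KEY_EXISTS

    # Rule (2): replicas with a different CAS are simply not counted,
    # because they may still hold an older copy of the item.
    confirmed = sum(1 for cas in replica_cas_values if cas == expected_cas)
    if confirmed >= replicate_to:
        return PollResult.DURABLE
    return PollResult.PENDING
```

For example, `poll_once(100, 100, [100, 90], 2)` is still pending because only one replica has caught up, while `poll_once(100, 101, [100, 100], 2)` fails at once even though both replicas already hold the polled CAS.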

The reason (1) does not automatically use the 'new CAS' from the master is that, by this point, the master may have changed since you originally set the key (i.e. you failed over the master and a replica took over). In this scenario the different CAS in the current master's cache might not be from a newer mutation but possibly from an older one.

So the first reply was my reply to your "What is the role of CAS" question.

Now here's how this interleaves with successive mutations with the same key:

You are correct that successive mutations won't each be stored to disk, but this also means that with successive mutations your initial poll will _fail_, because you'll get an error saying the key differs in the master's cache (the RAM in the master is always considered to be authoritative). In terms of the race condition you proposed before, the _last one always wins_, because the last one has the last mutation and thus holds the current CAS.
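
To make the race concrete, here is a toy model in Python of two stores landing on the master before either poll runs. Everything in it is an assumption for illustration: the `Master` class, `master_still_holds`, and the use of a simple counter as a CAS source (real CAS values are opaque and need not increase; the counter only guarantees that each mutation gets a distinct value).

```python
import itertools


class Master:
    """Toy stand-in for the master node's in-memory copy of one key."""

    def __init__(self):
        # Hypothetical CAS generator: only uniqueness matters here.
        self._cas_source = itertools.count(1)
        self.cas = None
        self.value = None

    def store(self, value):
        # Every successful mutation gives the item a fresh CAS.
        self.cas = next(self._cas_source)
        self.value = value
        return self.cas


def master_still_holds(master, expected_cas):
    """Master-side part of a durability poll: does our CAS still match?"""
    return master.cas == expected_cas


master = Master()
cas_1 = master.store("from process 1")  # process 1 stores first
cas_2 = master.store("from process 2")  # process 2 overwrites before any poll

first_ok = master_still_holds(master, cas_1)   # overwritten, so the poll fails
second_ok = master_still_holds(master, cas_2)  # last mutation holds the CAS
print(first_ok, second_ok)  # prints: False True
```

In this model the first poll can only succeed if it completes before the second store arrives, which is exactly the timing the two processes cannot control.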

Yet another note...

If your application ends up getting an LCB_KEY_EEXISTS, what to do next depends on how your data is structured and whether you have any way of internally keeping track of monotonically increasing counters. A CAS value does not necessarily increase monotonically (or increase at all), so it is not a reliable measure.

On the one hand, an EEXISTS may be the result of a cluster failover after which you actually receive the _previous_ version of the item. While it would be rare for a failover to take place after the item was stored but before the durability poll, it is still technically possible.

If you are willing to risk the remote possibility mentioned above, an EEXISTS will normally mean that a _subsequent_ operation has modified the key and the contents are newer, so the store should not be retried. Otherwise, your best bet is probably to add a monotonically increasing counter to your data and read it back to check whether it is higher (i.e. newer) or lower (i.e. older) than its value before the original set.
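
One way to read the counter suggestion is sketched below in Python. It assumes the application stores its own integer version inside each document and bumps it on every write; the function name `classify_after_key_exists` and the three labels are hypothetical, and the sketch compares against the version the application wrote (the previous value plus one) instead of the previous value itself.

```python
def classify_after_key_exists(written_version, current_version):
    """Decide what to do after a poll reports the key exists with another CAS.

    Both arguments are application-level counters stored in the document
    (a hypothetical "version" field), not CAS values.
    """
    if current_version > written_version:
        # A newer write landed after ours; retrying would clobber it.
        return "superseded"
    if current_version < written_version:
        # An older copy resurfaced (e.g. after a failover); safe to retry.
        return "rolled_back"
    # Same counter but a different CAS: needs application-specific handling.
    return "same_version"


written = 8  # we read version 7 and wrote version 8
print(classify_after_key_exists(written, 9))  # prints: superseded
print(classify_after_key_exists(written, 7))  # prints: rolled_back
```

The counter has to be read back from the current master after the EEXISTS, since that is the copy the poll compared against.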