The role of CAS in a durability poll?

d3fault · February 26, 2014, 11:51pm

tl;dr: 2x CAS Stores in a row for same key, durability polling replication for both, will the first one always succeed?

I recall reading somewhere in the docs that multiple successive writes to a single key won’t cause each one to be persisted to disk. This optimization makes sense, but that kind of behaviour is what’s making me wonder what the role of CAS in a durability poll is. Note that even though I just mentioned persistence, I am really only interested in replication (in case there’s a difference…).

Ex (pseudo):

Process 1:
resp = storeWithCas("keyX", value); //succeeds
durabilityPoll(resp, replications=2);

Process 2:
resp = storeWithCas("keyX", value); //succeeds
durabilityPoll(resp, replications=2);

…so if both of those processes are run at the same time (and they retry with exponential backoff should one affect the other), will both of those durability polls succeed?

Going back to the persistence analogy, I know it to be true that whichever value is stored first ‘probably’ won’t be written to disk. So since replication and persistence are handled similarly in Couchbase (“eventual”), does that also mean that the first value set ‘probably’ won’t be replicated (causing the durability poll to fail)???

I would try to test this, but it’s a race condition so eh…

d3fault · February 26, 2014, 11:51pm

role of pole

mnunberg · March 2, 2014, 12:29am

So the first reply was my reply to your “What is the role of CAS” question.

Now here’s how this interleaves with successive mutations with the same key:

You are correct that successive mutations won’t each be stored to disk, but this also means that with successive mutations your initial poll will fail, because you’ll get an error saying the key differs in the master’s cache (the RAM in the master is always considered authoritative). In terms of the race condition you proposed, the last one always wins because it made the last mutation and thus holds the current CAS.
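To make that concrete, here is a toy model of two successive stores to the same key. Everything in it (`ToyMaster`, `store`, `current_cas`) is a hypothetical stand-in for the master's RAM cache, not a Couchbase or libcouchbase API, and it uses plain stores rather than CAS-checked ones to keep things short. It only shows that the first writer's CAS stops matching as soon as the second mutation lands.

```python
import secrets


class ToyMaster:
    """Hypothetical stand-in for the active node's RAM cache."""

    def __init__(self):
        self.items = {}  # key -> (cas, value)

    def store(self, key, value):
        # CAS is modelled as an opaque token that changes on every mutation.
        previous = self.items.get(key, (None, None))[0]
        cas = secrets.randbits(64)
        while cas == previous:  # guard so a mutation always changes the CAS
            cas = secrets.randbits(64)
        self.items[key] = (cas, value)
        return cas

    def current_cas(self, key):
        return self.items[key][0]


master = ToyMaster()
cas_p1 = master.store("keyX", "from process 1")  # process 1 writes first
cas_p2 = master.store("keyX", "from process 2")  # process 2 overwrites it

# A durability poll compares the CAS it was given against the master's copy.
p1_still_current = master.current_cas("keyX") == cas_p1
p2_still_current = master.current_cas("keyX") == cas_p2
print(p1_still_current, p2_still_current)  # False True
```

In this model process 1's poll has nothing left to count, while process 2's poll can proceed.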

mnunberg · March 2, 2014, 12:32am

Understanding how durability_poll works requires understanding of the lcb_observe command. Basically durability_poll will only be able to count items which contain the same CAS as exists in the master. Its role is twofold:

(1) If the item exists in the master with a different CAS, durability_poll returns an error immediately saying LCB_KEY_EEXISTS, which in this case means something else has overwritten this item.

(2) If the item exists in the replicas but does not have the same CAS, the respective nodes are omitted from the check, as it is assumed those nodes contain a potentially older copy of the item.

The reason why (1) does not automatically use the ‘new CAS’ from the master is because at this point the master may have changed from the time when you originally set the key (i.e. you failed over the master and a replica took over). In this scenario the different CAS in the current master’s cache might not be from a newer mutation but possibly from an older mutation.
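Rules (1) and (2) can be sketched as a single pass of the check. This is only an illustration of the logic as described in this post, not the library's implementation: `durability_check`, `PollStatus` and the parameter names are made up for the example, and a real poll would repeat the observe step until it succeeds or times out.

```python
from enum import Enum


class PollStatus(Enum):
    SUCCESS = "success"          # enough replicas hold the expected CAS
    KEY_EEXISTS = "key_eexists"  # master holds a different CAS, rule (1)
    NOT_YET = "not_yet"          # keep polling (or time out)


def durability_check(expected_cas, master_cas, replica_cas_values, replicate_to):
    """One pass over observed CAS values (hypothetical helper).

    replica_cas_values holds one entry per replica; None means the
    replica does not have the item at all.
    """
    # Rule (1): a different CAS on the master fails the poll immediately.
    if master_cas != expected_cas:
        return PollStatus.KEY_EEXISTS
    # Rule (2): replicas with a different CAS are simply not counted.
    matching = sum(1 for cas in replica_cas_values if cas == expected_cas)
    if matching >= replicate_to:
        return PollStatus.SUCCESS
    return PollStatus.NOT_YET


print(durability_check(0xA1, 0xA1, [0xA1, 0xA1], replicate_to=2))  # PollStatus.SUCCESS
print(durability_check(0xA1, 0xB2, [0xA1, 0xA1], replicate_to=2))  # PollStatus.KEY_EEXISTS
print(durability_check(0xA1, 0xA1, [0xA1, 0x90], replicate_to=2))  # PollStatus.NOT_YET
```

Note that the second call fails even though both replicas still hold the expected CAS: the master's copy is what decides.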

mnunberg · March 2, 2014, 12:36am

Yet another note…

If your application ends up getting an LCB_KEY_EEXISTS, what to do next depends on how your data is structured and whether you have any way of internally keeping track of monotonically increasing counters. A CAS value does not necessarily increase monotonically (or increase at all), so it is not a reliable measure of which mutation is newer.

On the one hand, an EEXISTS may be a result of having a cluster failover and actually receiving the previous version of the item. While it would be rare that you would have a failover which takes place after the item was stored but before the durability poll it is still technically possible.

If you are willing to accept the remote possibility mentioned above, you can normally treat an EEXISTS as meaning that a subsequent operation has modified the key, so the contents are newer and the store should not be retried. Otherwise, your best bet is probably to add a monotonically increasing counter to your data and read it back to check whether it is higher (i.e. newer) or lower (i.e. older) than its value before the original set.
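A rough sketch of the counter idea, assuming the application stores its own `version` field inside each document (that field name, the dict layout and `classify_after_eexists` are all hypothetical, not part of any SDK):

```python
def classify_after_eexists(version_before_set, doc_read_back):
    """Decide what an EEXISTS meant by comparing an application-level counter.

    version_before_set is the counter value the document had before our
    store; doc_read_back is the document fetched again after the poll failed.
    """
    current = doc_read_back["version"]
    if current > version_before_set:
        return "newer"  # a later mutation (ours or someone else's) is in place
    if current < version_before_set:
        return "older"  # e.g. a failover surfaced an earlier copy
    return "unchanged"  # same counter as before our store; the write is not visible


# We read version 7 and stored version 8, then the poll reported EEXISTS.
print(classify_after_eexists(7, {"version": 9, "body": "..."}))  # newer
print(classify_after_eexists(7, {"version": 6, "body": "..."}))  # older
```

What to do in each case (skip the retry, re-apply the change, or re-read and merge) still depends on your data; the counter only tells you which direction the document moved.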
