I am seeing intermittent document unlock issues with the pessimistic locking approach. I am using the following strategy to lock and unlock a document:
- Use getAndLock with a lock timeout (3 sec) to lock a document and get its CAS value.
- Make modifications to the document and use replace with that CAS value to replace the document.
This approach mostly works, and the replace operation releases the lock on the document. Occasionally, however, replace does NOT release the lock, and any subsequent getAndLock fails to acquire it and throws TemporaryLockFailureException. Even the explicit 3-second timeout does not release the lock.
Is there a known issue or scenario in which this can happen?
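A minimal sketch of the semantics this strategy relies on, written as a toy in-memory model rather than against the real SDK: a replace carrying the lock's CAS releases the lock, and otherwise the lock lapses once the timeout passes. The class and exception names (LockingStore, TemporaryLockFailure, CasMismatch) and the injected clock are assumptions made for illustration; this is not how the server implements locking.

```python
import itertools


class TemporaryLockFailure(Exception):
    """Raised when the document is currently locked (illustrative name)."""


class CasMismatch(Exception):
    """Raised when a mutation carries a stale CAS (illustrative name)."""


class LockingStore:
    """Toy single-process model of pessimistic locking; not server code."""

    def __init__(self, clock):
        self._clock = clock                 # callable returning seconds (float)
        self._docs = {}                     # key -> [value, cas, lock_expiry]
        self._next_cas = itertools.count(1)

    def upsert(self, key, value):
        self._docs[key] = [value, next(self._next_cas), None]

    def _is_locked(self, entry):
        return entry[2] is not None and self._clock() < entry[2]

    def get_and_lock(self, key, lock_time):
        entry = self._docs[key]
        if self._is_locked(entry):
            raise TemporaryLockFailure(key)
        entry[1] = next(self._next_cas)     # the lock hands out a fresh CAS
        entry[2] = self._clock() + lock_time
        return entry[0], entry[1]

    def replace(self, key, value, cas):
        entry = self._docs[key]
        if cas != entry[1]:
            if self._is_locked(entry):
                raise TemporaryLockFailure(key)
            raise CasMismatch(key)
        # A replace with the matching CAS stores the value and clears the lock.
        entry[0], entry[1], entry[2] = value, next(self._next_cas), None
        return entry[1]


# Walk through the two release paths with a fake clock.
now = [0.0]
store = LockingStore(clock=lambda: now[0])
store.upsert("doc", {"n": 0})

value, cas = store.get_and_lock("doc", lock_time=3)
store.replace("doc", {"n": 1}, cas)                  # path 1: replace releases
_, cas2 = store.get_and_lock("doc", lock_time=3)     # re-lock succeeds at once

now[0] += 3.5                                        # path 2: timeout lapses
_, cas3 = store.get_and_lock("doc", lock_time=3)     # re-lock succeeds again
print("both release paths behaved as expected")
```

In this model a second getAndLock can only fail while an unexpired lock is still held, which is why the failure described above is surprising.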
That does sound like a problem, though it's more likely an issue in the KV service. Do you by chance have a simple test program?
Also, is there any concurrency involved, or can you repro it with just a single actor?
Yes, I am able to reproduce it with a single actor, but the behavior is not consistent. However, it's more frequent during load testing.
Here is a scenario that I observed:
Process 1 acquired the lock (getAndLock) at 2:40:09.880 and performed the replace operation with the correct CAS at 2:40:09.894.
Process 2 tried to acquire the lock at 2:40:13.676 (almost 4 seconds after the previous replace) but received TemporaryLockFailureException.
Either the replace operation or the explicit 3-second timeout should have released the lock.
Also, is there a way to know whether the replace operation actually released the lock?
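The arithmetic behind the scenario above, as a short sketch. The timestamps and the 3-second timeout are taken from the post; the variable names are only for illustration.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S.%f"
locked_at = datetime.strptime("2:40:09.880", FMT)    # Process 1: getAndLock
replaced_at = datetime.strptime("2:40:09.894", FMT)  # Process 1: replace with CAS
retry_at = datetime.strptime("2:40:13.676", FMT)     # Process 2: getAndLock fails
lock_timeout = timedelta(seconds=3)

lock_expiry = locked_at + lock_timeout               # when the timeout should fire
since_replace = retry_at - replaced_at               # gap after the replace
past_expiry = retry_at - lock_expiry                 # gap after the timeout

print(f"lock held before replace: {(replaced_at - locked_at).total_seconds():.3f} s")
print(f"retry after replace:      {since_replace.total_seconds():.3f} s")
print(f"retry after lock expiry:  {past_expiry.total_seconds():.3f} s")
```

The retry comes 3.782 s after the replace and 0.796 s after the point at which the 3-second lock should have expired, so neither release path explains the failure.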
If it is based on the timeout, it's probably best observed through logging from the memcached process, though you may need a higher logging level. I'm not certain what that logging looks like, but that's what I'd try first.
If you have a small test case, it’d be great to file an issue against Couchbase Server.
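A small test case along these lines could be a single-actor loop that locks, replaces with the CAS, and then immediately probes the lock again. The sketch below is only a skeleton: the client method names and signatures (get_and_lock, replace, unlock) are placeholders to adapt to the SDK in use, and FlakyStubClient is a hypothetical stand-in that lets the loop run without a cluster.

```python
import time


def find_stuck_locks(client, key, iterations, lock_time=3,
                     lock_error=Exception, sleep=time.sleep):
    """Lock, replace with the CAS, then probe the lock; one actor only.

    Returns the iteration numbers where the probe could not re-acquire
    the lock right after the replace.
    """
    stuck = []
    for i in range(iterations):
        value, cas = client.get_and_lock(key, lock_time)
        client.replace(key, value, cas)        # expected to release the lock
        try:
            _, probe_cas = client.get_and_lock(key, lock_time)
        except lock_error:
            stuck.append(i)
            sleep(lock_time + 1)               # wait out the lock before retrying
        else:
            client.unlock(key, probe_cas)      # release the probe's own lock
    return stuck


class FlakyStubClient:
    """Hypothetical stand-in that leaves the lock held on chosen replaces."""

    def __init__(self, fail_on):
        self.fail_on = set(fail_on)   # replace-call numbers that stay locked
        self.replaces = 0
        self.locked = False
        self.cas = 0

    def get_and_lock(self, key, lock_time):
        if self.locked:
            raise RuntimeError("temporary lock failure")
        self.locked = True
        self.cas += 1
        return {"n": 0}, self.cas

    def replace(self, key, value, cas):
        assert cas == self.cas
        self.locked = self.replaces in self.fail_on
        self.replaces += 1
        self.cas += 1

    def unlock(self, key, cas):
        assert cas == self.cas
        self.locked = False

    def expire_lock(self, _seconds):
        self.locked = False           # stands in for the server-side timeout


stub = FlakyStubClient(fail_on={2})
stuck = find_stuck_locks(stub, "doc", iterations=5,
                         lock_error=RuntimeError, sleep=stub.expire_lock)
print("iterations with a stuck lock:", stuck)
```

Against a real cluster, the stub would be replaced by a thin wrapper around the SDK and lock_error by the SDK's lock-failure exception; the returned iteration numbers can then be matched against the server logs.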