@mreiche I have finally found the issue.
When you do `cluster.bucket('foo')`, it opens the bucket behind the scenes by calling the C++ bindings.
Because this operation is not awaited (see code), you may attempt an operation against a bucket that is not open yet.
When that happens, the C++ client queues your operation and retries it every 500ms.
If you perform an operation after the bucket has been opened but before those 500ms have elapsed, it goes through immediately, without waiting for the next 500ms tick.
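Here is a minimal sketch of the race, using the standard SDK API (connection string and credentials are placeholders):

```ts
import { connect } from 'couchbase';

const cluster = await connect('couchbase://localhost', {
  username: 'Administrator',
  password: 'password',
});

// bucket() returns synchronously; the underlying C++ open is fired but never awaited
const bucket = cluster.bucket('foo');

// if this runs before the open has completed, the C++ client queues the
// operation and retries opening the bucket every 500ms
await bucket.defaultCollection().insert('doc::1', { foo: 'bar' });
```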
During the tests, the bucket is created and opened (the initial open fails), the insert is attempted and succeeds, so the test ends, which triggers the destruction of the bucket. Then the 500ms tick kicks in, retries opening the bucket, which no longer exists, and triggers an error.
When running a single test, this does not happen, because the connection is closed, which aborts the pending operations.
This explains why adding a 200ms sleep did not solve the issue while a 500ms one did: the end of the test was delayed by 500ms, so the retry fired and succeeded before the test ended and the bucket was destroyed.
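Concretely, the workaround looks like this (a sketch; the test helpers and `testBucketName` are hypothetical):

```ts
test('insert into a freshly created bucket', async () => {
  const bucket = cluster.bucket(testBucketName); // open fired, not awaited
  await bucket.defaultCollection().insert('doc::1', { foo: 'bar' });

  // without this, the test ends (and the bucket is dropped) before the
  // 500ms open retry fires; a 200ms sleep never reaches the tick
  await new Promise((resolve) => setTimeout(resolve, 500));
});
```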
What I find interesting is that the connection is closed after each test. So why would the pending operations be retried?
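For context, the per-test teardown is roughly the following (a sketch; names are hypothetical):

```ts
afterEach(async () => {
  await cluster.buckets().dropBucket(testBucketName); // the bucket is gone
  await cluster.close(); // yet the open retry still fires ~500ms later (see logs)
});
```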
Reading the couchbase logs, I see:
```
[2024-05-02 17:10:36.793] 0ms [debu] [52009,588682] [5cf525-3bc7-e944-ed02-41700d38356314/98313a-fa22-2748-7b7b-38b38cf3dd7aa3/plain/-] <localhost/::1:11210> stop MCBP connection, reason=do_not_retry
[2024-05-02 17:10:36.793] 0ms [debu] [52009,588682] [5cf525-3bc7-e944-ed02-41700d38356314/98313a-fa22-2748-7b7b-38b38cf3dd7aa3/plain/-] <localhost/::1:11210> destroy MCBP connection
```

... later ...

```
[2024-05-02 17:10:37.184] 0ms [debu] [52009,588682] [5cf525-3bc7-e944-ed02-41700d38356314/043936-bca2-9e48-f4bf-f23683392514d9/plain/cbjs_b_342c566b] <localhost/::1:11210> unable to select bucket: cbjs_b_342c566b, probably the bucket does not exist
[2024-05-02 17:10:37.184] 0ms [debu] [52009,588682] all nodes failed to bootstrap, triggering DNS-SRV refresh, ec=bucket_not_found (10), last endpoint="localhost:11210"
[2024-05-02 17:10:37.184] 0ms [warn] [52009,588682] [5cf525-3bc7-e944-ed02-41700d38356314/043936-bca2-9e48-f4bf-f23683392514d9/plain/cbjs_b_342c566b] <localhost/::1:11210> failed to bootstrap session ec=bucket_not_found (10), bucket="cbjs_b_342c566b"
[2024-05-02 17:10:37.184] 0ms [debu] [52009,588682] [5cf525-3bc7-e944-ed02-41700d38356314/043936-bca2-9e48-f4bf-f23683392514d9/plain/cbjs_b_342c566b] <localhost/::1:11210> stop MCBP connection, reason=node_not_available
[2024-05-02 17:10:37.185] 0ms [debu] [52009,588682] Query DNS-SRV: address="localhost", service="_couchbase", nameserver="1.1.1.1:53"
[2024-05-02 17:10:37.185] 0ms [trac] [52009,588682] Query DNS-SRV (UDP) address="1.1.1.1:53", udp_timeout=250ms, total_timeout=500ms
[2024-05-02 17:10:37.185] 0ms [debu] [52009,588682] [5cf525-3bc7-e944-ed02-41700d38356314/043936-bca2-9e48-f4bf-f23683392514d9/plain/cbjs_b_342c566b] <localhost/::1:11210> destroy MCBP connection
[2024-05-02 17:10:37.196] 10ms [debu] [52009,588682] DNS UDP returned 0 records
```
As you can see, roughly 500ms later, it still retries the pending operations. Is this the expected behavior? If so, how can I properly abort pending operations without killing the client altogether?