@mreiche I have finally found the issue.
When you do `cluster.bucket('foo')`, it opens the bucket behind the scenes by calling the C++ bindings.
Because this operation is not awaited (see code), you may attempt an operation against a bucket that is not open yet.
When that happens, the C++ client queues your operation and retries it every 500ms.
If you perform an operation after the bucket has been opened but before those 500ms have elapsed, it goes through immediately, without waiting for the next 500ms tick.
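Here is a minimal sketch of the race, using the standard SDK API (connection string and credentials are placeholders):

```ts
import { connect } from 'couchbase';

const cluster = await connect('couchbase://localhost', {
  username: 'Administrator',
  password: 'password',
});

// bucket() returns synchronously; the underlying C++ open is fired but never awaited
const bucket = cluster.bucket('foo');

// if this runs before the open has completed, the C++ client queues the
// operation and retries opening the bucket every 500ms
await bucket.defaultCollection().insert('doc::1', { foo: 'bar' });
```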
During the tests, the bucket is created and opened (the initial open fails), the insert is attempted and succeeds, so the test ends, which triggers the destruction of the bucket. Then the 500ms tick kicks in, retries opening the bucket, which no longer exists, and triggers an error.
When running a single test, this does not happen, because the connection is closed, which aborts the pending operations.
This explains why adding a 200ms sleep did not solve the issue while a 500ms one did: the end of the test was delayed by 500ms, so the retry fired and succeeded before the test ended and the bucket was destroyed.
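Concretely, the workaround looks like this (a sketch; the test helpers and `testBucketName` are hypothetical):

```ts
test('insert into a freshly created bucket', async () => {
  const bucket = cluster.bucket(testBucketName); // open fired, not awaited
  await bucket.defaultCollection().insert('doc::1', { foo: 'bar' });

  // without this, the test ends (and the bucket is dropped) before the
  // 500ms open retry fires; a 200ms sleep never reaches the tick
  await new Promise((resolve) => setTimeout(resolve, 500));
});
```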
What I find interesting is that the connection is closed after each test. So why would the pending operations be retried?
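For context, the per-test teardown is roughly the following (a sketch; names are hypothetical):

```ts
afterEach(async () => {
  await cluster.buckets().dropBucket(testBucketName); // the bucket is gone
  await cluster.close(); // yet the open retry still fires ~500ms later (see logs)
});
```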
Reading the couchbase logs, I see:
```
[2024-05-02 17:10:36.793] 0ms [debu] [52009,588682] [5cf525-3bc7-e944-ed02-41700d38356314/98313a-fa22-2748-7b7b-38b38cf3dd7aa3/plain/-] <localhost/::1:11210> stop MCBP connection, reason=do_not_retry
[2024-05-02 17:10:36.793] 0ms [debu] [52009,588682] [5cf525-3bc7-e944-ed02-41700d38356314/98313a-fa22-2748-7b7b-38b38cf3dd7aa3/plain/-] <localhost/::1:11210> destroy MCBP connection
```

... later ...

```
[2024-05-02 17:10:37.184] 0ms [debu] [52009,588682] [5cf525-3bc7-e944-ed02-41700d38356314/043936-bca2-9e48-f4bf-f23683392514d9/plain/cbjs_b_342c566b] <localhost/::1:11210> unable to select bucket: cbjs_b_342c566b, probably the bucket does not exist
[2024-05-02 17:10:37.184] 0ms [debu] [52009,588682] all nodes failed to bootstrap, triggering DNS-SRV refresh, ec=bucket_not_found (10), last endpoint="localhost:11210"
[2024-05-02 17:10:37.184] 0ms [warn] [52009,588682] [5cf525-3bc7-e944-ed02-41700d38356314/043936-bca2-9e48-f4bf-f23683392514d9/plain/cbjs_b_342c566b] <localhost/::1:11210> failed to bootstrap session ec=bucket_not_found (10), bucket="cbjs_b_342c566b"
[2024-05-02 17:10:37.184] 0ms [debu] [52009,588682] [5cf525-3bc7-e944-ed02-41700d38356314/043936-bca2-9e48-f4bf-f23683392514d9/plain/cbjs_b_342c566b] <localhost/::1:11210> stop MCBP connection, reason=node_not_available
[2024-05-02 17:10:37.185] 0ms [debu] [52009,588682] Query DNS-SRV: address="localhost", service="_couchbase", nameserver="1.1.1.1:53"
[2024-05-02 17:10:37.185] 0ms [trac] [52009,588682] Query DNS-SRV (UDP) address="1.1.1.1:53", udp_timeout=250ms, total_timeout=500ms
[2024-05-02 17:10:37.185] 0ms [debu] [52009,588682] [5cf525-3bc7-e944-ed02-41700d38356314/043936-bca2-9e48-f4bf-f23683392514d9/plain/cbjs_b_342c566b] <localhost/::1:11210> destroy MCBP connection
[2024-05-02 17:10:37.196] 10ms [debu] [52009,588682] DNS UDP returned 0 records
```
As you can see, roughly 500ms later, it still retries the pending operations. Is this the expected behavior? If so, how can I properly abort pending operations without killing the client altogether?