Couchbase connection closed intermittently - Node.js SDK

We are using Couchbase Node.js SDK 3.0.1 against Couchbase 6.5 server.

Almost every developer has run into sporadic “parent cluster object has been closed” errors on and off. During the initial startup of our Express/Node.js API server, this happens 100% of the time locally, no matter how long the Couchbase server has been up. Restarting the API server, which uses the Couchbase Node.js SDK and Express, always seems to fix it.

However, we also see sporadic connection issues while hitting the API server, particularly if it’s in a transition state of starting up.

This leaves the Node.js server in an unrecoverable state.

Is there a way to recover gracefully in a case like the one below, where we are doing a sub-document mutate operation? All we are doing is using the standard collection-based APIs according to the documentation.

Error: parent cluster object has been closed
at Connection._maybeBFwd (C:\Dev\JLLISNEX\jllis-api\node_modules\couchbase\lib\connection.js:181:13)
at Connection.mutateIn (C:\Dev\JLLISNEX\jllis-api\node_modules\couchbase\lib\connection.js:252:10)
at C:\Dev\JLLISNEX\jllis-api\node_modules\couchbase\lib\collection.js:1094:25
at C:\Dev\JLLISNEX\jllis-api\node_modules\couchbase\lib\promisehelper.js:30:7
at new Promise (<anonymous>)
at Function.wrap (C:\Dev\JLLISNEX\jllis-api\node_modules\couchbase\lib\promisehelper.js:29:12)
at Collection.mutateIn (C:\Dev\JLLISNEX\jllis-api\node_modules\couchbase\lib\collection.js:1082:26)
at C:\Dev\JLLISNEX\jllis-api\src\api\dps\route.ts:171:31
at Layer.handle [as handle_request] (C:\Dev\JLLISNEX\jllis-api\node_modules\express\lib\router\layer.js:95:5)
at next (C:\Dev\JLLISNEX\jllis-api\node_modules\express\lib\router\route.js:137:13)
[2020-04-29T10:21:36.959] [ERROR] jllis-api - ::ffff:127.0.0.1 - - “PATCH /dataproducts/AAR-2/details HTTP/1.1” 500 81 “” “PostmanRuntime/7.24.0”

Hey @fredwang00,
Just to follow up on this particular post, you can track the underlying issue which is causing this and its resolution here:
https://issues.couchbase.com/browse/JSCBC-706
Cheers, Brett

Thanks @brett19 for the follow up.

I read through the description of the JIRA issue and just wanted to clarify. We instantiate the bucket object using the cluster.bucket(’’) API in the Express route code of our app. This typically happens when we want to access or mutate some collection in a known bucket, which, to my understanding, should create the bucket if it doesn’t exist.

Most of the time, it should just access the existing bucket via an index in an array, but somehow the connection either was never established during instantiation or was severed.

So what we need to understand is how to retry on any bucket whose underlying connection is closed, or whether the connection attempt should respect a timeout that we supply and retry a couple of times.

If something like this already exists, it’s not clear how we control the connection and reconnect.

Thanks.

Hey @fredwang00,

Your description is helpful, and I can tell you that your usage is correct. The issue is that there is currently a bug in the SDK where connections that fail at connect time are not automatically retried, but instead cause that bucket to be marked as closed and inaccessible. The issue I linked to is tracking the fix for that bug, which will enable you to use the code you have as it stands today. Unfortunately, because the SDK interfaces were designed so that the user shouldn’t need to worry about the underlying bucket connections, the only way to force the SDK to reconnect a particular bucket is to create an entirely new Cluster object. One possible workaround right now would be to use one Cluster object per bucket you want to open; then, if you encounter errors when performing operations, you can create a new Cluster object for that bucket.

P.S. Note that this particular issue only occurs when the bucket doesn’t exist, or the cluster is offline, at the time you initially create the Cluster object and open the Bucket.

Cheers, Brett
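
The workaround described above (one Cluster object per bucket, replaced when an operation fails) could be sketched roughly as follows. This is only an illustration under assumptions, not the SDK's own mechanism: `BucketConnections`, `isClosedError`, and the `createCluster` factory are hypothetical names, the error is detected by matching the message text from the stack trace in this thread, and the call to `cluster.close()` is guarded because its behaviour on an already closed object is assumed rather than confirmed.

```javascript
// Sketch only: caches one Cluster per bucket and rebuilds it when an
// operation fails with the "closed" error quoted in this thread.
// `createCluster` is a hypothetical factory supplied by the caller, e.g.
//   () => new couchbase.Cluster('couchbase://localhost', { username, password })

const CLOSED_MESSAGE = 'parent cluster object has been closed';

// Heuristic check based on the error message text (an assumption).
function isClosedError(err) {
  return Boolean(err && typeof err.message === 'string' &&
    err.message.includes(CLOSED_MESSAGE));
}

class BucketConnections {
  constructor(createCluster) {
    this.createCluster = createCluster;
    this.entries = new Map(); // bucket name -> { cluster, bucket }
  }

  // Returns the cached bucket handle, creating a dedicated Cluster on first use.
  get(bucketName) {
    let entry = this.entries.get(bucketName);
    if (!entry) {
      const cluster = this.createCluster();
      entry = { cluster, bucket: cluster.bucket(bucketName) };
      this.entries.set(bucketName, entry);
    }
    return entry.bucket;
  }

  // Drops the cached Cluster so that the next get() builds a fresh one.
  reset(bucketName) {
    const entry = this.entries.get(bucketName);
    this.entries.delete(bucketName);
    if (entry && typeof entry.cluster.close === 'function') {
      try {
        // close() may return a promise or throw; ignore failures either way.
        Promise.resolve(entry.cluster.close()).catch(() => {});
      } catch (_) {
        // the old Cluster was already unusable
      }
    }
  }

  // Runs op(bucket); on a "closed" error, rebuilds the Cluster and retries once.
  async run(bucketName, op) {
    try {
      return await op(this.get(bucketName));
    } catch (err) {
      if (!isClosedError(err)) throw err;
      this.reset(bucketName);
      return op(this.get(bucketName));
    }
  }
}
```

In a route handler this might be used as `await connections.run('my-bucket', (bucket) => bucket.defaultCollection().mutateIn(key, specs))`, where `connections`, `key`, and `specs` are placeholders. The single retry is deliberate: if the second attempt also fails, the error is surfaced instead of looping.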

Hi Brett,
I see this is still unresolved. Why not simply reintroduce the connection event from the previous SDK?
I get that you want to make it effortless, but it’s not nice to have the database as a black box and be unable to make decisions based on its status.
I am not asking you to change the connection method, but simply to provide a way to get notified of the connection so that we can do our own thing.

Hey @00christian00,

Unfortunately, the interfaces between the application and the SDK, and between the SDK and its underlying IO system (libcouchbase), were designed around the concepts of SDK 3, where the Cluster object acts as the overarching point of resource ownership. We have been waiting for some underlying changes to the IO system that we would be able to take advantage of to provide the correct behaviour in the Node.js SDK, but I admit that it has taken substantially longer than we had hoped.

Note that, to your point, it is possible to infer the status of a connection from the errors thrown by operations. That is to say, if you perform an operation and receive an error indicating that the connection is closed, but you are certain that the issue is transient, you can simply instantiate a new Cluster object. In the near future this manual reconnection will no longer be necessary, as the SDK will perform the reconnect internally and will also add context to the error thrown from the operation, indicating the reason the connection was lost.

Cheers, Brett
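
Inferring connection state from operation errors, as described above, might look something like the sketch below. It assumes the error can be recognised by its message text and that the application supplies its own `reconnect` hook (for example, one that builds a new Cluster object and swaps it in); `retryOnClosed`, `looksClosed`, and the option names are hypothetical. The number of attempts is bounded because the failure may turn out not to be transient.

```javascript
// Sketch only: retry an operation a bounded number of times, calling a
// caller-supplied `reconnect` hook (hypothetical) between attempts.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Heuristic: recognise the "closed" error by its message (an assumption).
function looksClosed(err) {
  return Boolean(err) &&
    /parent cluster object has been closed/i.test(String(err.message));
}

async function retryOnClosed(op, { reconnect, attempts = 3, delayMs = 200 } = {}) {
  if (typeof reconnect !== 'function') {
    throw new TypeError('reconnect must be a function');
  }
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      return await op();
    } catch (err) {
      if (!looksClosed(err)) throw err; // unrelated failure: surface it as-is
      lastError = err;
      if (attempt < attempts) {
        await sleep(delayMs * attempt); // simple linear backoff
        await reconnect();              // e.g. create a new Cluster object
      }
    }
  }
  throw lastError; // still closed after all attempts
}
```

Errors that do not match the "closed" message are rethrown immediately, so application-level failures are not masked by the retry loop.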

Thanks Brett. The new SDK’s error reporting is really lacking at the moment.
I spent half a day trying to understand why it wouldn’t connect anymore, and it turned out the user credentials were wrong.
It would just say “the parent cluster has been closed” whenever I tried to access a collection, although the connection was established fine.
I am trying to convert the script that initializes the database for testing to the new SDK, and I didn’t realize the user had been created with incomplete rights.
By the way, please update the documentation regarding cluster management, because understanding everything from the API reference alone is a nightmare, and the new API in this regard seems quite convoluted (before, there was a simple cluster manager class that did most of it).

Hi,

I think it might be the same issue.

We cannot set the connection timeout at the moment (SDK 3.1.1).
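
Until a connection timeout can be configured, one application-level stopgap might be to race each operation against a timer. The sketch below is not an SDK feature: `withTimeout` is a hypothetical helper, and it only stops the caller from waiting; it does not cancel the underlying operation or connection attempt.

```javascript
// Sketch only: reject if `promise` has not settled within `ms` milliseconds.
// The underlying operation is NOT cancelled; only the wait is abandoned.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

A call such as `await withTimeout(collection.mutateIn(key, specs), 5000, 'mutateIn')` would then fail fast instead of hanging, where `collection`, `key`, and `specs` are placeholders.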