Couchbase connect timeout Node.js SDK 3.2.1

Hi,
Using Node.js SDK 3.2.1

One of my servers has started sporadically timing out when calling cluster.connect. Today is the first time I have seen this on this server.
To debug this, I have created a test app that calls Couchbase.connect() and reads a known doc from a bucket then closes the connection.
Sometimes the connect call fails with TimeoutError.
Sometimes the connect takes a long time (I would guess pretty close to the 5 second timeout) and appears to complete, but the GET then fails with a RequestCanceledError, so I guess it didn’t really connect.
Occasionally (maybe 20% of the time?) it works! When it does, the connection is immediate and the document is read successfully.
I have never seen a connection that took more than about a second go on to read the document successfully; if it takes longer than that, the GET always fails with the RequestCanceledError.

If it helps, this is the full error when it times out:

TimeoutError: timeout
    at Object.translateCppError (/snapshot/yoid/node_modules/couchbase/dist/bindingutilities.js:180:20)
    at Object.<anonymous> (/snapshot/yoid/node_modules/couchbase/dist/connection.js:145:45) {
  cause: LibcouchbaseError: libcouchbase error 201
      at Object.translateCppError (/snapshot/yoid/node_modules/couchbase/dist/bindingutilities.js:174:21)
      at Object.<anonymous> (/snapshot/yoid/node_modules/couchbase/dist/connection.js:145:45) {
    code: 201
  },
  context: undefined
}

This is the error when the connect completes but the GET fails.

RequestCanceledError: request canceled
    at Object.translateCppError (/snapshot/yoid/node_modules/couchbase/dist/bindingutilities.js:182:20)
    at /snapshot/yoid/node_modules/couchbase/dist/connection.js:237:54 {
  cause: LibcouchbaseError: libcouchbase error 202
      at Object.translateCppError (/snapshot/yoid/node_modules/couchbase/dist/bindingutilities.js:174:21)
      at /snapshot/yoid/node_modules/couchbase/dist/connection.js:237:54 {
    code: 202
  },
  context: KeyValueErrorContext {
    status_code: 0,
    opaque: 0,
    cas: CbCas { '0': <Buffer 00 00 00 00 00 00 00 00> },
    key: 'sanityCheck',
    bucket: '',
    collection: '',
    scope: '',
    context: '',
    ref: ''
  }
}

If it helps I can post my full test code, but the crux of it is

const couchbase = require("couchbase");

const cluster = await couchbase.connect("couchbase://localhost", { username, password });
const bucket = cluster.bucket(bucketName);
const coll = bucket.defaultCollection();
const res = await coll.get("sanityCheck");
await cluster.close();

I output the username and password before this and they are definitely correct. As I said, sometimes the code works with the same username and password.
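If actual timings would be useful, wrapping the same calls with console.time gives numbers rather than guesses. A self-contained sketch of that (username, password and bucketName are placeholders for the values that come from my config):

const couchbase = require("couchbase");

// Placeholders - the real values come from config.
const username = "user";
const password = "pass";
const bucketName = "myBucket";

async function sanityCheck() {
  console.time("connect");
  const cluster = await couchbase.connect("couchbase://localhost", { username, password });
  console.timeEnd("connect");

  try {
    console.time("get");
    const res = await cluster.bucket(bucketName).defaultCollection().get("sanityCheck");
    console.timeEnd("get");
    console.log("got doc:", res.content);
  } catch (err) {
    console.timeEnd("get");
    // On the bad runs this is where the RequestCanceledError shows up.
    console.error(err);
  } finally {
    await cluster.close();
  }
}

sanityCheck().catch(console.error);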

I have looked in the CB logs but don’t see anything obvious. I’ve run tail -f on each log whilst running the command and no output gets printed whether it passes or fails.

I’ve tried restarting the Couchbase service and even rebooting the server, and I don’t know what else to try. I have also run the same code on other servers with the same Couchbase setup and it works every time.

Thanks,
Giles

I posted this question on a Friday. I tried again on Monday morning on the same server and it worked perfectly every time. I don’t know what, if anything, changed. There was no difference that I’m aware of. I guess the server was just tired after a long week. I’ve tried the test several times since and it works every time.

If it happens again, is there anything I can check to investigate why the connection times out or the get returns “RequestCanceledError”? What’s the best CB log to check for connection requests? Or is there any verbose logging that I can turn on? @brett19 please would you be able to advise on this?
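For what it’s worth, the next time it happens I’m planning to try something like the sketch below, assuming cluster.ping() and cluster.diagnostics() in 3.2.1 behave the way the health-check docs describe, and that the LCB_LOGLEVEL environment variable still applies (the errors above come from libcouchbase, so I assume it does):

const couchbase = require("couchbase");

// Placeholders - the real values come from config.
const username = "user";
const password = "pass";

async function pingTest() {
  const cluster = await couchbase.connect("couchbase://localhost", { username, password });

  // ping() actively checks each service; diagnostics() reports the state
  // of the connections the SDK is already holding.
  console.log(JSON.stringify(await cluster.ping(), null, 2));
  console.log(JSON.stringify(await cluster.diagnostics(), null, 2));

  await cluster.close();
}

// Run as e.g. `LCB_LOGLEVEL=5 node ping-test.js` to also get verbose
// libcouchbase logging on stderr.
pingTest().catch(console.error);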

Thanks,
Giles

Use SDK Doctor to diagnose SDK connectivity issues.

Thanks. I haven’t heard of SDK Doctor. I’ll give it a try.

The same issue has happened again, on the same server. I have run SDK Doctor multiple times and it does not show any connection issues. I am at a loss to explain why this is happening or how to proceed with investigating it. Please can anyone help me diagnose the “connect” call?

Here is the output from SDK Doctor. I have hidden the server (replaced with <host>) and the bucket name (replaced with <bucket>) as I don’t want to post them on a public forum.

|====================================================================|
|          ___ ___  _  __   ___   ___   ___ _____ ___  ___           |
|         / __|   \| |/ /__|   \ / _ \ / __|_   _/ _ \| _ \          |
|         \__ \ |) | ' <___| |) | (_) | (__  | || (_) |   /          |
|         |___/___/|_|\_\  |___/ \___/ \___| |_| \___/|_|_\          |
|                                                                    |
|====================================================================|

Note: Diagnostics can only provide accurate results when your cluster
 is in a stable state.  Active rebalancing and other cluster configuration
 changes can cause the output of the doctor to be inconsistent or in the
 worst cases, completely incorrect.

13:22:18.515 INFO ▶ Parsing connection string `couchbase://<host>/<bucket>`
13:22:18.515 INFO ▶ Connection string was parsed as a potential DNS SRV record
13:22:23.588 INFO ▶ Connection string identifies the following CCCP endpoints:
13:22:23.588 INFO ▶   1. <host>:11210
13:22:23.588 INFO ▶ Connection string identifies the following HTTP endpoints:
13:22:23.588 INFO ▶   1. <host>:8091
13:22:23.588 INFO ▶ Connection string specifies bucket `<bucket>`
13:22:28.589 WARN ▶ Your connection string specifies only a single host.  You should consider adding additional static nodes from your cluster to this list to improve your applications fault-tolerance
13:22:28.589 INFO ▶ Performing DNS lookup for host `<host>`
13:22:28.589 INFO ▶ Bootstrap host `<host>` refers to a server with the address `127.0.0.1`
13:22:28.591 INFO ▶ Attempting to connect to cluster via CCCP
13:22:28.591 INFO ▶ Attempting to fetch config via cccp from `<host>:11210`
13:22:28.594 INFO ▶ Selected the following network type: default
13:22:28.594 INFO ▶ Identified the following nodes:
13:22:28.594 INFO ▶   [0] <host>
13:22:28.594 INFO ▶                     mgmt:  8091,                   kv: 11210,                 capi:  8092
13:22:28.594 INFO ▶                projector:  9999
13:22:28.594 INFO ▶ Fetching config from `http://<host>:8091`
13:22:28.602 INFO ▶ Received cluster configuration, nodes list:
[
  {
    "addressFamily": "inet",
    "clusterCompatibility": 393222,
    "clusterMembership": "active",
    "configuredHostname": "<host>:8091",
    "couchApiBase": "http://<host>:8092/",
    "cpuCount": 4,
    "externalListeners": [
      {
        "afamily": "inet",
        "nodeEncryption": false
      },
      {
        "afamily": "inet6",
        "nodeEncryption": false
      }
    ],
    "hostname": "<host>:8091",
    "interestingStats": {
      "cmd_get": 0,
      "couch_docs_actual_disk_size": 11292756899,
      "couch_docs_data_size": 508611014,
      "couch_spatial_data_size": 0,
      "couch_spatial_disk_size": 0,
      "couch_views_actual_disk_size": 2314444309,
      "couch_views_data_size": 1929662855,
      "curr_items": 3824543,
      "curr_items_tot": 3824543,
      "ep_bg_fetched": 0,
      "get_hits": 0,
      "mem_used": 1487864648,
      "ops": 0,
      "vb_active_num_non_resident": 3494996,
      "vb_replica_curr_items": 0
    },
    "mcdMemoryAllocated": 6380,
    "mcdMemoryReserved": 6380,
    "memoryFree": 5711441920,
    "memoryTotal": 8363565056,
    "nodeEncryption": false,
    "nodeUUID": "4a4be41c01ae77019879dbd842487c7d",
    "os": "x86_64-unknown-linux-gnu",
    "otpNode": "ns_1@<host>",
    "ports": {
      "direct": 11210,
      "distTCP": 21100,
      "distTLS": 21150
    },
    "recoveryType": "none",
    "services": [
      "kv"
    ],
    "status": "healthy",
    "systemStats": {
      "allocstall": 0,
      "cpu_cores_available": 4,
      "cpu_stolen_rate": 0,
      "cpu_utilization_rate": 5.445544554455446,
      "mem_free": 5711441920,
      "mem_limit": 8363565056,
      "mem_total": 8363565056,
      "swap_total": 0,
      "swap_used": 0
    },
    "thisNode": true,
    "uptime": "853",
    "version": "6.6.0-7909-community"
  }
]
13:22:28.603 INFO ▶ Successfully connected to Key Value service at `<host>:11210`
13:22:28.608 INFO ▶ Successfully connected to Management service at `<host>:8091`
13:22:28.619 INFO ▶ Successfully connected to Views service at `<host>:8092`
13:22:28.619 WARN ▶ Could not test Query service on `<host>` as it was not in the config
13:22:28.619 WARN ▶ Could not test Search service on `<host>` as it was not in the config
13:22:28.619 WARN ▶ Could not test Analytics service on `<host>` as it was not in the config
13:22:28.620 INFO ▶ Memd Nop Pinged `<host>:11210` 10 times, 0 errors, 0ms min, 0ms max, 0ms mean
13:22:28.620 INFO ▶ Diagnostics completed

Summary:
[WARN] Your connection string specifies only a single host.  You should consider adding additional static nodes from your cluster to this list to improve your applications fault-tolerance
[WARN] Could not test Query service on `<host>` as it was not in the config
[WARN] Could not test Search service on `<host>` as it was not in the config
[WARN] Could not test Analytics service on `<host>` as it was not in the config

Found multiple issues, see listing above.

We have finally found the problem. It was due to some DNS network issues that we’ve now resolved. Working fine now and hopefully I won’t see it again.

Hi Giles, mind sharing whatever information you can regarding the DNS issue? We’re running into what appears to be the same thing. Our Node.js app runs in a Docker container running in AWS ECS, and we’re wondering if we have to do the same DNS resolution fix that we previously had to do for a similar issue regarding Elasticsearch.

Hi @smartin,
I’m not actually sure it was related to DNS, but it looked that way at the time. We made some changes to resolv.conf relating to internal traffic to localhost, and these seemed to fix it.
However, the issue later came back, so I’m not sure it was related after all. That server had been upgraded from Ubuntu 16 to Ubuntu 18 and we think there might have been some old files or settings left over from Ubuntu 16 that were messing things up.
The way we finally fixed the issue was by doing a clean Ubuntu 18 install rather than upgrading from 16. We haven’t seen the problem since we did that.
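If it helps anyone hitting something similar, a quick way to check the resolver from the app host itself is something like the sketch below (Node’s built-in dns module only; the hostname is a placeholder):

const dns = require("dns").promises;

// Placeholder - use the bootstrap host from your connection string.
const host = "cb.example.com";

async function timeLookup() {
  const start = Date.now();
  try {
    const addresses = await dns.lookup(host, { all: true });
    console.log(`lookup of ${host} took ${Date.now() - start}ms:`, addresses);
  } catch (err) {
    console.error(`lookup of ${host} failed after ${Date.now() - start}ms:`, err);
  }

  // The SDK also attempts a DNS SRV lookup when the connection string has a
  // single hostname, so a slow or failing SRV query can hold up bootstrap
  // even when the plain lookup above is fast.
  try {
    const srv = await dns.resolveSrv(`_couchbase._tcp.${host}`);
    console.log("SRV records:", srv);
  } catch (err) {
    console.log("no SRV records (that is fine):", err.code);
  }
}

timeLookup();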

Sorry I can’t help any more than that.
