Couchbase Python SDK connection issues from AKS cluster

We have a high-availability environment with three Couchbase nodes (not in AKS) in three different regions. We have two AKS clusters with APIs that connect to Couchbase; the clusters are in two of the three regions that host the Couchbase nodes. We are seeing intermittent connection issues when performing collection.upsert, collection.replace, and collection.remove operations. N1QL SELECT queries work fine. We receive the following error:

couchbase.exceptions.TimeoutException: <RC=0xC9[LCB_ERR_TIMEOUT (201)], HTTP Request failed.

We are having a hard time diagnosing this issue because it only occurs some of the time. Any input is appreciated.

Hi @dsbv – Could you provide some logs from the SDK surrounding the event? Please see logging docs here.

Also, some further information/context about the system would be beneficial:

  • Couchbase Server version
  • OS/platform the Python client is running on
  • Version of the Python client

Another tool that can help shed light on connection problems is SDK Doctor (docs here).
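As a lighter-weight complement to SDK Doctor, a plain TCP reachability check run from inside an API pod can show whether each node name resolves and accepts connections. This is only a minimal sketch: the ports are the Couchbase defaults for non-TLS traffic (11210 for key-value operations such as upsert/replace/remove, 8093 for the query service used by N1QL), and the node names in the usage comment are hypothetical placeholders, not taken from this thread.

```python
import socket

# Default non-TLS Couchbase ports (assumption: the cluster uses the defaults).
PORTS = {"kv": 11210, "query": 8093}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_nodes(nodes, ports=PORTS, timeout=3.0):
    """Map (node, service name) to a reachability flag for every node/port pair."""
    return {
        (node, service): can_connect(node, port, timeout)
        for node in nodes
        for service, port in ports.items()
    }


# Usage, with hypothetical hostnames; substitute the names from your connection string:
# for (node, service), ok in check_nodes(["cb-node-1.example.internal"]).items():
#     print(node, service, "reachable" if ok else "UNREACHABLE")
```

A successful TCP connect only proves that something is listening at the resolved address. It does not prove the name resolved to the intended node, so it is worth comparing the resolved IPs against the real node addresses as well.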

We were able to resolve our issues. Thanks!

Hi @dsbv, can you share what the resolution was? It will help others as well. Thank you!

Certainly! At some point, an assumption was made that because our two AKS clusters are in different regions, the deployment YAML files for our Python API should set the hostAliases IP for every Couchbase server to the same address: the IP of the Couchbase server in the VNet where that AKS cluster resides. Once we reverted that change and set the correct IP for each Couchbase server in the hostAliases definition, it started working.
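For anyone who wants to check for the same mistake, here is a rough sketch of a check over the hostAliases list from a deployment spec, already parsed into Python dicts. It flags any IP that more than one hostname points to. The hostnames and IPs are made up for illustration and are not our real values. Note that sharing an IP across hostnames is valid Kubernetes in general; it is only a red flag here because each Couchbase node has its own address.

```python
from collections import defaultdict


def find_shared_ips(host_aliases):
    """Return {ip: sorted hostnames} for every IP that more than one hostname maps to."""
    names_by_ip = defaultdict(set)
    for entry in host_aliases:
        for name in entry.get("hostnames", []):
            names_by_ip[entry["ip"]].add(name)
    return {ip: sorted(names) for ip, names in names_by_ip.items() if len(names) > 1}


# Hypothetical "broken" config: every Couchbase hostname points at the local node's IP.
broken = [
    {"ip": "10.1.0.4", "hostnames": ["cb-region-a.example.internal"]},
    {"ip": "10.1.0.4", "hostnames": ["cb-region-b.example.internal"]},
    {"ip": "10.1.0.4", "hostnames": ["cb-region-c.example.internal"]},
]

# Hypothetical "fixed" config: each hostname maps to its own node's IP.
fixed = [
    {"ip": "10.1.0.4", "hostnames": ["cb-region-a.example.internal"]},
    {"ip": "10.2.0.4", "hostnames": ["cb-region-b.example.internal"]},
    {"ip": "10.3.0.4", "hostnames": ["cb-region-c.example.internal"]},
]

print(find_shared_ips(broken))
print(find_shared_ips(fixed))
```

The first call reports the single IP with all three hostnames attached; the second returns an empty dict.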

Note that we also updated the Couchbase Python SDK to 3.2 and refactored the code to use connection pooling, but our issues didn't completely resolve until after the hostAliases update in the Kubernetes deployment YAML file for the Python API that accesses our Couchbase cluster.