We have a high-availability environment with three Couchbase nodes (not in AKS) in three different regions. We have two AKS clusters whose APIs connect to Couchbase; they run in two of the three regions that host the Couchbase nodes. We are noticing intermittent connection issues when performing collection.upsert, collection.replace, and collection.remove operations. N1QL SELECTs are working just fine. We are receiving the following error:
Certainly! At some point, someone assumed that because our two AKS clusters are in different regions, the deployment YAML for our Python API should point the hostAliases entry for every Couchbase server at the same IP: that of the Couchbase server in the VNet where that AKS cluster resides. Once we reverted that change and set the correct IP for each Couchbase server in the hostAliases definition, it started working.
Note that we also updated the Couchbase Python SDK to 3.2 and refactored the code to use connection pooling, but our issues didn't fully resolve until we corrected hostAliases in the Kubernetes deployment YAML for the Python API that accesses our Couchbase cluster.
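For anyone hitting the same symptom, a quick check from inside the API pod can reveal this kind of misconfiguration. A plausible explanation for why only upsert/replace/remove failed is that key-value operations are routed to the specific node that owns each document, so every node hostname has to resolve to its own address, whereas a query can be served by any node running the query service. The sketch below uses only the Python standard library; the hostnames in `COUCHBASE_HOSTS` are hypothetical placeholders, not our real ones, and `resolve_hosts` and `find_shared_ips` are helper names made up for this example.

```python
import socket
from collections import defaultdict

# Hypothetical hostnames: replace with the node names your connection string uses.
COUCHBASE_HOSTS = [
    "cb-node1.example.internal",
    "cb-node2.example.internal",
    "cb-node3.example.internal",
]


def resolve_hosts(hosts, resolver=socket.gethostbyname):
    """Map each hostname to the IP it resolves to, or None if lookup fails."""
    resolved = {}
    for host in hosts:
        try:
            resolved[host] = resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            resolved[host] = None
    return resolved


def find_shared_ips(resolved):
    """Return {ip: [hostnames]} for every IP that more than one hostname maps to."""
    by_ip = defaultdict(list)
    for host, ip in resolved.items():
        if ip is not None:
            by_ip[ip].append(host)
    return {ip: sorted(hosts) for ip, hosts in by_ip.items() if len(hosts) > 1}


def report(hosts):
    """Print the resolution result and flag hostnames that share an IP."""
    resolved = resolve_hosts(hosts)
    for host, ip in resolved.items():
        print(f"{host} -> {ip}")
    for ip, names in find_shared_ips(resolved).items():
        print(f"WARNING: {', '.join(names)} all resolve to {ip}")
```

Running `report(COUCHBASE_HOSTS)` inside the pod (for example via `kubectl exec`) shows what the hostAliases entries actually produce. If two or more Couchbase hostnames print the same IP, the hostAliases block is the first place to look.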