Frequent TimeoutException from Python 3.x SDK

Hi there,

We got very frequent TimeoutException on Flask application, according to the log below, it just threw timeout but actually from the cb.get sent till it fails, it didn’t exceed the KVtimeout threshold yet.

2021-04-10 02:39:52,589 - app.modules.db_helper - INFO - key to fetch in get() of CouchbaseCluster [64c023fd-e4c4-4136-85cd-f4c6cdc15ded_schema]
2021-04-10 02:39:52,589 - app.modules.db_helper - ERROR - try again to get 64c023fd-e4c4-4136-85cd-f4c6cdc15ded_schema due to exception
2021-04-10 02:39:52,668 - app.modules.db_helper - ERROR - <Key=‘64c023fd-e4c4-4136-85cd-f4c6cdc15ded_schema’, RC=0xC9[LCB_ERR_TIMEOUT (201)], Operational Error, Results=1, C Source=(src/multiresult.c,316), Context={‘status_code’: 4, ‘opaque’: 6, ‘cas’: 0, ‘key’: ‘64c023fd-e4c4-4136-85cd-f4c6cdc15ded_schema’, ‘bucket’: ‘workflows’, ‘collection’: ‘’, ‘scope’: ‘’, ‘context’: ‘’, ‘ref’: ‘’, ‘endpoint’: ‘10.244..:11210’, ‘type’: ‘KVErrorContext’}, Tracing Output={“64c023fd-e4c4-4136-85cd-f4c6cdc15ded_schema”: {“s”: “kv:Unknown”, “i”: 14219769572458419854, “b”: “workflows”, “r”: “10.244..:11210”, “t”: 30000000, “debug_info”: {“FILE”: “src/callbacks.c”, “FUNC”: “value_callback”, “LINE”: 852}}}>
Traceback (most recent call last):
File “/var/myapp/dbaas/app/modules/db_helper.py”, line 418, in get
rv = self.collection.get(key, quiet=True)
File “/var/myapp/dbaas/venv/lib/python3.7/site-packages/couchbase/collection.py”, line 258, in wrapped
return func(self, args, **kwargs)
File “/var/myapp/dbaas/venv/lib/python3.7/site-packages/couchbase/result.py”, line 471, in wrapped
x, options = func(args, **kwargs)
File “/var/myapp/dbaas/venv/lib/python3.7/site-packages/couchbase/collection.py”, line 479, in get
return self._get_generic(key, kwargs, options)
File “/var/myapp/dbaas/venv/lib/python3.7/site-packages/couchbase/collection.py”, line 443, in _get_generic
x = CoreClient.get(self.bucket, key, **opts)
File “/var/myapp/dbaas/venv/lib/python3.7/site-packages/couchbase_core/client.py”, line 406, in get
return super(Client, self).get(args,**kwargs)
couchbase.exceptions.TimeoutException: <Key=‘64c023fd-e4c4-4136-85cd-f4c6cdc15ded_schema’, RC=0xC9[LCB_ERR_TIMEOUT (201)], Operational Error, Results=1, C Source=(src/multiresult.c,316), Context={‘status_code’: 4, ‘opaque’: 6, ‘cas’: 0, ‘key’: ‘64c023fd-e4c4-4136-85cd-f4c6cdc15ded_schema’, ‘bucket’: ‘workflows’, ‘collection’: ‘’, ‘scope’: ‘’, ‘context’: ‘’, ‘ref’: ‘’, ‘endpoint’: '10.244.
.
:11210’, ‘type’: ‘KVErrorContext’}, Tracing Output={“64c023fd-e4c4-4136-85cd-f4c6cdc15ded_schema”: {“s”: “kv:Unknown”, “i”: 14219769572458419854, “b”: “workflows”, “r”: "10.244.
.*:11210", “t”: 30000000, “debug_info”: {“FILE”: “src/callbacks.c”, “FUNC”: “value_callback”, “LINE”: 852}}}>
2021-04-10 02:39:52,732 - app.modules.db_helper - INFO - Item [64c023fd-e4c4-4136-85cd-f4c6cdc15ded_schema] is found
2021-04-10 02:39:52,733 - app.modules.db_helper - INFO - value to return in get(64c023fd-e4c4-4136-85cd-f4c6cdc15ded_schema) of CouchbaseCluster is not empty

and here is the code for connections:

           logger.info(
                "Attempting to connect Couchbase: {}, on bucket: {}".format(self._db_hosts, self._bucket_name))

            if current_app.config['DB_PASS_ENCRYPTED']:
                self._my_password = DBFernetHelper().decrypt(db_pass).decode('utf-8')
            else:
                self._my_password = db_pass

            self._cluster = Cluster.connect(
                'couchbase://{}?query_timeout=75.0&operation_timeout=30.0&views_timeout=75.0'.format(
                    self._db_hosts),
                ClusterOptions(
                    PasswordAuthenticator(self._username, self._my_password),
                    lockmode=LOCKMODE_WAIT,
                    timeout_options=ClusterTimeoutOptions(query_timeout=timedelta(seconds=75),
                                                          kv_timeout=timedelta(seconds=30),
                                                          views_timeout=timedelta(seconds=75),
                                                          # config_total_timeout=timedelta(seconds=75)
                                                          )
                ), lockmode=LOCKMODE_WAIT
            )
            # self._bucket = self._cluster.bucket(self._bucket_name)
            self._bucket = self._cluster._cluster.open_bucket(self._bucket_name,
                                                              lockmode=LOCKMODE_WAIT)  # work around to set lockmode
            self._collection = self._bucket.default_collection()

Does anyone else experience the similar issue on Python SDK on Flask application? It’s really frustrating to see timeout but cannot find the root cause. :cry:

this is the get method:

def get(self, key):
    logger.info("key to fetch in get() of CouchbaseCluster [%s] " % key)
    val = None
    try:
        rv = self.collection.get(key, quiet=True)
        if rv.success:
            val = rv.content
            logger.info("Item [%s] is found" % key)
        else:
            logger.info("Item [%s] does not exist" % key)
    except CouchbaseInternalException as e:
        try:
            rv = self.collection.get(key, quiet=True, replica=True)
            if rv.success:
                val = rv.content
                logger.info("Item [%s] is found in replica" % key)
            else:
                logger.info("Item [%s] does not exist in replica" % key)
        except (CouchbaseException, Exception) as e:
            logger.exception(e)
            raise
    except (TimeoutException, CouchbaseException, Exception) as e:
        logger.error("try again to get {} due to exception".format(key))
        logger.exception(e)
        try:
            rv = self.collection.get(key, quiet=True)
            if rv.success:
                val = rv.content
                logger.info("Item [%s] is found" % key)
            else:
                logger.info("Item [%s] does not exist" % key)
        except (CouchbaseException, Exception) as e:
            logger.error("2nd get {} failed again due to exception".format(key))
            logger.exception(e)
            raise
    finally:
        if val:
            logger.info("value to return in get({}) of CouchbaseCluster is not empty".format(key))
        else:
            logger.info("value to return in get({}) of CouchbaseCluster is empty".format(key))

        return val

although the exception message says the t(timeout) is 30000000, I don’t think the cmd actually took 30s long, that means the kvtimeout setting doesn’t take effective.

@jcasey / @davidkelly can you help ?

1 Like

I think I figure out what’s the problem.

My connection pool was not actually singleton, after implementing a thread-safe singleton class to ensure there’s only one couchbase connection instance, the timeout values seems working.

It seems still happens after applying the singleton on connection instance, but timeout setting seems working.

is there any way to test current cluster connection and if it’s closed, and do a reconnect?
cannot find any doc for Couchbase reconnect…