Timeouts on a Couchbase cluster using gevent on simple get calls

I’m using the latest Python SDK (1.2) with gevent and I’m getting timeout errors on simple get operations. The cluster has 3 Couchbase nodes hosted on Amazon, and the client runs locally on one of them. After the client is restarted, the errors no longer appear.
The exception I get is:

Traceback (most recent call last):
  File "/home/cep/cep/wamp/cra.py", line 67, in rpc_call
    result = self.procedures.call(uri, args)
  File "/usr/lib/python2.7/site-packages/geventwebsocket/protocols/wamp.py", line 62, in call
    return proc[1](*args)
  File "/home/cep/cep/users.py", line 180, in linkFacebookByAccessToken
    user = User(response['email'])
  File "/home/cep/cep/couchbasekit/document.py", line 55, in __init__
    if self._fetch_data(get_lock) is False:
  File "/home/cep/cep/couchbasekit/document.py", line 172, in _fetch_data
    result = self.bucket.get(self.doc_id)
  File "/usr/lib64/python2.7/site-packages/gcouchbase/connection.py", line 106, in ret
    return self._waitwrap(meth(self, *args, **kwargs))
  File "/usr/lib64/python2.7/site-packages/gcouchbase/connection.py", line 102, in _waitwrap
    return get_hub().switch()
  File "/usr/lib64/python2.7/site-packages/gevent/hub.py", line 331, in switch
    return greenlet.switch(self)
TimeoutError: 

Hi

How often are you getting these timeouts and how long does it take to time out? If you use the same code without gevent, how does it function?

Hi Mark,

I get these timeouts about once every 24 hours. Once the client goes into this state, every db operation times out. If I restart the client, it seems to be stable again, but only until the next incident. Just a small note: we are using m1.medium Amazon instances (I do not know whether this is related to Couchbase's minimum requirements). I haven’t been able to test without gevent yet.

I can’t say I know yet what might be the cause of this issue, so we’ll need to narrow down the triggers for this:

(1) If you restart the server instead of the client, how does it respond?
(2) Is there any notable correlation between memory and/or CPU usage once you get these timeouts?
(3) Do you notice any particular delay when the timeout takes place? (instant? 2.5 seconds? etc.)
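
For question (3), one way to measure the delay is to wrap the failing call in a small timing helper and log how long it ran before raising. This is only a minimal sketch: `timed_call` and `fake_get` are hypothetical names introduced here for illustration, and `fake_get` simulates a failing lookup rather than talking to a real cluster.

```python
import time


def timed_call(func, *args, **kwargs):
    """Call func and return (result, error, elapsed_seconds).

    Exceptions are captured rather than propagated, so the elapsed time
    is available even when the call fails.
    """
    start = time.time()
    try:
        result = func(*args, **kwargs)
        error = None
    except Exception as exc:
        result = None
        error = exc
    elapsed = time.time() - start
    return result, error, elapsed


def fake_get(doc_id):
    # Hypothetical stand-in for a real lookup; sleeps briefly, then fails.
    time.sleep(0.05)
    raise RuntimeError("simulated timeout for %s" % doc_id)


result, error, elapsed = timed_call(fake_get, "user::example")
print("error=%r elapsed=%.3fs" % (error, elapsed))
```

In the application, the same helper could wrap the call shown in the traceback, for example `timed_call(self.bucket.get, self.doc_id)`, so that each failure is logged together with its duration.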

Hi Mark,

I’m still waiting for the next incident to happen in order to provide more feedback. This has not happened again. One thing I can tell you for sure is that the CPU usage is pretty low. I’ll keep you posted.

Thanks