[CCBC-170] After upgrading to 2.0.2 with CBC-163, I now get timeouts again! Created: 22/Jan/13  Updated: 27/Mar/14  Resolved: 27/Mar/14

Status: Resolved
Project: Couchbase C client library libcouchbase
Component/s: library
Affects Version/s: 2.0.2
Fix Version/s: 2.2.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Michael Leib Assignee: Mark Nunberg
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: CentOS 5.x, 64-bit, libev I/O plugin

I was using 2.0.0 beta3 with the TIMEOUTS patch applied manually. It worked fine, but CPU usage was high. It ran for weeks (if not a month) without any timeouts at all, and I was happier with lcb than I had been in a long time. However, after complaints from my server admins and NOC, I upgraded to 2.0.2, which includes the fix for high CPU usage.

I rebuilt all modules and rolled them out to production. In the roughly 4 hours since, I have received 2 timeouts, whereas I had NONE for weeks before this change.

The CPU load is now normal, but something appears to have broken because of this fix.

I looked at the CBC-163 change and don't see how it could have broken the code, but it has, I promise. Can someone please take a look?


Comment by Michael Leib [ 06/Feb/13 ]
I have restructured my code so that an ev_timer throttles the millions of lcb_get() requests I need to make, issuing them in chunks from a queue rather than all at once in a single loop and leaving libev to handle everything. That appears to be working. There is still a problem with the latest code if you issue millions of requests within one function call before control returns to the libev event loop. It doesn't happen every time, but it happens frequently enough.
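A minimal sketch of the chunked, timer-driven approach described above, assuming a simple array-backed work queue. The names `get_queue`, `schedule_get()`, `drain_chunk()` and `run_ticks()` are hypothetical: `schedule_get()` only counts where real libcouchbase 2.x code would fill an `lcb_get_cmd_t` and call `lcb_get()`, and `run_ticks()` stands in for libev repeatedly invoking an `ev_timer` callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work queue of keys waiting to be fetched. */
typedef struct {
    const char **keys;  /* pending keys */
    size_t nkeys;       /* number of keys in the queue */
    size_t next;        /* index of the next key to schedule */
    size_t chunk;       /* max gets scheduled per timer tick */
    size_t scheduled;   /* total gets scheduled so far */
} get_queue;

/* Hypothetical stand-in for scheduling one get; it only counts here.
 * Real code would build an lcb_get_cmd_t for `key` and call lcb_get(). */
static void schedule_get(get_queue *q, const char *key) {
    (void)key;
    q->scheduled++;
}

/* Body of the (hypothetical) timer callback: schedule at most q->chunk
 * gets, then return so the event loop can do I/O.
 * Returns 1 while keys remain, 0 once the queue is drained. */
static int drain_chunk(get_queue *q) {
    size_t n = 0;
    while (q->next < q->nkeys && n < q->chunk) {
        schedule_get(q, q->keys[q->next++]);
        n++;
    }
    assert(n <= q->chunk);
    return q->next < q->nkeys;
}

/* Simulates the event loop firing the timer until the queue is empty.
 * Returns the number of ticks that were needed. */
static size_t run_ticks(get_queue *q) {
    size_t ticks = 0;
    do {
        ticks++;
    } while (drain_chunk(q));
    return ticks;
}
```

In a libev program, `drain_chunk()` would be the body of an `ev_timer` callback started with a repeat interval (such as the 0.01 s mentioned in the next comment), and that callback would call `ev_timer_stop()` once `drain_chunk()` returns 0. Returning to the loop after each chunk lets libev flush sockets and deliver responses before more requests pile up.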

Keeping the number of lcb_get() calls sitting on the event stack low makes a BIG difference.
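Another way to keep the number of pending `lcb_get()` calls low, swapped in here as an alternative rather than what the reporter used, is a fixed in-flight window: schedule up to a cap, then schedule one more each time a response arrives. This sketch assumes a completion callback is allowed to schedule further requests. The `window` struct and `fill()`, `on_complete()` and `simulate()` are hypothetical; the counters stand in for real `lcb_get()` calls and get callbacks, and `simulate()` replaces the event loop delivering responses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a bounded number of outstanding gets. */
typedef struct {
    size_t total;         /* gets to perform overall */
    size_t issued;        /* gets scheduled so far */
    size_t completed;     /* responses received so far */
    size_t max_inflight;  /* cap on outstanding gets (must be > 0) */
    size_t peak;          /* highest in-flight count observed */
} window;

static size_t inflight(const window *w) {
    return w->issued - w->completed;
}

/* Top up the window until the cap or the total is reached. */
static void fill(window *w) {
    while (w->issued < w->total && inflight(w) < w->max_inflight) {
        w->issued++;  /* hypothetical: schedule one lcb_get() here */
        if (inflight(w) > w->peak)
            w->peak = inflight(w);
    }
}

/* Would be called from the (hypothetical) get callback:
 * one request finished, so refill the window. */
static void on_complete(window *w) {
    assert(inflight(w) > 0);
    w->completed++;
    fill(w);
}

/* Simulation: responses arrive one at a time until all are done. */
static void simulate(window *w) {
    assert(w->max_inflight > 0);
    fill(w);
    while (w->completed < w->total)
        on_complete(w);
}
```

Unlike a fixed timer interval, the window refills as fast as responses come back, so throughput follows server latency instead of being capped at one chunk per tick, while the number of outstanding requests never exceeds `max_inflight`.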

Once again, the version without the high-load fix worked great (except for the load). Once the fix was applied, large numbers of pending events caused lcb_get() requests to fail with TIMEOUT.
Comment by Michael Leib [ 06/Feb/13 ]
And, of course, even with an ev_timer throttle of 0.01 s, performance is about 1/4 of what it was when I simply put everything on the queue: roughly 5k/sec versus 20k/sec, counting only lcb_get() results. My CB cluster (2 servers) consistently reports 50K ops/sec without issue.
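With a 0.01 s (10 ms) repeat interval the timer fires at most 100 times per second, so the throttle alone caps throughput at 100 × the per-tick chunk size. The sketch below makes that arithmetic explicit; `max_ops_per_sec()` and `min_chunk()` are hypothetical helpers, and the bound ignores server round-trip time and timer jitter. Under that assumption, 5k/sec at 10 ms corresponds to at least 50 gets per tick, and matching the earlier 20k/sec would need about 200 per tick or a shorter interval.

```c
#include <assert.h>

/* Upper bound on ops/sec when a timer fires every interval_ms and
 * schedules at most `chunk` requests per tick (ignores latency). */
static unsigned long max_ops_per_sec(unsigned long chunk,
                                     unsigned long interval_ms) {
    assert(interval_ms > 0);
    return chunk * 1000UL / interval_ms;
}

/* Smallest per-tick chunk that could reach target_ops at interval_ms,
 * using integer ceiling division to avoid floating-point rounding. */
static unsigned long min_chunk(unsigned long target_ops,
                               unsigned long interval_ms) {
    return (target_ops * interval_ms + 999UL) / 1000UL;
}
```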
Comment by Mark Nunberg [ 03/Feb/14 ]
I'm going to assume most timeout issues were fixed in 2.2.0 and will close this.
Generated at Fri Nov 21 12:15:36 CST 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.