[JCBC-220] Spymemcached doesn't flush the queues correctly during bulk loads
Created: 25/Jan/13  Updated: 28/Jan/13  Resolved: 25/Jan/13

Status: Resolved
Project: Couchbase Java Client
Component/s: Core
Affects Version/s: 1.1.1
Fix Version/s: None
Security Level: Public

Type: Bug
Priority: Major
Reporter: Muthu Kumar
Assignee: Michael Nitschinger
Resolution: Cannot Reproduce
Votes: 0
Labels: 1.1.1, bulk, clients, issue, java, memcached, queueing, sets
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 2.0-couchbase.png, Java Source File Couch1234.java

Library: 1.1.1
There is data loss in Couchbase 2.0 when using the set command with a Couchbase bucket. The loss seems to be more severe the farther the servers are from the client. The same Java client works well with memcached buckets in 2.0, and with both Couchbase and memcached buckets in 1.8.1. See the attached screenshot, and note the item count in the Couchbase bucket, which is missing 24% of the data.
The attached image shows the total number of items stored in the Couchbase bucket: only 750K items were stored for 1M inserts.

During bulk loads with the 1.1.1 library, the customer is seeing data loss for items that have been set.

The customer tried to set 1M items using the latest Java Client, 1.1.1, and found that not all of the items are being persisted.

An update from the customer:

I have rewritten it a bit and reproduced the problem here. The updated version, which reproduces the issue, is attached. You will see that the number of keys reported by Couchbase is not the number of keys that we inserted.

It seems that the problem is in how the driver internally queues the set calls. That is, if we don't actively force the "async" queues to flush (by calling get() on the future), data on the queues could be discarded. So this sounds like a spymemcached bug where it does not correctly flush the queues under high load? According to the Javadoc, we should have seen the exception below; since we did not, shouldn't we have been able to assume that all operations were properly processed?

java.lang.IllegalStateException - in the rare circumstance where queue is too full to accept any more requests

Attached is the code with which we were able to reproduce this error on bulk loads.
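The asynchronous behaviour described above can be modelled with plain java.util.concurrent types. This is a minimal sketch, not the Couchbase or spymemcached API: a single-threaded ExecutorService stands in for the client's IO thread, a ConcurrentHashMap stands in for the bucket, and the names AsyncSetModel and setAllAndWait are hypothetical. It shows that an asynchronous set only enqueues work, and that calling get() on each returned future makes the caller wait until that operation has completed, without changing how the queue itself is processed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Stdlib model only, NOT the spymemcached API: a single-threaded executor
// stands in for the client's IO thread and a map stands in for the bucket.
final class AsyncSetModel {

    // Enqueue `count` hypothetical "set" operations, then wait on every future.
    // Returns how many operations reported success.
    static int setAllAndWait(int count) throws Exception {
        Map<String, Integer> bucket = new ConcurrentHashMap<>();
        ExecutorService ioThread = Executors.newSingleThreadExecutor();
        List<Future<Boolean>> futures = new ArrayList<>(count);
        try {
            for (int i = 0; i < count; i++) {
                final int value = i;
                // submit() only enqueues the work and returns immediately.
                futures.add(ioThread.submit(() -> bucket.put("key" + value, value) == null));
            }
            int ok = 0;
            for (Future<Boolean> f : futures) {
                // get() blocks the caller until this operation has completed;
                // it does not make the executor drain its queue any differently.
                if (f.get()) {
                    ok++;
                }
            }
            return ok;
        } finally {
            ioThread.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(setAllAndWait(10_000) + " of 10000 sets confirmed");
    }
}
```

Waiting on every future also lets the caller count failed operations instead of assuming that every enqueued operation succeeded.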

Comment by Muthu Kumar [ 25/Jan/13 ]
The customer would also like to know the following.

Just a side note: could it be that net.spy.memcached.DefaultConnectionFactory#createOperationQueue needs to be configured differently? From what I can see, spymemcached provides two different operation queues, and it seems that it should either block the add of the async call or just let the queue keep growing (array versus linked queue).
Looking forward to engineering response.
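The two queue behaviours mentioned in this comment can be compared with the standard java.util.concurrent classes. This sketch does not use spymemcached, and QueueChoice and its methods are hypothetical names. On a bounded ArrayBlockingQueue, BlockingQueue.add() throws IllegalStateException once capacity is reached (the same exception type as in the Javadoc quoted in the description), while a timed offer() waits for space and returns false if none appears in time; an unbounded LinkedBlockingQueue simply keeps growing. None of these outcomes says anything about whether a queued item is later processed.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Stdlib illustration of bounded vs unbounded queues; not spymemcached code.
final class QueueChoice {

    // Bounded array queue: add() fails fast once capacity is reached.
    static boolean addThrowsWhenFull(int capacity) {
        BlockingQueue<Integer> q = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < capacity; i++) {
            q.add(i);
        }
        try {
            q.add(capacity);
            return false;
        } catch (IllegalStateException expected) {
            return true; // thrown because the queue is full
        }
    }

    // Bounded array queue: a timed offer() waits for space instead of
    // throwing, and returns false if no space frees up before the timeout.
    static boolean offerTimesOutWhenFull(int capacity, long millis) throws InterruptedException {
        BlockingQueue<Integer> q = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < capacity; i++) {
            q.add(i);
        }
        return !q.offer(capacity, millis, TimeUnit.MILLISECONDS);
    }

    // Unbounded linked queue: add() keeps succeeding and the queue just grows.
    static int unboundedSize(int items) {
        BlockingQueue<Integer> q = new LinkedBlockingQueue<>();
        for (int i = 0; i < items; i++) {
            q.add(i);
        }
        return q.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("array add throws when full: " + addThrowsWhenFull(4));
        System.out.println("array offer times out when full: " + offerTimesOutWhenFull(4, 50));
        System.out.println("linked queue size: " + unboundedSize(100_000));
    }
}
```

A bounded queue with a timed offer applies back-pressure to the producer at the cost of waiting; an unbounded queue never rejects work but can grow until the heap is exhausted.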
Comment by Muthu Kumar [ 25/Jan/13 ]
No, Michael. Can I close this and raise a CBSE?
Comment by Michael Nitschinger [ 25/Jan/13 ]
Yes please!
Comment by Matt Ingenthron [ 25/Jan/13 ]
I'm sorry to say, the test is wrong. The client's shutdown method is never called; calling it would allow the IO thread to complete its work before shutting down.

The get() method on the OperationFuture never flushes a queue. You're just killing the IO thread with a System.exit(0) from the main thread before the IO thread gets to complete its work.
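The explanation above can be reproduced with a stdlib model. This is a sketch under stated assumptions, not the client API: a single-threaded ExecutorService plays the IO thread, and ShutdownBeforeExit and loadThenShutdown are hypothetical names. Work is submitted fire-and-forget, as in the attached test, and ExecutorService.shutdown() followed by awaitTermination() lets everything already queued finish before the program exits. In the real client, the analogous step is calling the client's shutdown method before System.exit(0); spymemcached's MemcachedClient offers a timed shutdown(long, TimeUnit) overload, but its exact semantics should be checked against the Javadoc for the version in use.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Stdlib model of the fix, NOT the client API: a single-threaded executor
// plays the IO thread, and an AtomicInteger counts items "stored".
final class ShutdownBeforeExit {

    static int loadThenShutdown(int count) throws InterruptedException {
        AtomicInteger stored = new AtomicInteger();
        ExecutorService ioThread = Executors.newSingleThreadExecutor();
        for (int i = 0; i < count; i++) {
            // Fire-and-forget, like an async set whose future is ignored.
            ioThread.execute(stored::incrementAndGet);
        }
        // shutdown() stops accepting new work but still runs everything
        // already queued; awaitTermination() waits for that to finish.
        ioThread.shutdown();
        if (!ioThread.awaitTermination(30, TimeUnit.SECONDS)) {
            ioThread.shutdownNow(); // give up on anything still pending
        }
        return stored.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int stored = loadThenShutdown(100_000);
        // Only after this point would calling System.exit(0) be safe.
        System.out.println(stored + " of 100000 items stored");
    }
}
```

Calling System.exit(0) right after the submission loop, without the shutdown step, would let the JVM terminate while items are still waiting in the executor's queue, which mirrors the missing keys seen in the report.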
Comment by Muthu Kumar [ 28/Jan/13 ]
Thanks Michael and Matt - I have updated the ticket with your comments and will raise a CBSE if the customer comes back with an issue.
Comment by Muthu Kumar [ 28/Jan/13 ]
Raised this: http://www.couchbase.com/issues/browse/CBSE-366

Generated at Tue Nov 25 18:24:07 CST 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.