[SPY-89] invalid binary protocol server response during rebalance Created: 31/May/12  Updated: 29/May/13

Status: Open
Project: Spymemcached Java Client
Component/s: library
Affects Version/s: 2.8.1
Fix Version/s: .next
Security Level: Public

Type: Bug Priority: Major
Reporter: Matt Ingenthron Assignee: Matt Ingenthron
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: rebalanceout-notes.txt (text file), rebalanceout.out

 Description   
While investigating an issue, I found that the server would unexpectedly respond with opcode 0x0a to optimized get requests. This appears to violate the protocol, and it leaves a client library unable to match responses to the requests it sent.

The scenario is spymemcached internally performing an optimized get and reading the response for that operation during a rebalance. It has been observed both with a node being rebalanced into the cluster and with a node leaving the cluster.

After reading the fixed 24-byte response header, the client library checks the magic byte and then the response opcode. It expects 0x00 (GET) but receives 0x0a.
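The check described above can be sketched roughly as follows. This is a minimal sketch, not spymemcached's code: the class `ResponseHeader` and the helper `requireOpcode` are hypothetical. It decodes the fixed 24-byte binary protocol response header in network byte order, rejects anything without the response magic 0x81, and then compares the opcode against the one the pending operation expects, which is where an unexpected 0x0a would surface.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not spymemcached's OperationImpl: decode the fixed
// 24-byte memcached binary protocol response header and validate it.
final class ResponseHeader {
    static final int HEADER_LENGTH = 24;
    static final byte RES_MAGIC = (byte) 0x81; // response magic byte

    final byte opcode;
    final int keyLength;
    final int extrasLength;
    final short status;
    final int totalBodyLength;
    final int opaque;
    final long cas;

    private ResponseHeader(byte opcode, int keyLength, int extrasLength, short status,
                           int totalBodyLength, int opaque, long cas) {
        this.opcode = opcode;
        this.keyLength = keyLength;
        this.extrasLength = extrasLength;
        this.status = status;
        this.totalBodyLength = totalBodyLength;
        this.opaque = opaque;
        this.cas = cas;
    }

    // ByteBuffer defaults to big-endian, which matches network byte order.
    static ResponseHeader decode(ByteBuffer buf) {
        if (buf.remaining() < HEADER_LENGTH) {
            throw new IllegalArgumentException("need " + HEADER_LENGTH + " header bytes");
        }
        byte magic = buf.get();
        if (magic != RES_MAGIC) {
            throw new IllegalStateException(String.format("bad magic 0x%02x", magic));
        }
        byte opcode = buf.get();
        int keyLength = buf.getShort() & 0xffff;
        int extrasLength = buf.get() & 0xff;
        buf.get(); // data type, unused here
        short status = buf.getShort();
        int totalBodyLength = buf.getInt();
        int opaque = buf.getInt();
        long cas = buf.getLong();
        return new ResponseHeader(opcode, keyLength, extrasLength, status,
                                  totalBodyLength, opaque, cas);
    }

    // The step where the reported failure shows up: 0x0a arriving where 0x00 was expected.
    static void requireOpcode(ResponseHeader h, byte expected) {
        if (h.opcode != expected) {
            throw new IllegalStateException(String.format(
                "expected opcode 0x%02x, got 0x%02x", expected, h.opcode));
        }
    }
}
```

Checking the magic first means a misaligned read (for example, landing in the middle of a value) is usually caught before the opcode is even interpreted.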

I poked through the server-side code, and 0x0a corresponds to TAP_NOOP. I poked around a little further and don't see any situation where we would respond with 0x0a to a non-TAP client.

The assertion this is raising is right here:
https://github.com/dustin/java-memcached-client/blob/master/src/main/java/net/spy/memcached/protocol/binary/OperationImpl.java#L136

If I force spymemcached to not optimize, the problem goes away.

I will attach a packet capture.

 Comments   
Comment by Matt Ingenthron [ 31/May/12 ]
Added packet capture and notes (which are effectively the same as above)
Comment by Trond Norbye [ 31/May/12 ]
0x0a == NOOP, not TAP_NOOP...
Comment by Trond Norbye [ 31/May/12 ]
I moved this bug back to the client to investigate whether we actually sent a noop request (one is typically sent after a series of quiet commands to ensure that we get a response). Could it be that we don't correctly handle reshuffling of the packets, i.e. adding a noop to each of the new queues while ensuring that we don't send more than one, since we most likely expect only one?
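The noop pattern described in this comment can be modeled roughly as below. This is a hypothetical sketch, not spymemcached's implementation: the class `QuietGetBatch`, its methods, and the use of GETKQ (0x0d) as the quiet get are assumptions for illustration. Quiet gets only answer on a hit, so a single trailing NOOP (0x0a) is what tells the reader the batch is finished; a second NOOP for the same batch, as could happen if reshuffling added one per queue more than once, is treated as an error.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a quiet-get batch terminated by exactly one NOOP.
final class QuietGetBatch {
    static final byte GETKQ = 0x0d; // quiet get-with-key: no response on a miss
    static final byte NOOP = 0x0a;  // always answered, marks the end of the batch

    private final Set<Integer> pending = new LinkedHashSet<>(); // opaques not yet answered
    private final List<Integer> hits = new ArrayList<>();
    private final int noopOpaque;
    private boolean done;

    QuietGetBatch(List<Integer> getOpaques, int noopOpaque) {
        pending.addAll(getOpaques);
        this.noopOpaque = noopOpaque;
    }

    // Opcodes in wire order: one GETKQ per key, then a single NOOP.
    List<Byte> requestOpcodes() {
        List<Byte> ops = new ArrayList<>();
        for (int ignored : pending) {
            ops.add(GETKQ);
        }
        ops.add(NOOP);
        return ops;
    }

    // Feed one response; returns true once the terminating NOOP has been seen.
    boolean onResponse(byte opcode, int opaque) {
        if (done) {
            throw new IllegalStateException("response after batch completed (duplicate NOOP?)");
        }
        if (opcode == NOOP && opaque == noopOpaque) {
            done = true; // the server answers in order, so anything still pending was a miss
            return true;
        }
        if (opcode == GETKQ && pending.remove(opaque)) {
            hits.add(opaque);
            return false;
        }
        throw new IllegalStateException(String.format(
            "unexpected opcode 0x%02x for opaque %d", opcode, opaque));
    }

    List<Integer> hits() {
        return hits;
    }

    Set<Integer> misses() {
        if (!done) {
            throw new IllegalStateException("batch not finished yet");
        }
        return pending;
    }
}
```

For a batch of three keys where only the second is present, the reader sees one GETKQ response followed by the NOOP, and the first and third keys are then reported as misses.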
Comment by Matt Ingenthron [ 01/Jun/12 ]
Moved this, as I found that the protocol was correct but the assertion was not. The assertion did not account for error responses to setq operations (quiet sets are silent on success but do respond on error), so this would have been an issue in any environment using optimized sets where a setq returned an error.
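A relaxed check along the lines of this comment might look like the following. It is a hypothetical sketch, not the actual spymemcached fix: `ResponseMatcher` and its methods are invented for illustration. A response that does not match the operation the reader is waiting for is accepted if it is an error reply (non-zero status) to an outstanding SETQ (0x11) with the same opaque, and is routed back to that quiet operation instead of tripping the assertion.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not spymemcached's actual fix): a response check that
// tolerates error replies to earlier quiet sets on a pipelined connection.
final class ResponseMatcher {
    static final byte SETQ = 0x11;
    static final short STATUS_SUCCESS = 0x0000;

    // opaque -> opcode for quiet operations that have been written but not yet resolved.
    // A real client would also drop entries once a later response proves they succeeded;
    // that bookkeeping is omitted here.
    private final Map<Integer, Byte> outstandingQuiet = new HashMap<>();

    void sentQuiet(int opaque, byte opcode) {
        outstandingQuiet.put(opaque, opcode);
    }

    // true: the response belongs to the operation the reader is waiting for.
    // false: it is an error reply to an outstanding quiet op and should be routed there.
    // Anything else is still treated as a protocol violation.
    boolean matchesExpected(byte expectedOpcode, int expectedOpaque,
                            byte opcode, int opaque, short status) {
        if (opcode == expectedOpcode && opaque == expectedOpaque) {
            return true;
        }
        Byte quietOpcode = outstandingQuiet.get(opaque);
        if (quietOpcode != null && quietOpcode == opcode && status != STATUS_SUCCESS) {
            outstandingQuiet.remove(opaque);
            return false;
        }
        throw new IllegalStateException(String.format(
            "unexpected response opcode 0x%02x (opaque %d, status 0x%04x)", opcode, opaque, status));
    }
}
```

Keeping the strict failure for everything else preserves the original intent of the assertion: only the one legitimate out-of-band reply (a quiet set's error) is let through.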
Comment by Michael Nitschinger [ 29/May/13 ]
Is this still an issue that needs to be investigated?
Generated at Tue Sep 02 12:06:07 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.