[SPY-89] invalid binary protocol server response during rebalance Created: 31/May/12 Updated: 01/Jun/12 |
|
| Status: | Open |
| Project: | Spymemcached Java Client |
| Component/s: | library |
| Affects Version/s: | 2.8.1 |
| Fix Version/s: | None |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Matt Ingenthron | Assignee: | Matt Ingenthron |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Description |
|
When investigating an issue, I found that the server would unexpectedly respond with a 0x0a command for optimized get requests. This seems to be in violation of the protocol, and it causes all kinds of havoc for a client library.
The scenario is spymemcached's internals doing an optimized get. It's reading the response for an operation during a rebalance. It has been observed with both the node being rebalanced into the cluster and with a node leaving the cluster. Reading the minimum header, the client library is checking the magic, and then the response command. It's expecting a 0x00, but receiving a 0x0a. I poked through the server side code, and 0x0a corresponds to TAP_NOOP. I poked around a little further, and I don't see any situation where we would respond with this 0x0a to a non-TAP client. The assertion this is raising is right here: https://github.com/dustin/java-memcached-client/blob/master/src/main/java/net/spy/memcached/protocol/binary/OperationImpl.java#L136 If I force spymemcached to not optimize, the problem goes away. I will attach a packet capture. |
| Comments |
| Comment by Matt Ingenthron [ 31/May/12 ] |
| Added packet capture and notes (which are effectively the same as above) |
| Comment by Trond Norbye [ 31/May/12 ] |
| 0x0a == NOOP, not TAP_NOOP... |
| Comment by Trond Norbye [ 31/May/12 ] |
| I moved this bug back to the client for investigation if we did send a noop request or not (this is typically sent after a series of quiet commands in order to ensure that we get a response). Could it be that we don't correctly handle reshuffling of the packets (adding a noop to all of the new queues, and ensure that we don't send multiple of them since we most likely expect only once). |
| Comment by Matt Ingenthron [ 01/Jun/12 ] |
| Moved this, as I found the protocol was correct, but the assertion was incorrect. The assertion did not account for any setq responses with errors, so this would have been an issue in any environment with optimized sets and responses on setqs. |