Missing keys during server re-add and re-balance
Hi all,
As part of my testing against 1.7.1, I encountered the following scenario while performing a rebalance operation. Basically, keys that do in fact exist appear to sporadically go missing and then reappear during a rebalance.
Scenario: 3 nodes, 2 replicas, 512MB cache per node
1) Remove a node
2) While it is in the Pending removal state, click Rebalance. The cluster will rebalance and that node will disappear.
3) Now re-add that node and click Rebalance.
4) Start a read test.
Checkpoint: 1 (in 0s, delta 0s, disk reads 0)
Key '62' is missing!
Key '71' is missing!
Checkpoint: 5000 (in 9.74332618713379s, delta 9.74332618713379s, disk reads 0)
Hmm...
5) Run read test again.
Key '13' is missing!
Key '53' is missing!
Checkpoint: 5000 (in 10.0133807659149s, delta 10.0133807659149s, disk reads 0)
6) Click Manage -> Server Nodes in the GUI. If you're unlucky, the Rebalance will be hung. In this case:
Node 1: 100% complete
Node 2: 85.3% complete
Node 3: 97% complete
Eventually it will clear up.
---
Same deal during a forced outage/failover. While rebalancing is in progress, keys go missing (i.e. a Memcached::NotFound exception is thrown):
Checkpoint: 5000 (in 6.80381846427917s, delta 6.80381846427917s, disk reads 0)
Key '7979' is missing!
Key '8697' is missing!
...
Key '51' is missing!
Checkpoint: 5000 (in 10.4807422161102s, delta 10.4807422161102s, disk reads 0)
...
Key '10' is missing!
Key '51' is missing!
---
Has anyone seen this behavior before?
Thanks in advance!
- Matt
Thanks for your response Perry.
I'm using the official Ruby client. Moxi is on the server-side. I'm using the default install via RPM for 1.7.1 (and I reproduced in 1.7.0 as well).
When the rebalance is complete, the keys seem to be accessible as expected (nothing missing).
Thanks for that. Can you confirm the message you are getting back for those keys? Is it "NOT_FOUND" or some other error?
Hi Perry,
Here's the snippet of code from my Ruby test script:
    begin
      value = m.get key
      if (actual_sha1 = Digest::SHA1.hexdigest(value)) != expected_sha1
        puts "Key '#{key}' value mismatch: #{actual_sha1} / #{expected_sha1}"
      end
    rescue Memcached::NotFound => exception
      puts "Key '#{key}' is missing!"
    end
The exception being thrown is Memcached::NotFound.
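For anyone who wants to count the misses per run rather than read them off the console, here is a minimal, self-contained sketch of one read pass. It is not my actual test script: FakeClient, its NotFound class, and read_pass are hypothetical names, and FakeClient only simulates keys being temporarily unavailable so the sketch runs without a cluster. Against a real cluster you would presumably pass in the real client and rescue Memcached::NotFound instead.

```ruby
require 'digest/sha1'

# Hypothetical stand-in for the real client (assumption, not the memcached gem).
class FakeClient
  class NotFound < StandardError; end

  # data: key => value; unavailable: keys that should look "missing"
  def initialize(data, unavailable = [])
    @data = data
    @unavailable = unavailable
  end

  # Raises NotFound for absent or simulated-unavailable keys.
  def get(key)
    raise NotFound if @unavailable.include?(key) || !@data.key?(key)
    @data[key]
  end
end

# Runs one read pass over expected (key => expected SHA1 hex digest) and
# returns the keys that were missing and the keys whose value did not match.
def read_pass(client, expected)
  missing = []
  mismatched = []
  expected.each do |key, expected_sha1|
    begin
      value = client.get(key)
      mismatched << key if Digest::SHA1.hexdigest(value) != expected_sha1
    rescue FakeClient::NotFound
      missing << key
    end
  end
  { missing: missing, mismatched: mismatched }
end

# Illustrative data only; the values are made up.
data = { '13' => 'a', '53' => 'b', '62' => 'c' }
expected = data.map { |k, v| [k, Digest::SHA1.hexdigest(v)] }.to_h
client = FakeClient.new(data, ['13', '53'])

result = read_pass(client, expected)
puts "missing: #{result[:missing].inspect}, mismatched: #{result[:mismatched].inspect}"
```

Returning the two lists instead of printing inside the loop makes it easy to compare the number of missing keys between runs.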
I came across this bug report:
http://www.couchbase.org/issues/browse/MB-4167
I think I am encountering this exact issue. Note how the author of that issue experiences exactly two "missing keys" during a rebalance. In my scenario, every time I reproduce it, there are also exactly two missing keys.
Yes, it does seem like the same issue. We've identified the problem and will be releasing an updated version with a fix for this soon.
Perry
Matt, that seems pretty strange. Definitely something we test for, so I'd like to find out what is specific to your environment.
What client library are you using? Are you going through Moxi on the server side or the client side?
When the rebalance is complete, are keys still missing or does the read test succeed?
Perry