[MB-4518] deleting items during rebalance results in desynced view
Created: 07/Dec/11  Updated: 06/Feb/12  Resolved: 26/Jan/12

Status: Closed
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 2.0-developer-preview-3
Fix Version/s: 2.0-developer-preview-4
Security Level: Public

Type: Bug
Priority: Blocker
Reporter: Keith Batten (Inactive)
Assignee: damien
Resolution: Duplicate
Votes: 0
Labels: 2.0-DP3-release-notes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: centos 5.4 64 bit
3 nodes
r-378

Attachments: logs_deletion.tar.bz2 (File), logs.zip (Zip Archive)

 Description   
1) cluster 2 nodes together
2) add 100 json items
3) create a simple view with _count reduce
4) verify that all items are returned in the view
5) start to rebalance in a 3rd node, while at the same time start deleting all the keys
6) wait till both rebalance and deletes are done
7) verify that all items are deleted from memcached
8) verify that all items are gone from the view
At this point in my test, 7 items were still left in the view.
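
The steps above can be sketched as a small harness. This is only a sketch under stated assumptions, not the test used for this ticket: `FakeCluster` and all of its methods are hypothetical in-memory stand-ins for a real cluster client, and `stale_view_keys` merely simulates the symptom (deletes that the view index misses) rather than reproducing its cause.

```python
import threading


class FakeCluster:
    """Hypothetical in-memory stand-in for a cluster client (not a real SDK)."""

    def __init__(self, stale_view_keys=0):
        self.kv = {}                   # key -> JSON document (memcached side)
        self.view_index = set()        # keys visible through the view
        self.nodes = 2                 # step 1: two nodes clustered together
        self._stale = stale_view_keys  # simulated deletes the index will miss
        self._lock = threading.Lock()

    def set(self, key, doc):
        with self._lock:
            self.kv[key] = doc
            self.view_index.add(key)

    def delete(self, key):
        with self._lock:
            self.kv.pop(key, None)
            if self._stale > 0:
                self._stale -= 1       # simulate a delete the index misses
            else:
                self.view_index.discard(key)

    def rebalance_in(self):
        with self._lock:
            self.nodes += 1

    def view_count(self):
        # Stand-in for querying the view with the _count reduce.
        with self._lock:
            return len(self.view_index)


def delete_all(cluster, keys):
    for key in keys:
        cluster.delete(key)


def run_repro(cluster, num_items=100):
    """Run steps 2-8 and return (items left in KV, items left in the view)."""
    keys = [f"key-{i}" for i in range(num_items)]
    for key in keys:                                  # step 2
        cluster.set(key, {"name": key})
    assert cluster.view_count() == num_items          # steps 3-4

    # Step 5: rebalance in a third node while deleting all keys.
    rebalance = threading.Thread(target=cluster.rebalance_in)
    deleter = threading.Thread(target=delete_all, args=(cluster, keys))
    rebalance.start()
    deleter.start()
    rebalance.join()                                  # step 6
    deleter.join()

    return len(cluster.kv), cluster.view_count()      # steps 7-8
```

A healthy run would return `(0, 0)`; the failure reported here corresponds to the second value being non-zero while the first is zero.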

 Comments   
Comment by Aliaksey Artamonau [ 23/Dec/11 ]
It seems like an ep-engine issue. I was able to reproduce it with 2- and 3-node clusters: I created 10k items, deleted them while rebalancing in a new node, and waited until the rebalance was complete. After this, cbstats from time to time reported a non-zero number of active items on the old node. Sometimes it reported zero active items but a non-zero number of replica items. None of the items were accessible via memcached, but they were still present in couchdb and thus visible to views. Mike, sync up with Chiyoung on this issue please.
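
The item-count check described here could be scripted roughly as follows. This is a sketch that assumes the stats are available as `name: value` text lines; the stat names `vb_active_curr_items` and `vb_replica_curr_items` are assumptions about what this build reports, and both function names are hypothetical.

```python
def parse_stats(text):
    """Parse 'name: value' lines into a dict, keeping only integer values."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue                   # not a 'name: value' line
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            pass                       # skip non-integer stats
    return stats


def leftover_items(stats,
                   active_key="vb_active_curr_items",     # assumed stat name
                   replica_key="vb_replica_curr_items"):  # assumed stat name
    """Return the active/replica counters that are still non-zero."""
    counts = {
        active_key: stats.get(active_key, 0),
        replica_key: stats.get(replica_key, 0),
    }
    return {name: value for name, value in counts.items() if value != 0}
```

After deleting everything and letting the rebalance finish, `leftover_items(parse_stats(output))` would be expected to return an empty dict on every node; any entry left in it points at the condition described in this comment.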
Comment by Karan Kumar (Inactive) [ 26/Dec/11 ]
Thanks Aliaksey. This is definitely an issue currently.
I was able to easily reproduce this. After rebalance, the active_items != replica_items.

Will open another bug.
Comment by Karan Kumar (Inactive) [ 26/Dec/11 ]
To reproduce this:
1) Keep a delete workload going in parallel.
2) Rebalance nodes in.
Comment by Mike Wiederhold [ 18/Jan/12 ]
This issue is caused by an error in views. I was able to verify that there were no items in couchdb after deleting everything, but my view still reported having items. However, Aliaksey was able to produce a scenario where the active/replica item counts were not 0, so I will look into that issue. It is filed as MB-4661.
Comment by Steve Yen [ 18/Jan/12 ]
Aliaksey A looking at this right now
Comment by Aliaksey Artamonau [ 19/Jan/12 ]
Attaching more log files. From the ns_server perspective everything looks fine. On all the nodes the correct vbuckets are indexed. I also verified that the update_seqs reported by couch_set_view:get_group_info and by couch_db:get_update_seq are the same.
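
The update_seq comparison amounts to a per-vbucket diff. A minimal sketch, assuming the values from both sources have already been collected into plain dicts keyed by vbucket id; `seq_mismatches` is a hypothetical helper and does not call couch_set_view or couch_db itself.

```python
def seq_mismatches(index_seqs, db_seqs):
    """Return {vbucket: (index_seq, db_seq)} where the two sources disagree.

    A vbucket missing from one side is reported with None for that side.
    """
    mismatches = {}
    for vb in sorted(set(index_seqs) | set(db_seqs)):
        idx = index_seqs.get(vb)
        db = db_seqs.get(vb)
        if idx != db:
            mismatches[vb] = (idx, db)
    return mismatches
```

An empty result corresponds to the observation in this comment: the index has caught up with the databases, so the stale view rows are not explained by a lagging indexer.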
Comment by Aliaksey Artamonau [ 19/Jan/12 ]
Assigning to Filipe for further investigation.
Comment by Steve Yen [ 24/Jan/12 ]
Hi Damien,
Any news/status on this one?
Thanks
Comment by Steve Yen [ 26/Jan/12 ]
Hi Damien,
Any news/status on this one? Ping #2.
Thanks
Comment by damien [ 26/Jan/12 ]
Was able to reproduce, but haven't been able to spend a lot of time on it yet. Filipe is also looking at it.
Comment by damien [ 26/Jan/12 ]
This appears to be a duplicate of MB-4692. It appears that some vbuckets aren't properly cleaned/omitted from view indexes.
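
As a toy illustration of the suspected failure mode (not the view engine's actual logic), consider index rows tagged with the vbucket they came from. The `(vbucket, key)` row layout and the `count_rows` helper are assumptions made only for this sketch.

```python
def count_rows(index_rows, active_vbuckets=None):
    """Count view rows; if active_vbuckets is given, skip rows from other vbuckets.

    index_rows is an iterable of (vbucket_id, key) pairs.
    """
    rows = list(index_rows)
    if active_vbuckets is None:
        return len(rows)               # nothing is omitted from the count
    return sum(1 for vb, _key in rows if vb in active_vbuckets)


# Made-up example: vbucket 2 moved to another node during rebalance, and its
# document was deleted there, but the old node's index still holds the row.
rows = [(0, "a"), (1, "b"), (2, "c")]
unfiltered = count_rows(rows)
filtered = count_rows(rows, active_vbuckets={0, 1})
```

In this model the unfiltered count includes the stale row from vbucket 2, which is the kind of leftover a `_count` reduce would report if such vbuckets are not cleaned or omitted.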
Comment by Filipe Manana [ 27/Jan/12 ]
http://review.couchbase.org/#change,12767 fixes it
Comment by Farshid Ghods (Inactive) [ 06/Feb/12 ]
viewtests.ViewRebalanceTests.test_delete_x_docs_rebalance_in 1 min 43 sec Fixed
viewtests.ViewRebalanceTests.test_delete_x_docs_rebalance_out 1 min 53 sec Fixed
viewtests.ViewRebalanceTests.test_load_x_during_rebalance 6 min 17 sec Fixed
viewtests.ViewRebalanceTests.test_view_stop_start_incremental_rebalance

build : 2.0.0r-643-g4e529d3