[MB-6384] inability to reach some node should not cause entire per-bucket supervisor to fail [was: Rebalance 5->4 nodes is failed with reason bulk_set_vbucket_state_failed] Created: 22/Aug/12  Updated: 10/Sep/12  Resolved: 24/Aug/12

Status: Closed
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: None
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Iryna Mironava Assignee: Aleksey Kondratenko
Resolution: Fixed Votes: 0
Labels: regression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: CentOS, 64-bit, 4-core VMs, build #1620

Attachments: GZip Archive     GZip Archive     GZip Archive     GZip Archive     GZip Archive    

1. Rebalance in from 1 to 5 nodes
2. Load data (1M); no views or ddocs are created
3. Start rebalance out
4. Create 3 ddocs, 2 views per ddoc
5. Rebalance fails

2012-08-22 18:18:41.623 ns_orchestrator:4:info:message(ns_1@ - Starting rebalance, KeepNodes = ['ns_1@','ns_1@',
                                 'ns_1@','ns_1@'], EjectNodes = ['ns_1@']

2012-08-22 18:18:41.933 ns_rebalancer:0:info:message(ns_1@ - Started rebalancing bucket default
2012-08-22 18:18:42.512 ns_vbucket_mover:0:info:message(ns_1@ - Bucket "default" rebalance does not seem to be swap rebalance
2012-08-22 18:18:45.428 ns_memcached:2:info:message(ns_1@ - Shutting down bucket "default" on 'ns_1@' for server shutdown
2012-08-22 18:18:45.747 ns_orchestrator:2:info:message(ns_1@ - Rebalance exited with reason {{bulk_set_vbucket_state_failed,

Comment by Aleksey Kondratenko [ 22/Aug/12 ]
The root cause is the problem in MB-6385. However, it causes the per-bucket supervisor on .64 to fail, because .73 deletes the bucket, incorrectly thinking there is a server shutdown.
Comment by Farshid Ghods (Inactive) [ 22/Aug/12 ]
Regressions are marked as blockers.
Comment by Aleksey Kondratenko [ 24/Aug/12 ]
Should be done as well
Comment by Thuan Nguyen [ 25/Aug/12 ]
Integrated in github-ns-server-2-0 #453 (See [http://qa.hq.northscale.net/job/github-ns-server-2-0/453/])
    MB-6384: don't shutdown bucket unless we're deleting it (Revision 2e7b50a5c0faa23a1f5367536e75358e105a0d19)
MB-6384: changed replicators' supervision type to termporary (Revision b5ab81c848aef02d010062a5eb10361ed2965088)

     Result = SUCCESS
Aliaksey Kandratsenka :
Files :
* src/ns_memcached.erl

Aliaksey Kandratsenka :
Files :
* src/ns_vbm_new_sup.erl
* src/replication_changes.erl
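
For readers unfamiliar with OTP restart types, here is a minimal sketch of what a temporary child under a supervisor looks like. It is not the code from src/ns_vbm_new_sup.erl; the module name, child id, node name, and the `unreachable` message are all hypothetical and only illustrate the idea. A temporary child is never restarted when it exits, so its failure does not count toward the supervisor's restart intensity and cannot by itself bring the supervisor down.

```erlang
%% Hypothetical sketch: module, child id, node name, and message names are
%% illustrative and are not taken from ns_server.
-module(replicator_sup_sketch).
-behaviour(supervisor).

-export([start_link/0, init/1, start_replicator/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% At most 3 restarts within 10 seconds before the supervisor gives up.
    %% Temporary children are never restarted, so they never use this budget.
    SupFlags = {one_for_one, 3, 10},
    Replicator = {{replicator, 'n1@example'},
                  {?MODULE, start_replicator, ['n1@example']},
                  temporary,   %% restart type: do not restart on exit
                  1000,        %% shutdown timeout in milliseconds
                  worker,
                  [?MODULE]},
    {ok, {SupFlags, [Replicator]}}.

%% Stand-in for a replicator process: it exits abnormally when told that
%% its (assumed) remote node cannot be reached.
start_replicator(Node) ->
    Pid = spawn_link(fun() ->
                         receive
                             unreachable -> exit({node_unreachable, Node})
                         end
                     end),
    {ok, Pid}.
```

In this sketch, sending `unreachable` to the child makes it exit abnormally, and the supervisor stays alive. With a `permanent` restart type, repeated failures of the same child would exhaust the restart budget and terminate the supervisor.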
Comment by Iryna Mironava [ 10/Sep/12 ]