[MB-5625] Rebalance can fail with various errors if user changes bucket password before rebalance process Created: 20/Jun/12  Updated: 09/Jan/13  Resolved: 31/Jul/12

Status: Closed
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 1.8.1
Fix Version/s: 2.0-beta
Security Level: Public

Type: Bug Priority: Major
Reporter: Ketaki Gangal Assignee: Aleksey Kondratenko
Resolution: Fixed Votes: 0
Labels: 1.8.1-release-notes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Large cluster - CentOS - 16-node cluster
Build 181-916rel
3 buckets - bucket1(3G), bucket2(2.8G), bucket3(200M)


 Description   
Setup:
1. Set up an 18-node cluster. Enable auto-failover.
2. Load data on all 3 buckets [around 50M, 22M, 500k items].
3. Continue loading data.
4. Remove the orchestrator node [105]. Add a new node [126].
5. Issue a rebalance on this cluster. Rebalance failed with "replicator died" - filed bug 5343.
6. Re-issue the rebalance. Rebalance fails with "change_filter_failed".

Output:
The re-issued rebalance fails with "change_filter_failed".
Node 105 is auto-failed over.

Rebalance exited with reason {{change_filter_failed,
{'EXIT',
{{badmatch,
{failed,
{error,
{badmatch,
{memcached_error,auth_error,
<<"Auth failure">>}},
[{ebucketmigrator_srv,connect,4},
{ebucketmigrator_srv,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}}},
[{ns_vbm_sup,
'-local_change_vbucket_filter/4-fun-2-',6},
{misc,'-executing_on_new_process/1-fun-0-',
3}]}}},
[{ns_vbm_sup,change_vbucket_filter,4},
{ns_vbm_sup,'-set_replicas/3-fun-2-',5},
{lists,foldl,3},
{ns_vbm_sup,set_replicas,3},
{ns_vbm_sup,'-set_replicas_on_nodes/3-fun-1-',
3},
{lists,foreach,2},
{ns_vbm_sup,apply_changes,2},
{ns_vbucket_mover,sync_replicas,0}]}
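
For triage, the trace above can be reduced to its outer exit reason and the inner memcached error. The following is a minimal Python sketch, assuming the log text has the same shape as the trace in this report. The function name summarize_rebalance_failure and the two regular expressions are hypothetical helpers written for this illustration. They are not part of ns_server or any Couchbase tool.

```python
import re

# Matches the memcached error tuple as it appears in the trace,
# e.g. {memcached_error,auth_error,<<"Auth failure">>}, allowing line breaks.
MEMCACHED_ERROR = re.compile(r"\{memcached_error,\s*(\w+),\s*<<\"([^\"]*)\">>")

# Matches the first atom after "exited with reason", skipping opening braces.
EXIT_REASON = re.compile(r"exited with reason\s*\{+\s*'?(\w+)'?")


def summarize_rebalance_failure(log_text):
    """Return the outer exit reason and any memcached error found in log_text."""
    reason = EXIT_REASON.search(log_text)
    mc_error = MEMCACHED_ERROR.search(log_text)
    return {
        "reason": reason.group(1) if reason else None,
        "memcached_error": mc_error.group(1) if mc_error else None,
        "message": mc_error.group(2) if mc_error else None,
        # True only when the inner error is an authentication failure.
        "auth_failure": bool(mc_error and mc_error.group(1) == "auth_error"),
    }
```

Applied to the trace in this report, the summary would carry the reason change_filter_failed and flag an authentication failure, which points at credentials rather than at the data path.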



Logs at - https://s3.amazonaws.com/bugdb/jira/bug4-rebalance-181/bug5.tar

 Comments   
Comment by Farshid Ghods (Inactive) [ 17/Jul/12 ]
Changing the bucket password impacts all ongoing TAP connections and any newer connections between memcached instances on all nodes. Changing the bucket password without re-establishing all those connections, or establishing new connections without using the new password, will cause these issues.
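
The failure mode described in this comment can be illustrated with a toy model in Python. This is only a sketch under stated assumptions: ToyBucket, ToyReplicator, and demo are hypothetical names invented for the illustration, and the model does not reflect how ebucketmigrator_srv or memcached actually store or check credentials. It only shows that a connection attempt made with a cached, stale password is rejected, while one made with the current password succeeds.

```python
class AuthError(Exception):
    """Raised when a connection presents the wrong bucket password."""


class ToyBucket:
    """Stand-in for a bucket that checks a password on each new connection."""

    def __init__(self, password):
        self.password = password

    def connect(self, password):
        # Only new connection attempts are modelled here.
        if password != self.password:
            raise AuthError("Auth failure")
        return "connected"


class ToyReplicator:
    """Stand-in for a replicator that caches the password it started with."""

    def __init__(self, bucket, password):
        self.bucket = bucket
        self.cached_password = password

    def reconnect(self):
        # Reuses the cached password, which may be stale.
        return self.bucket.connect(self.cached_password)

    def reconnect_with(self, current_password):
        # Refresh the cached password first, then connect.
        self.cached_password = current_password
        return self.bucket.connect(self.cached_password)


def demo():
    bucket = ToyBucket("old-secret")
    replicator = ToyReplicator(bucket, "old-secret")
    results = [replicator.reconnect()]   # password still matches

    bucket.password = "new-secret"       # user changes the bucket password
    try:
        replicator.reconnect()
    except AuthError as exc:
        results.append(f"failed: {exc}")  # stale password is rejected

    results.append(replicator.reconnect_with("new-secret"))
    return results
```

In this model, demo() returns ["connected", "failed: Auth failure", "connected"], matching the sequence described above: connections work until the password changes, fail while the old password is still in use, and work again once the new password is used.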
Comment by Perry Krug [ 26/Jul/12 ]
Just had a customer run into this...is there any workaround before the fix? Do we know if simply retrying the rebalance will work?

Does this mean that replication streams are stopped as well?

What about resetting the replica chains?
Comment by Peter Wansch (Inactive) [ 31/Jul/12 ]
Alk, is this still an issue in 2.0 and if not, what is the workaround for 1.8.1? Change the password back?
Comment by Aleksey Kondratenko [ 31/Jul/12 ]
Doesn't apply to 2.0 in 2.0 compat mode. Will still be a problem in 1.8.1 compat mode.