[MB-3158] cluster failover/rebalance expected behavior when one or more nodes are out of disk space Created: 10/Dec/10  Updated: 03/Aug/12  Resolved: 03/Aug/12

Status: Resolved
Project: Couchbase Server
Component/s: couchbase-bucket, ns_server
Affects Version/s: None
Fix Version/s: 2.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Frank Weigel Assignee: Dipti Borkar
Resolution: Won't Fix Votes: 0
Labels: 1.7.0-release-notes, 1.7.1-release-notes
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: Not Specified Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates MB-3401 Rebalance failed when one node is out... Closed
Sub-Tasks:
Key      Summary                                   Type            Status    Assignee
MB-3159  Understand cluster behaviour if a nod...  Technical task  Resolved  Thuan Nguyen
Flagged:
Release Note

 Description   
One or more nodes running out of disk space must not take down a cluster. Data already stored on a node should stay accessible; only writes to membase buckets should fail.

Nodes should still be able to be removed from cluster or failed over.

Attention also needs to be paid to the behaviour of other areas that use disk space, such as logs.
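
The expected behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Node`, `Cluster`, `DiskFullError`, and `demo` are hypothetical names made up for illustration and are not part of any Membase or Couchbase API. It shows reads of stored data succeeding, writes failing, and failover still working on a node whose disk is full.

```python
class DiskFullError(Exception):
    """Raised when a write hits a node that has no disk space left."""


class Node:
    # Hypothetical stand-in for a cluster node; not a real server API.
    def __init__(self, name):
        self.name = name
        self.disk_full = False
        self.data = {}

    def get(self, key):
        # Reads of already-stored data keep working even when the disk is full.
        return self.data[key]

    def set(self, key, value):
        # Only writes are refused once the disk is full.
        if self.disk_full:
            raise DiskFullError(f"{self.name}: out of disk space")
        self.data[key] = value


class Cluster:
    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def fail_over(self, name):
        # Failover is expected to succeed regardless of the node's disk state.
        return self.nodes.pop(name)


def demo():
    a, b = Node("a"), Node("b")
    cluster = Cluster([a, b])
    a.set("k", "v")
    a.disk_full = True              # simulate node "a" running out of disk
    read_ok = a.get("k") == "v"     # stored data stays accessible
    try:
        a.set("k2", "v2")
        write_rejected = False
    except DiskFullError:
        write_rejected = True       # new writes fail
    cluster.fail_over("a")          # the full node can still be failed over
    return read_ok, write_rejected, sorted(cluster.nodes)
```

Calling `demo()` exercises all three expectations in one pass: the read, the rejected write, and the failover that leaves only node "b" in the cluster.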

 Comments   
Comment by Frank Weigel [ 27/Jan/11 ]
A Pivotal Tracker story has been created for this Issue: http://www.pivotaltracker.com/story/show/9305789
Comment by Farshid Ghods (Inactive) [ 24/May/11 ]
Try this scenario and update before RC.
Comment by Farshid Ghods (Inactive) [ 24/May/11 ]
When one node runs out of disk space, memcached goes into pending mode and the user can rebalance this node out of the cluster.


Shutting down bucket "default" on 'ns_1@172.16.75.128' for server shutdown ns_memcached002 ns_1@172.16.75.128 18:48:24 - Tue May 24, 2011
Usage of disk "/" on node "172.16.75.128" is over 100%

If two nodes run out of disk space, you will not be able to fail over those two nodes because the failover will time out.
The workaround: if two or more nodes run out of disk space, stop membase server on those nodes first; you can then fail them over.
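
The ordering in this workaround can be sketched as follows. This is an illustrative outline only: `fail_over_full_nodes`, `stop_server`, and `fail_over` are hypothetical names, and the two callables are supplied by the caller rather than taken from any real Membase tool. The sketch only fixes the order of the steps (stop the server on every affected node, then fail the nodes over).

```python
def fail_over_full_nodes(full_nodes, stop_server, fail_over):
    """Stop the server on every out-of-disk node first, then fail each one over.

    stop_server and fail_over are caller-supplied callables (hypothetical);
    this function does not know how either step is actually performed.
    """
    full_nodes = list(full_nodes)
    for node in full_nodes:
        stop_server(node)   # step 1: stop membase server on each full node
    for node in full_nodes:
        fail_over(node)     # step 2: only then fail the nodes over
    return full_nodes


# Usage with stand-in callables that just record what would happen.
steps = []
fail_over_full_nodes(
    ["node1", "node2"],
    stop_server=lambda n: steps.append(("stop", n)),
    fail_over=lambda n: steps.append(("failover", n)),
)
```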
Comment by Perry Krug [ 25/May/11 ]
Just so long as we understand that this is still a bug. Failover should NEVER time out.
Comment by Peter Wansch (Inactive) [ 28/Jun/12 ]
Farshid, this can be closed, right?
Generated at Tue Jul 29 18:28:45 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.