[MB-4344] Dispatcher stops persisting items to disk after a successful rebalance due to a race condition in scheduling (happens when rebalancing in more than 1 node) Created: 11/Oct/11  Updated: 09/Jan/13  Resolved: 11/Nov/11

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 1.7.1
Fix Version/s: 1.7.2
Security Level: Public

Type: Bug Priority: Major
Reporter: Farshid Ghods (Inactive) Assignee: Chiyoung Seo
Resolution: Fixed Votes: 0
Labels: 1.7.1-release-notes, 1.7.2-release-notes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
This behavior has been seen while adding 2 or more nodes. Steps to reproduce:
1- Create a cluster of N nodes.
2- Add two nodes and rebalance.
3- The dispatcher does not seem to pick up the items from checkpoints and persist them to disk.

Because of this behavior, all closed checkpoints have to be kept in memory, and the node might eventually run out of memory, depending on the number of mutations that happen in the cluster.
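
The memory impact described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not ep-engine code: the class ToyCheckpointQueue, the function simulate, and the parameters items_per_checkpoint and flush_every are all hypothetical names chosen for illustration, and the real checkpoint and dispatcher logic is considerably more involved. The sketch only shows that items held in closed checkpoints grow with the number of mutations once nothing drains them.

```python
from collections import deque


class ToyCheckpointQueue:
    """Hypothetical stand-in for a node's checkpoint list (not ep-engine code)."""

    def __init__(self, items_per_checkpoint):
        self.items_per_checkpoint = items_per_checkpoint
        self.open_items = 0      # mutations in the current open checkpoint
        self.closed = deque()    # item counts of closed, unpersisted checkpoints

    def mutate(self):
        """Record one mutation; close the open checkpoint once it is full."""
        self.open_items += 1
        if self.open_items == self.items_per_checkpoint:
            self.closed.append(self.open_items)
            self.open_items = 0

    def persist_closed(self):
        """Simulate a dispatcher run that writes every closed checkpoint to disk."""
        persisted = sum(self.closed)
        self.closed.clear()
        return persisted

    def items_in_memory(self):
        """Items still resident: the open checkpoint plus all closed ones."""
        return self.open_items + sum(self.closed)


def simulate(mutations, dispatcher_running,
             items_per_checkpoint=100, flush_every=250):
    """Apply `mutations` writes and return how many items remain in memory.

    When dispatcher_running is False, persist_closed() is never called,
    which models the stalled dispatcher described in this ticket.
    """
    queue = ToyCheckpointQueue(items_per_checkpoint)
    for i in range(1, mutations + 1):
        queue.mutate()
        if dispatcher_running and i % flush_every == 0:
            queue.persist_closed()
    return queue.items_in_memory()


if __name__ == "__main__":
    for running in (True, False):
        resident = simulate(1000, dispatcher_running=running)
        print(f"dispatcher_running={running}: {resident} items in memory")
```

In this toy model the resident item count stays bounded while the simulated dispatcher runs, and equals the total number of mutations when it does not, which is the growth pattern that would eventually exhaust memory on a real node.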




 Comments   
Comment by Farshid Ghods (Inactive) [ 11/Nov/11 ]
https://github.com/membase/ep-engine/commit/9a21b04ba1863a855acd61243ad26da2d6879c01


Comment by Farshid Ghods (Inactive) [ 11/Nov/11 ]
Due to this bug, Membase Server will not be able to persist open checkpoints and will keep them in memory for a longer time. Depending on the number of mutations, the client might see temporary OOM errors on that node.
Comment by Farshid Ghods (Inactive) [ 11/Nov/11 ]
https://github.com/membase/ep-engine/commit/9a21b04ba1863a855acd61243ad26da2d6879c01

Generated at Wed Jul 30 15:26:43 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.