[MB-4344] Dispatcher stops persisting items to disk after a successful rebalance due to a race condition in the scheduling (happens when rebalancing in more than 1 node) Created: 11/Oct/11 Updated: 09/Jan/13 Resolved: 11/Nov/11 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 1.7.1 |
| Fix Version/s: | 1.7.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Farshid Ghods | Assignee: | Chiyoung Seo |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | 1.7.1-release-notes, 1.7.2-release-notes | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
This behavior has been seen while adding 2 or more nodes. Steps to reproduce:
1- Create a cluster of N nodes.
2- Add two nodes and rebalance.
3- The dispatcher does not seem to pick up the items from checkpoints and persist them to disk.
Because of this behavior, all the closed checkpoints have to be kept in memory, and the node might run out of memory after a while, depending on the number of mutations that happen in the cluster. |
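The memory consequence described above can be illustrated with a toy model. This is only a sketch, not ep-engine code: the class `ToyCheckpointQueue`, the function `run`, and the checkpoint size of 100 items are all hypothetical, and the race condition itself is not modeled. The sketch only assumes that a closed checkpoint cannot be released until its items are persisted, so memory grows with the number of mutations once the flusher is no longer scheduled.

```python
from collections import deque


class ToyCheckpointQueue:
    """Toy model only: closed checkpoints stay in memory until persisted."""

    def __init__(self, items_per_checkpoint):
        self.items_per_checkpoint = items_per_checkpoint
        self.open_items = []      # mutations in the current open checkpoint
        self.closed = deque()     # closed checkpoints awaiting persistence
        self.persisted_items = 0  # stands in for items written to disk

    def mutate(self, item):
        """Record a mutation; close the checkpoint once it is full."""
        self.open_items.append(item)
        if len(self.open_items) == self.items_per_checkpoint:
            self.closed.append(self.open_items)
            self.open_items = []

    def flush(self):
        """Persist and release every closed checkpoint."""
        while self.closed:
            self.persisted_items += len(self.closed.popleft())

    def items_in_memory(self):
        return len(self.open_items) + sum(len(c) for c in self.closed)


def run(mutations, flusher_scheduled):
    """Apply `mutations` writes; flush only if the flusher still gets scheduled."""
    q = ToyCheckpointQueue(items_per_checkpoint=100)
    for i in range(mutations):
        q.mutate(i)
        if flusher_scheduled:
            q.flush()
    return q.items_in_memory()
```

With the flusher running, `run(n, True)` never holds more than one partially filled checkpoint; with it stalled, `run(n, False)` holds every mutation ever written, which is the growth pattern the description warns about.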
| Comments |
| Comment by Farshid Ghods [ 11/Nov/11 ] |
|
https://github.com/membase/ep-engine/commit/9a21b04ba1863a855acd61243ad26da2d6879c01 |
| Comment by Farshid Ghods [ 11/Nov/11 ] |
|
Due to this bug, Membase Server will not be able to persist open checkpoints and will keep them in memory for a longer time. Depending on the number of mutations, the client might see temp OOM errors on that node.
|
| Comment by Farshid Ghods [ 11/Nov/11 ] |
|
https://github.com/membase/ep-engine/commit/9a21b04ba1863a855acd61243ad26da2d6879c01 |