[MB-7383] active item resident ratio drops significantly when adding a 2.0 node to a 1.8.1 cluster for upgrade (sasl bucket) Created: 09/Dec/12 Updated: 02/Jan/13 Resolved: 27/Dec/12 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | 2.0 |
| Fix Version/s: | 2.0.1 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Farshid Ghods | Assignee: | Chisheng Hong |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | 2.0.0-hotfix | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
Scenario:
Add 2 x 2.0 nodes to a 3-node cluster with 2 buckets (default and sasl, where the active resident ratio is 60%).
A few minutes after the upgrade process begins, the active item resident ratio on 10.3.2.43, an existing 1.8.1 node, drops from 60% -> 58 -> 48 -> 38 -> 15 -> 5, at which point we decided to stop the rebalance.
I grabbed diags from the node whose resident ratio dropped from 63 percent to 5 percent: https://s3.amazonaws.com/bugdb/jira/systemtest/resident-ratio-drop-2853bb89.zip
Please open a bug ASAP and mention this there. Assign the bug to Chiyoung, in the hope that he or Mike can take a look at the cluster.
[stats:debug] [2012-12-08 17:43:52] active item resident ratio is at 63% and everything looks normal
[stats:debug] [2012-12-08 17:45:32] vb_active_perc_mem_resident 58
[stats:debug] [2012-12-08 17:47:12] vb_active_perc_mem_resident 49
[stats:debug] [2012-12-08 18:03:51] vb_active_perc_mem_resident 35
What happened between 17:45 and 17:47, or between 17:56 and 18:03, that pushed the resident ratio this low? Whatever it is, some combination of 1.8.1 and 2.0 is causing the issue.
Collect info from other nodes: |
| Comments |
| Comment by Farshid Ghods [ 09/Dec/12 ] |
|
existing 1.8.1 node : https://s3.amazonaws.com/bugdb/jira/MB-7383/10.3.2.122-1292012-1553-diag.zip
2.0 node : https://s3.amazonaws.com/bugdb/jira/MB-7383/10.3.2.41-1292012-161-diag.zip
existing 1.8.1 node : https://s3.amazonaws.com/bugdb/jira/MB-7383/10.3.2.43-1292012-1551-diag.zip
existing 1.8.1 node : https://s3.amazonaws.com/bugdb/jira/MB-7383/10.3.2.47-1292012-1558-diag.zip |
| Comment by Farshid Ghods [ 09/Dec/12 ] |
| 2.0 node : 10.3.2.85 https://s3.amazonaws.com/bugdb/jira/MB-7383/10.3.2.85-1292012-1616-diag.zip |
| Comment by Chiyoung Seo [ 09/Dec/12 ] |
|
Farshid,
This might be caused by the memory leak bug in 1.8.1. Can you please test it with the latest 1.8.1 patch (build 943?)? |
| Comment by Farshid Ghods [ 10/Dec/12 ] |
|
From the email:
Chisheng, let's patch the ep.so file with the one available from the 1.8.1-943-rel build and apply it to all nodes in this cluster, then add another 2.0 node and rebalance the cluster again. Please keep Chiyoung in the loop after the experiment. |
| Comment by Farshid Ghods [ 10/Dec/12 ] |
| QE will update the ticket when results are available |
| Comment by Chisheng Hong [ 27/Dec/12 ] |
|
https://github.com/couchbaselabs/couchbase-qe-docs/blob/master/system-tests/pine-cluster/12-10-2012.txt
Cannot reproduce this on an EC2 cluster running CentOS. The problem was caused by slow disk speed in the previous test: https://github.com/couchbaselabs/couchbase-qe-docs/blob/master/system-tests/pine-cluster/12-08-2012.txt |