Details
Description
Cluster information:
- 11 CentOS 6.2 64-bit servers, each with a 4-core CPU
- Each server has 10 GB RAM and 150 GB disk.
- 8 GB RAM for Couchbase Server on each node (80% of total system memory)
- Disk format is ext3 on both data and root
- Each server has its own drive; no disk is shared with another server.
- Load 7 million items into both buckets
- Cluster has 2 buckets, default (3 GB) and saslbucket (3 GB)
- Each bucket has one design doc with 2 views (default: d1, saslbucket: d11)
- Maintain a load of about 10 K ops and query views on both design docs
10.3.121.13
10.3.121.14
10.3.121.15
10.3.121.16
10.3.121.17
10.3.121.20
10.3.121.22
10.3.121.24
10.3.121.25
10.3.121.23
Create cluster with 10 nodes
Do swap rebalance. Add node 26 and remove node 25.
Before and during rebalance, cluster does not go into swap.
Rebalance failed with error "Resetting rebalance status since it's not really running"
Link to diags of all nodes https://s3.amazonaws.com/packages.couchbase/diag-logs/orange/201209/11ndoes-1697-reb-failed-reset-reb-20120907.tgz
Link to atop of all nodes https://s3.amazonaws.com/packages.couchbase/atop-files/orange/201209/atop-11nodes-1697-reb-failed-reset-reb-20120907.tgz
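For anyone re-running the swap step by script rather than through the UI, here is a minimal sketch of how the form fields for the rebalance call could be built. It assumes the REST endpoint `POST /controller/rebalance` with `knownNodes` and `ejectedNodes` fields, and that node names follow the `ns_1@<ip>` pattern seen in the logs; `swap_rebalance_form` and `otp_name` are hypothetical helper names, and the new node (10.3.121.26) is assumed to have been added to the cluster already. Nothing is sent over the network here.

```python
from urllib.parse import urlencode


def otp_name(ip):
    """Return the node name for an IP, assuming the ns_1@<ip> pattern from the logs."""
    return f"ns_1@{ip}"


def swap_rebalance_form(current_ips, add_ip, remove_ip):
    """Build the form fields for a swap rebalance (hypothetical helper).

    knownNodes lists every node the cluster knows about, including the
    incoming one; ejectedNodes lists the node being swapped out.
    """
    if remove_ip not in current_ips:
        raise ValueError(f"{remove_ip} is not in the cluster")
    if add_ip in current_ips:
        raise ValueError(f"{add_ip} is already in the cluster")
    known = [otp_name(ip) for ip in current_ips] + [otp_name(add_ip)]
    return {
        "knownNodes": ",".join(known),
        "ejectedNodes": otp_name(remove_ip),
    }


# The 10 nodes listed in this ticket, in the order given above.
cluster = [f"10.3.121.{n}" for n in (13, 14, 15, 16, 17, 20, 22, 24, 25, 23)]
form = swap_rebalance_form(cluster, add_ip="10.3.121.26", remove_ip="10.3.121.25")

# URL-encoded body that would be POSTed to /controller/rebalance (assumed endpoint).
body = urlencode(form)
print(body)
```

The node being ejected stays in `knownNodes`; only `ejectedNodes` marks it for removal.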
Activity
Thuan Nguyen
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Assignee | Chiyoung Seo [ chiyoung ] | Aleksey Kondratenko [ alkondratenko ] |
| Component/s | ns_server [ 10019 ] | |
| Component/s | | couchbase-bucket [ 10173 ] |
Farshid Ghods
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Labels | | longevity |
Aleksey Kondratenko
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Assignee | Aleksey Kondratenko [ alkondratenko ] | Farshid Ghods [ farshid ] |
Karan Kumar
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Assignee | Farshid Ghods [ farshid ] | Thuan Nguyen [ thuan ] |
Farshid Ghods
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Assignee | Thuan Nguyen [ thuan ] | Karan Kumar [ karan ] |
Karan Kumar
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Labels | longevity | system-test |
Farshid Ghods
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Summary | [longevity] rebalance failed due to error "Resetting rebalance status since it's not really running" | [longevity] rebalance failed due to error "Resetting rebalance status since it's not really running" when there are major page faults on some of the nodes in the cluster |
Farshid Ghods
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Labels | system-test | 2.0-beta-release-notes system-test |
Farshid Ghods
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Fix Version/s | 2.0 [ 10114 ] | |
| Fix Version/s | | 2.0-beta [ 10113 ] |
Karan Kumar
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Status | Open [ 1 ] | Resolved [ 5 ] |
| Resolution | | Fixed [ 1 ] |
Farshid Ghods
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Status | Resolved [ 5 ] | Closed [ 6 ] |
[couchdb:info,2012-09-07T18:46:04.086,ns_1@10.3.121.13:<0.13536.5>:couch_log:info:39]Native initial compact succeeded for "saslbucket/51"
[couchdb:info,2012-09-07T18:46:04.095,ns_1@10.3.121.13:<0.8901.0>:couch_log:info:39]CouchDB swapping files /data/saslbucket/51.couch.2 and /data/saslbucket/51.couch.1.compact.
[couchdb:info,2012-09-07T18:46:04.104,ns_1@10.3.121.13:<0.8901.0>:couch_log:info:39]Compaction for db "saslbucket/51" completed.
[ns_server:info,2012-09-07T18:46:04.105,ns_1@10.3.121.13:<0.13562.5>:compaction_daemon:spawn_vbucket_compactor:644]Compacting <<"saslbucket/52">>
[couchdb:info,2012-09-07T18:46:04.106,ns_1@10.3.121.13:<0.8894.0>:couch_log:info:39]Starting compaction for db "saslbucket/52"
[couchdb:info,2012-09-07T18:46:04.522,ns_1@10.3.121.13:<0.7547.5>:couch_log:info:39]10.3.121.15 - - POST /_view_merge/?stale=ok&limit=10 200
[ns_server:debug,2012-09-07T18:46:04.668,ns_1@10.3.121.13:<0.13524.5>:ns_pubsub:do_subscribe_link:149]Deleting {ns_stats_event,<0.29002.1>} event handler: ok
[ns_server:debug,2012-09-07T18:46:04.728,ns_1@10.3.121.13:<0.13566.5>:ns_pubsub:do_subscribe_link:120]Started subscription {ns_stats_event,<0.29002.1>}
[couchdb:info,2012-09-07T18:46:04.855,ns_1@10.3.121.13:<0.5824.5>:couch_log:info:39]10.3.121.25 - - POST /_view_merge/?stale=ok&limit=10 200
[couchdb:info,2012-09-07T18:46:04.868,ns_1@10.3.121.13:<0.13271.5>:couch_log:info:39]Updater checkpointing set view `saslbucket` update for main group `_design/d11`
[ns_server:debug,2012-09-07T18:46:04.864,ns_1@10.3.121.13:<0.13566.5>:ns_pubsub:do_subscribe_link:149]Deleting {ns_stats_event,<0.29002.1>} event handler: ok
[ns_server:debug,2012-09-07T18:46:04.904,ns_1@10.3.121.13:<0.13581.5>:ns_pubsub:do_subscribe_link:120]Started subscription {ns_stats_event,<0.29002.1>}
[couchdb:info,2012-09-07T18:46:04.965,ns_1@10.3.121.13:<0.5824.5>:couch_log:info:39]10.3.121.25 - - POST /_view_merge/?stale=ok&limit=10 200
[user:info,2012-09-07T18:46:05.012,ns_1@10.3.121.13:ns_config:ns_janitor:maybe_stop_rebalance_status:139]Resetting rebalance status since it's not really running
[ns_server:debug,2012-09-07T18:46:05.014,ns_1@10.3.121.13:'capi_set_view_manager-saslbucket':capi_set_view_manager:handle_info:330]doing replicate_newnodes_docs
[ns_server:debug,2012-09-07T18:46:05.014,ns_1@10.3.121.13:'capi_set_view_manager-default':capi_set_view_manager:handle_info:330]doing replicate_newnodes_docs
[ns_server:debug,2012-09-07T18:46:05.018,ns_1@10.3.121.13:ns_config_rep:ns_config_rep:do_push_keys:317]Replicating some config keys ([rebalance_status,rebalancer_pid]..)
[ns_server:debug,2012-09-07T18:46:05.045,ns_1@10.3.121.13:ns_config_log:ns_config_log:log_common:111]config change:
rebalancer_pid ->
undefined
[ns_server:debug,2012-09-07T18:46:05.045,ns_1@10.3.121.13:'capi_set_view_manager-saslbucket':capi_set_view_manager:handle_info:330]doing replicate_newnodes_docs
[ns_server:debug,2012-09-07T18:46:05.046,ns_1@10.3.121.13:'capi_set_view_manager-default':capi_set_view_manager:handle_info:330]doing replicate_newnodes_docs
[ns_server:debug,2012-09-07T18:46:05.064,ns_1@10.3.121.13:ns_config_log:ns_config_log:log_common:111]config change:
rebalance_status ->
{none,<<"Rebalance stopped by janitor.">>}
[ns_server:debug,2012-09-07T18:46:05.064,ns_1@10.3.121.13:'capi_set_view_manager-default':capi_set_view_manager:handle_info:330]doing replicate_newnodes_docs
[ns_server:debug,2012-09-07T18:46:05.064,ns_1@10.3.121.13:'capi_set_view_manager-saslbucket':capi_set_view_manager:handle_info:330]doing replicate_newnodes_docs
[couchdb:info,2012-09-07T18:46:05.336,ns_1@10.3.121.13:<0.6542.5>:couch_log:info:39]10.3.121.17 - - POST /_view_merge/?stale=ok&limit=10 200
[stats:error,2012-09-07T18:46:05.424,ns_1@10.3.121.13:<0.5179.2>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.26']
[ns_server:debug,2012-09-07T18:46:05.894,ns_1@10.3.121.13:<0.13614.5>:ns_pubsub:do_subscribe_link:120]Started subscription {ns_stats_event,<0.10014.5>}
[ns_server:debug,2012-09-07T18:46:05.939,ns_1@10.3.121.13:<0.13581.5>:ns_pubsub:do_subscribe_link:149]Deleting {ns_stats_event,<0.29002.1>} event handler: ok
[couchdb:info,2012-09-07T18:46:06.629,ns_1@10.3.121.13:<0.13563.5>:couch_log:info:39]Native compactor output: Compacted /data/saslbucket/52.couch.1 -> /data/saslbucket/52.couch.1.compact
[couchdb:info,2012-09-07T18:46:06.665,ns_1@10.3.121.13:<0.13563.5>:couch_log:info:39]Native initial compact succeeded for "saslbucket/52"
[couchdb:info,2012-09-07T18:46:06.678,ns_1@10.3.121.13:<0.8894.0>:couch_log:info:39]CouchDB swapping files /data/saslbucket/52.couch.2 and /data/saslbucket/52.couch.1.compact.
[couchdb:info,2012-09-07T18:46:06.981,ns_1@10.3.121.13:<0.8894.0>:couch_log:info:39]Compaction for db "saslbucket/52" completed.
[couchdb:info,2012-09-07T18:46:07.039,ns_1@10.3.121.13:<0.6542.5>:couch_log:info:39]10.3.121.17 - - POST /_view_merge/?stale=ok&limit=10 200
[ns_server:info,2012-09-07T18:46:07.040,ns_1@10.3.121.13:<0.13641.5>:compaction_daemon:spawn_vbucket_compactor:644]Compacting <<"saslbucket/53">>
[couchdb:info,2012-09-07T18:46:07.040,ns_1@10.3.121.13:<0.8887.0>:couch_log:info:39]Starting compaction for db "saslbucket/53"
[ns_server:info,2012-09-07T18:46:07.435,ns_1@10.3.121.13:<0.13244.5>:ns_orchestrator:handle_info:258]Skipping janitor in state janitor_running: {janitor_state,
["default"],
<0.13595.5>}
[ns_server:warn,2012-09-07T18:46:09.013,ns_1@10.3.121.13:mb_master:mb_master:send_heartbeat_with_peers:473]send heartbeat timed out
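To pull the relevant events out of the full diags rather than reading them by eye, a small parser along these lines may help. It is only a sketch: the header layout (`[category:level,timestamp,node:process:module:function:line]message`) is inferred from the lines in the excerpt above, not from any documented log format, and `LOG_HEADER` and `parse_entry` are hypothetical names. Continuation lines of multi-line entries (for example `rebalancer_pid ->`) do not match and return `None`.

```python
import re
from datetime import datetime

# Header layout inferred from the excerpt above (an assumption, not a spec):
# [category:level,timestamp,node:process:module:function:line]message
LOG_HEADER = re.compile(
    r"^\[(?P<category>\w+):(?P<level>\w+),"
    r"(?P<timestamp>[^,]+),"
    r"(?P<node>[^:]+):(?P<process>.+?):"
    r"(?P<module>\w+):(?P<function>\w+):(?P<line>\d+)\]"
    r"(?P<message>.*)$"
)


def parse_entry(text):
    """Parse one log line into a dict, or return None if it has no header."""
    match = LOG_HEADER.match(text)
    if match is None:
        return None
    entry = match.groupdict()
    entry["timestamp"] = datetime.strptime(
        entry["timestamp"], "%Y-%m-%dT%H:%M:%S.%f"
    )
    entry["line"] = int(entry["line"])
    return entry


# Two lines copied from the excerpt above.
reset = parse_entry(
    "[user:info,2012-09-07T18:46:05.012,ns_1@10.3.121.13:ns_config:"
    "ns_janitor:maybe_stop_rebalance_status:139]"
    "Resetting rebalance status since it's not really running"
)
heartbeat = parse_entry(
    "[ns_server:warn,2012-09-07T18:46:09.013,ns_1@10.3.121.13:mb_master:"
    "mb_master:send_heartbeat_with_peers:473]send heartbeat timed out"
)

# Seconds between the janitor reset and the heartbeat timeout.
gap = (heartbeat["timestamp"] - reset["timestamp"]).total_seconds()
print(reset["module"], reset["function"], gap)
```

Filtering the parsed entries on `module == "ns_janitor"` or `level == "warn"` across all nodes' logs would give a timeline of when each node reported the reset or a missed heartbeat.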