Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.8.1-release-candidate
Fix Version/s: 2.0
Component/s: couchbase-bucket
Security Level: Public
Labels:
Environment: Ubuntu 64 bit, 181-831-rel
Description
Failing test:
rebalancetests.RebalanceInOutWithParallelLoad.test_load,get-logs:True,replica:2,num_nodes:7
[ns_server:info] [2012-05-21 8:17:31] [ns_1@10.1.3.109:<0.15105.17>:ns_replicas_builder:kill_a_bunch_of_tap_names:209] Killed the following tap names on 'ns_1@10.1.3.109': [<<"replication_building_62_'ns_1@10.1.3.112'">>]
[ns_server:info] [2012-05-21 8:17:31] [ns_1@10.1.3.109:<0.15104.17>:ns_single_vbucket_mover:mover_inner:88] Got exit message (parent is <0.13796.17>). Exiting...
{'EXIT',<0.15105.17>,{replicator_died,{'EXIT',<16541.21868.10>,normal}}}
[ns_server:error] [2012-05-21 8:17:31] [ns_1@10.1.3.109:<0.15104.17>:ns_replicas_builder:sync_shutdown_many:147] Shutdown of the following failed: [{<0.15105.17>,
{replicator_died,
{'EXIT',<16541.21868.10>,normal}}}]
[error_logger:error] [2012-05-21 8:17:31] [ns_1@10.1.3.109:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: erlang:apply/2
pid: <0.15105.17>
registered_name: []
exception exit: {replicator_died,{'EXIT',<16541.21868.10>,normal}}
in function ns_replicas_builder:'-build_replicas_main/6-fun-0-'/1
in call from ns_replicas_builder:observe_wait_all_done_tail/5
in call from ns_replicas_builder:observe_wait_all_done/5
in call from ns_replicas_builder:'-build_replicas_main/6-fun-1-'/8
in call from ns_replicas_builder:try_with_maybe_ignorant_after/2
in call from ns_replicas_builder:build_replicas_main/6
ancestors: [<0.15104.17>,<0.13796.17>,<0.13763.17>]
messages: [{'EXIT',<16541.21868.10>,normal}]
links: [<0.15104.17>]
dictionary: []
trap_exit: true
status: running
heap_size: 121393
stack_size: 24
reductions: 12423
neighbours:
[ns_server:error] [2012-05-21 8:17:31] [ns_1@10.1.3.109:<0.15104.17>:ns_replicas_builder:try_with_maybe_ignorant_after:68] Eating exception from ignorant after-block:
{error,{badmatch,[{<0.15105.17>,
{replicator_died,{'EXIT',<16541.21868.10>,normal}}}]},
[{ns_replicas_builder,sync_shutdown_many,1},
{ns_replicas_builder,try_with_maybe_ignorant_after,2},
{ns_single_vbucket_mover,mover,6},
{proc_lib,init_p_do_apply,3}]}
[rebalance:error] [2012-05-21 8:17:31] [ns_1@10.1.3.109:<0.13796.17>:ns_vbucket_mover:handle_info:158] <0.15104.17> exited with {exited,
{'EXIT',<0.15105.17>,
{replicator_died,
{'EXIT',<16541.21868.10>,normal}}}}
[ns_server:info] [2012-05-21 8:17:31] [ns_1@10.1.3.109:<0.9348.1>:ns_port_server:log:161] memcached<0.9348.1>: TAP (Producer) eq_tapq:rebalance_61 - Schedule the backfill for vbucket 61
memcached<0.9348.1>: TAP (Producer) eq_tapq:rebalance_61 - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
memcached<0.9348.1>: TAP (Producer) eq_tapq:rebalance_61 - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
memcached<0.9348.1>: TAP (Producer) eq_tapq:rebalance_61 - Sending TAP_VBUCKET_SET with vbucket 61 and state "pending"
memcached<0.9348.1>: TAP (Producer) eq_tapq:rebalance_61 - Sending TAP_OPAQUE with command "initial_vbucket_stream" and vbucket 61
memcached<0.9348.1>: TAP (Producer) eq_tapq:rebalance_61 - Backfill is completed with VBuckets 61,
memcached<0.9348.1>: TAP (Producer) eq_tapq:rebalance_61 - Sending TAP_OPAQUE with command "close_backfill" and vbucket 61
memcached<0.9348.1>: Vbucket <61> is going dead.
memcached<0.9348.1>: TAP (Producer) eq_tapq:rebalance_61 - Sending TAP_VBUCKET_SET with vbucket 61 and state "active"
memcached<0.9348.1>: TAP takeover is completed. Disconnecting tap stream <eq_tapq:rebalance_61>
memcached<0.9348.1>: TAP (Producer) eq_tapq:replication_building_62_'ns_1@10.1.3.112' - Schedule the backfill for vbucket 62
memcached<0.9348.1>: TAP (Producer) eq_tapq:replication_building_62_'ns_1@10.1.3.112' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
memcached<0.9348.1>: TAP (Producer) eq_tapq:replication_building_62_'ns_1@10.1.3.112' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
memcached<0.9348.1>: TAP (Producer) eq_tapq:replication_building_62_'ns_1@10.1.3.112' - Sending TAP_OPAQUE with command "initial_vbucket_stream" and vbucket 62
[error_logger:error] [2012-05-21 8:17:31] [ns_1@10.1.3.109:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: ns_single_vbucket_mover:mover/6
pid: <0.15104.17>
registered_name: []
exception exit: {exited,
{'EXIT',<0.15105.17>,
{replicator_died,
{'EXIT',<16541.21868.10>,normal}}}}
in function ns_single_vbucket_mover:mover_inner/6
in call from ns_replicas_builder:try_with_maybe_ignorant_after/2
in call from ns_single_vbucket_mover:mover/6
ancestors: [<0.13796.17>,<0.13763.17>]
messages: []
links: [<0.13796.17>]
dictionary: [{cleanup_list,[<0.15105.17>]}]
trap_exit: true
status: running
heap_size: 987
stack_size: 24
reductions: 4014
Activity
Aleksey Kondratenko
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Assignee | Aleksey Kondratenko [ alkondratenko ] | Chiyoung Seo [ chiyoung ] |
Chiyoung Seo
made changes -
| Sprint Status | Current Sprint | |
| Sprint Priority | 0 |
Chiyoung Seo
made changes -
| Component/s | couchbase-bucket [ 10173 ] | |
| Component/s | ns_server [ 10019 ] |
Farshid Ghods
made changes -
| Labels | 1.8.1-release-notes | |
| Fix Version/s | 2.0-developer-preview-5 [ 10290 ] | |
| Fix Version/s | 1.8.1 [ 10295 ] | |
| Priority | Blocker [ 1 ] | Critical [ 2 ] |
| Sprint Status | Current Sprint | |
| Sprint Priority | 0 |
Karan Kumar
made changes -
| Attachment | 2ae93467-b468-4d0e-966e-7f14cdc3bb1f-10.1.3.85-diag.gz [ 13566 ] | |
| Attachment | 2ae93467-b468-4d0e-966e-7f14cdc3bb1f-10.1.3.82-diag.gz [ 13567 ] | |
| Attachment | 2ae93467-b468-4d0e-966e-7f14cdc3bb1f-10.1.3.83-diag.gz [ 13568 ] | |
| Attachment | 2ae93467-b468-4d0e-966e-7f14cdc3bb1f-10.1.3.84-diag.gz [ 13569 ] | |
| Attachment | 2ae93467-b468-4d0e-966e-7f14cdc3bb1f-10.1.3.86-diag.gz [ 13570 ] |
Karan Kumar
made changes -
| Attachment | 32326eea-21d4-4ef0-bf82-233b681a5336-10.1.3.111-diag.gz [ 13643 ] | |
| Attachment | 32326eea-21d4-4ef0-bf82-233b681a5336-10.1.3.120-diag.gz [ 13644 ] | |
| Attachment | 32326eea-21d4-4ef0-bf82-233b681a5336-10.1.3.119-diag.gz [ 13645 ] | |
| Attachment | 32326eea-21d4-4ef0-bf82-233b681a5336-10.1.3.112-diag.gz [ 13646 ] | |
| Attachment | 32326eea-21d4-4ef0-bf82-233b681a5336-10.1.3.121-diag.gz [ 13647 ] |
Ketaki Gangal
made changes -
| Comment | [ Yes, the password for the bucket- Bucket1 was changed at some point on the cluster, before the last rebalance was issued. ] |
Peter Wansch
made changes -
| Fix Version/s | 2.0-beta [ 10113 ] | |
| Fix Version/s | 2.0-developer-preview-5 [ 10290 ] |
Peter Wansch
made changes -
| Summary | Rebalance failed due to replicator_died: exited (ns_single_vbucket_mover) | memcached dropped connections: Rebalance failed due to replicator_died: exited (ns_single_vbucket_mover) |
Peter Wansch
made changes -
| Assignee | Chiyoung Seo [ chiyoung ] | Farshid Ghods [ farshid ] |
Farshid Ghods
made changes -
| Sprint Status | Next Sprint |
Andrei Baranouski
made changes -
| Attachment | 10.5.2.13-8091-diag.txt.gz [ 14253 ] | |
| Attachment | 10.5.2.14-8091-diag.txt.gz [ 14254 ] | |
| Attachment | 10.5.2.15-8091-diag.txt.gz [ 14255 ] | |
| Attachment | 10.5.2.16-8091-diag.txt.gz [ 14256 ] | |
| Attachment | 10.5.2.18-8091-diag.txt.gz [ 14257 ] | |
| Attachment | 10.5.2.19-8091-diag.txt.gz [ 14258 ] |
Peter Wansch
made changes -
| Fix Version/s | 2.0 [ 10114 ] | |
| Fix Version/s | 2.0-beta [ 10113 ] |
Peter Wansch
made changes -
| Status | Open [ 1 ] | Closed [ 6 ] |
| Resolution | Fixed [ 1 ] |
Peter Wansch
made changes -
| Resolution | Fixed [ 1 ] | |
| Status | Closed [ 6 ] | Reopened [ 4 ] |
| Sprint Status | Next Sprint |
Peter Wansch
made changes -
| Status | Reopened [ 4 ] | Closed [ 6 ] |
| Resolution | Fixed [ 1 ] |
Frank Weigel
made changes -
| Resolution | Fixed [ 1 ] | |
| Status | Closed [ 6 ] | Reopened [ 4 ] |
| Assignee | Farshid Ghods [ farshid ] | Frank Weigel [ frank ] |
Frank Weigel
made changes -
| Status | Reopened [ 4 ] | Resolved [ 5 ] |
| Resolution | Fixed [ 1 ] |
[ns_server:info] [2012-05-21 8:17:31] [ns_1@10.1.3.112:<0.21868.10>:ebucketmigrator_srv:init:181] Setting {"10.1.3.112",11209} vbucket 62 to state replica
[rebalance:debug] [2012-05-21 8:17:31] [ns_1@10.1.3.112:<0.21868.10>:ebucketmigrator_srv:init:186] CheckpointIdsDict:
{dict,64,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[[0|1],[16|1],[32|1],[48|1]],
[[3|1],[19|1],[35|1],[51|1]],
[[6|1],[22|1],[38|1],[54|1]],
[[9|1],[25|1],[41|1],[57|1]],
[[12|1],[28|1],[44|1],[60|1]],
[[15|1],[31|1],[47|1],[63|1]],
[[2|1],[18|1],[34|1],[50|1]],
[[5|1],[21|1],[37|1],[53|1]],
[[8|1],[24|1],[40|1],[56|1]],
[[11|1],[27|1],[43|1],[59|1]],
[[14|1],[30|1],[46|1],[62|1]],
[[1|1],[17|1],[33|1],[49|1]],
[[4|1],[20|1],[36|1],[52|1]],
[[7|1],[23|1],[39|1],[55|1]],
[[10|1],[26|1],[42|1],[58|1]],
[[13|1],[29|1],[45|1],[61|1]]}}}
[ns_server:debug] [2012-05-21 8:17:31] [ns_1@10.1.3.112:<0.21868.10>:ebucketmigrator_srv:init:209] killing tap named: replication_building_62_'ns_1@10.1.3.112'
[rebalance:debug] [2012-05-21 8:17:31] [ns_1@10.1.3.112:<0.21868.10>:ebucketmigrator_srv:init:247] upstream_sender pid: <0.21869.10>
[rebalance:info] [2012-05-21 8:17:31] [ns_1@10.1.3.112:<0.21868.10>:ebucketmigrator_srv:process_upstream:447] Initial stream for vbucket 62
[rebalance:info] [2012-05-21 8:17:31] [ns_1@10.1.3.112:<0.21868.10>:ebucketmigrator_srv:do_confirm_sent_messages:315] Got close ack!
And these are the logs from memcached on the source node (.109):
memcached<0.9348.1>: TAP (Producer) eq_tapq:replication_building_62_'ns_1@10.1.3.112' - Schedule the backfill for vbucket 62
memcached<0.9348.1>: TAP (Producer) eq_tapq:replication_building_62_'ns_1@10.1.3.112' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
memcached<0.9348.1>: TAP (Producer) eq_tapq:replication_building_62_'ns_1@10.1.3.112' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
memcached<0.9348.1>: TAP (Producer) eq_tapq:replication_building_62_'ns_1@10.1.3.112' - Sending TAP_OPAQUE with command "initial_vbucket_stream" and vbucket 62
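When reading the attached diags, it can help to split the ns_server lines quoted in this ticket into their header fields and to line up pids across nodes. The following is a minimal sketch, not part of ns_server or any Couchbase tooling: the header pattern is inferred only from the lines quoted here, and `LogHeader`, `parse_header`, and `same_process` are hypothetical names. The pid comparison is a heuristic that assumes a pid printed on a remote node differs only in its first number, as `<16541.21868.10>` on .109 and `<0.21868.10>` on .112 appear to.

```python
import re
from typing import NamedTuple, Optional

# Header shape inferred from the log lines in this ticket (an assumption):
# [component:level] [date time] [node:pid-or-name:module:function:line] message
HEADER = re.compile(
    r"^\[(?P<component>\w+):(?P<level>\w+)\] "
    r"\[(?P<timestamp>[^\]]+)\] "
    r"\[(?P<node>[^:]+):(?P<pid><[\d.]+>|\w+):"
    r"(?P<module>\w+):(?P<function>\w+):(?P<line>\d+)\]\s*(?P<message>.*)$"
)


class LogHeader(NamedTuple):
    component: str
    level: str
    timestamp: str
    node: str
    pid: str
    module: str
    function: str
    line: int
    message: str


def parse_header(text: str) -> Optional[LogHeader]:
    """Return the parsed header, or None for continuation lines."""
    match = HEADER.match(text)
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    return LogHeader(**fields)


def same_process(pid_a: str, pid_b: str) -> bool:
    """Heuristic: compare pids while ignoring the leading node number."""
    return pid_a.strip("<>").split(".")[1:] == pid_b.strip("<>").split(".")[1:]
```

For example, `parse_header` applied to the `sync_shutdown_many:147` line yields level `error` and pid `<0.15104.17>`, while continuation lines such as `crasher:` return `None` and can be attached to the preceding entry.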