[RN 2.0.1]][longevity] something unknown is causing severe timeouts in ns_server. Particularly under views building and/or compaction. Which causes rebalance to fail and other types of badness.
(MB-6595)
|
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0 |
| Fix Version/s: | 2.0.2 |
| Security Level: | Public |
| Type: | Technical task | Priority: | Major |
| Reporter: | Andrei Baranouski | Assignee: | Andrei Baranouski |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | 2.0-release-notes | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Description |
|
steps:
1. cluster #1 2 nodes: 10.3.121.112, 10.3.121.113 3 buckets with ~700K items in each: default, sasl:sasl( replica=2) standard cluster #2 10.3.121.115: 3 buckets: default1:default1( replica=2) sasl( no password, replica is disabled) 2. cluster#2 rebalance in 10.3.121.116 3. during rebalance create on 10.3.121.112 create xdcr replications: default this cluster bucket "default1" on cluster "http://10.3.121.115/" Replicating Last 10 errors on change sasl this cluster bucket "sasl" on cluster "http://10.3.121.115/" Starting Up on change standard this cluster bucket "standard" on cluster "http://10.3.121.115/" Replicating Last 10 errors on change and on 10.3.121.115 create: standard this cluster bucket "standard" on cluster "http://10.3.121.112:8091/" Replicating on change see screenshot with details result: [ns_server:debug,2012-11-18T4:22:25.816,ns_1@10.3.121.115:ns_config_log<0.338.0>:ns_config_log:log_common:111]config change: rebalance_status -> {none,<<"Rebalance stopped by janitor.">>} [ns_server:info,2012-11-18T4:22:25.992,ns_1@10.3.121.115:<0.5807.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/209">> [ns_server:info,2012-11-18T4:22:26.405,ns_1@10.3.121.115:<0.5839.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/210">> [ns_server:info,2012-11-18T4:22:26.725,ns_1@10.3.121.115:<0.5860.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/211">> [ns_server:info,2012-11-18T4:22:27.334,ns_1@10.3.121.115:<0.5915.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/212">> [ns_server:info,2012-11-18T4:22:27.810,ns_1@10.3.121.115:<0.5937.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/213">> [ns_server:info,2012-11-18T4:22:28.086,ns_1@10.3.121.115:<0.5980.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/214">> [ns_server:error,2012-11-18T4:22:28.602,ns_1@10.3.121.115:ns_memcached-standard<0.21850.0>:ns_memcached:handle_info:630]handle_info(ensure_bucket,..) took too long: 853680 us [ns_server:info,2012-11-18T4:22:28.785,ns_1@10.3.121.115:<0.6043.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/215">> [ns_server:error,2012-11-18T4:22:29.450,ns_1@10.3.121.115:<0.9097.0>:ns_memcached:verify_report_long_call:297]call {get_meta,<<"default111-5c34112c98cb69d2">>,883} took too long: 551954 us [ns_server:error,2012-11-18T4:22:29.451,ns_1@10.3.121.115:<0.21854.0>:ns_memcached:verify_report_long_call:297]call {get_meta,<<"standard-5e1b18c4c6a6d316">>,751} took too long: 549644 us [ns_server:info,2012-11-18T4:22:29.652,ns_1@10.3.121.115:<0.6111.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/216">> [ns_server:info,2012-11-18T4:22:29.918,ns_1@10.3.121.115:<0.6151.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/217">> [ns_server:error,2012-11-18T4:22:29.968,ns_1@10.3.121.115:ns_memcached-sasl<0.15429.0>:ns_memcached:handle_info:630]handle_info(ensure_bucket,..) took too long: 1629801 us [ns_server:info,2012-11-18T4:22:30.052,ns_1@10.3.121.115:<0.6192.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/218">> [ns_server:info,2012-11-18T4:22:30.400,ns_1@10.3.121.115:<0.6233.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/219">> [ns_server:info,2012-11-18T4:22:30.762,ns_1@10.3.121.115:<0.6261.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/220">> [ns_server:info,2012-11-18T4:22:30.900,ns_1@10.3.121.115:<0.6285.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/221">> [ns_server:info,2012-11-18T4:22:31.548,ns_1@10.3.121.115:<0.6348.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/222">> [ns_server:error,2012-11-18T4:22:31.882,ns_1@10.3.121.115:<0.9099.0>:ns_memcached:verify_report_long_call:297]call {stats,<<>>} took too long: 627784 us [ns_server:info,2012-11-18T4:22:31.920,ns_1@10.3.121.115:<0.6451.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/223">> [ns_server:error,2012-11-18T4:22:31.947,ns_1@10.3.121.115:<0.21855.0>:ns_memcached:verify_report_long_call:297]call topkeys took too long: 5994193 us [stats:warn,2012-11-18T4:22:32.332,ns_1@10.3.121.115:<0.21861.0>:stats_collector:latest_tick:223]Dropped 6 ticks [ns_server:info,2012-11-18T4:22:32.661,ns_1@10.3.121.115:<0.6494.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/224">> [ns_server:info,2012-11-18T4:22:33.516,ns_1@10.3.121.115:<0.6536.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/225">> [ns_server:error,2012-11-18T4:22:33.848,ns_1@10.3.121.115:<0.21855.0>:ns_memcached:verify_report_long_call:297]call {stats,<<>>} took too long: 692405 us [ns_server:info,2012-11-18T4:22:34.464,ns_1@10.3.121.115:<0.6579.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/226">> [ns_server:error,2012-11-18T4:22:34.947,ns_1@10.3.121.115:<0.21855.0>:ns_memcached:verify_report_long_call:297]call list_vbuckets took too long: 700928 us [stats:warn,2012-11-18T4:22:35.000,ns_1@10.3.121.115:<0.21861.0>:stats_collector:latest_tick:223]Dropped 1 ticks [ns_server:info,2012-11-18T4:22:35.363,ns_1@10.3.121.115:<0.6615.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/227">> [ns_server:info,2012-11-18T4:22:35.662,ns_1@10.3.121.115:<0.6639.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/228">> [ns_server:error,2012-11-18T4:22:36.373,ns_1@10.3.121.115:<0.15433.0>:ns_memcached:verify_report_long_call:297]call topkeys took too long: 4081382 us [ns_server:info,2012-11-18T4:22:36.576,ns_1@10.3.121.115:<0.6716.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/229">> [ns_server:info,2012-11-18T4:22:37.330,ns_1@10.3.121.115:<0.6747.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/230">> [ns_server:info,2012-11-18T4:22:37.680,ns_1@10.3.121.115:<0.6772.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/231">> [ns_server:info,2012-11-18T4:22:37.960,ns_1@10.3.121.115:<0.6809.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/232">> [ns_server:error,2012-11-18T4:22:37.843,ns_1@10.3.121.115:ns_memcached-default1<0.9095.0>:ns_memcached:handle_info:630]handle_info(ensure_bucket,..) took too long: 1210948 us [ns_server:info,2012-11-18T4:22:38.306,ns_1@10.3.121.115:<0.6865.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/233">> [ns_server:info,2012-11-18T4:22:38.777,ns_1@10.3.121.115:<0.6932.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/234">> [ns_server:error,2012-11-18T4:22:38.955,ns_1@10.3.121.115:<0.15434.0>:ns_memcached:verify_report_long_call:297]call {stats,<<>>} took too long: 843491 us [ns_server:error,2012-11-18T4:22:39.098,ns_1@10.3.121.115:ns_memcached-standard<0.21850.0>:ns_memcached:handle_info:630]handle_info(ensure_bucket,..) took too long: 1347335 us [ns_server:error,2012-11-18T4:22:39.272,ns_1@10.3.121.115:ns_memcached-sasl<0.15429.0>:ns_memcached:handle_info:630]handle_info(ensure_bucket,..) took too long: 913668 us {cluster_compat_version, [{'_vclock',[{'ns_1@127.0.0.1',{1,63520456425}}]}, 2,0]}, {counters, [{'_vclock', [{'ns_1@10.3.121.115',{1,63520459705}}]}, {rebalance_start,1}]}, {dynamic_config_version, [{'_vclock',[{'ns_1@127.0.0.1',{4,63520456425}}]}, 2,0]}, {email_alerts, [{'_vclock',[{'ns_1@127.0.0.1',{1,63520456424}}]}, {recipients,["root@localhost"]}, {sender,"couchbase@localhost"}, {enabled,false}, {email_server, [{user,[]}, {pass,[]}, {host,"localhost"}, {port,25}, {encrypt,false}]}, {alerts, [auto_failover_node, auto_failover_maximum_reached, auto_failover_other_nodes_down, auto_failover_cluster_too_small,ip,disk, overhead,ep_oom_errors, ep_item_commit_failed]}]}, {fast_warmup, [{fast_warmup_enabled,true}, {min_memory_threshold,10}, {min_items_threshold,10}]}, {index_aware_rebalance_disabled,false}, {max_bucket_count,10}, {memory_quota, [{'_vclock', [{'ns_1@127.0.0.1',{1,63520456426}}]}| 3082]}, {nodes_wanted, [{'_vclock', [{'ns_1@10.3.121.115',{2,63520459703}}]}, 'ns_1@10.3.121.115','ns_1@10.3.121.116']}, {otp, [{'_vclock', [{'ns_1@10.3.121.116',{2,63520459703}}, {'ns_1@127.0.0.1',{1,63520456425}}]}, {cookie,khxnkzsiiskgxwef}]}, {rebalance_status, [{'_vclock', [{'ns_1@10.3.121.115',{2,63520460544}}]}| {none,<<"Rebalance stopped by janitor.">>}]}, {rebalancer_pid, [{'_vclock', [{'ns_1@10.3.121.115',{2,63520460544}}]}| undefined]}, {remote_clusters, [{'_vclock', [{'ns_1@10.3.121.115',{1,63520459931}}]}, [{uuid,<<"274a6cf7338d4686038a03878a2a176d">>}, {name,"http://10.3.121.112:8091/"}, {hostname,"10.3.121.112"}, {username,"Administrator"}, {password,"password"}]]}, {replication,[{enabled,true}]}, {rest,[{port,8091}]}, [ns_server:info,2012-11-18T4:22:23.761,ns_1@10.3.121.115:<0.5617.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/203">> [ns_server:info,2012-11-18T4:22:24.425,ns_1@10.3.121.115:<0.5681.4>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default1/204">> [user:info,2012-11-18T4:22:24.496,ns_1@10.3.121.115:ns_config<0.331.0>:ns_janitor:maybe_stop_rebalance_status:147]Resetting rebalance status since it's not really running after failure XDCR sasl bucket replication status is Starting Up, and only 871 items were transfered |
| Comments |
| Comment by Steve Yen [ 20/Nov/12 ] |
| moved to 2.0.1 per bug scrub |
| Comment by Karen Zeller [ 05/Dec/12 ] |
|
Added to RN as:
One of the internal maintenance processes within Couchbase Server for checking vBuckets will stop node rebalance when you create replication via XDCR on a source cluster. |
| Comment by Junyi Xie [ 24/Jan/13 ] |
|
Looks to me this is a rebalance issue (rebalance stopped when the system is under some workload) and unrelated to XDCR component. XDCR just exposed the issue.
Remove xdcr from the component and watch this bug. |
| Comment by Aleksey Kondratenko [ 24/Jan/13 ] |
|
Agree with Junyi. It's about timeouts, not directly about xdcr. However XDCR evidently pushes system more heavily into timeoutful state. Let me also note that it's perhaps fixed already by +A and other timeout-fixing commits. |
| Comment by Aleksey Kondratenko [ 24/Jan/13 ] |
| Worth re-testing |
| Comment by Maria McDuff [ 27/Mar/13 ] |
| QE --- pls retest in latest 2.0.2 build. Thanks. |
| Comment by Maria McDuff [ 29/Apr/13 ] |
| andrei, can u pls re-test and update this bug? Thanks. |
| Comment by Andrei Baranouski [ 30/Apr/13 ] |
| rebalance passed with the same steps on 2.0.2-789 |
| Comment by Andrei Baranouski [ 30/Apr/13 ] |
| Maria, can we close it now? |
| Comment by Maria McDuff [ 06/May/13 ] |
| yes, pls close. thanks. |