[MB-5279] user is unable to add the node back to the cluster because the failed over node is stuck in warmup phase Created: 10/May/12  Updated: 09/Jan/13  Resolved: 29/May/12

Status: Closed
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 1.8.1-release-candidate
Fix Version/s: 1.8.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Karan Kumar (Inactive) Assignee: Aleksey Kondratenko
Resolution: Fixed Votes: 0
Labels: customer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: CentOS 64 bit

1.8.1-807

Attachments: GZip Archive 1452ec82-fb21-4458-b917-4527dac36f53-10.1.3.74-diag (2).gz     GZip Archive 1452ec82-fb21-4458-b917-4527dac36f53-10.1.3.75-diag.gz     GZip Archive 1452ec82-fb21-4458-b917-4527dac36f53-10.1.3.76-diag.gz     GZip Archive 1452ec82-fb21-4458-b917-4527dac36f53-10.1.3.77-diag.gz     GZip Archive 1452ec82-fb21-4458-b917-4527dac36f53-10.1.3.78-diag.gz     GZip Archive 1452ec82-fb21-4458-b917-4527dac36f53-10.1.3.79-diag.gz     GZip Archive 1452ec82-fb21-4458-b917-4527dac36f53-10.1.3.80-diag.gz    

 Description   
Have not seen this issue previously on 1.8.1. Seems to be something new.

Test case that failed
swaprebalance.SwapRebalanceFailedTests.test_add_back_failed_node,replica=3,num-buckets=5,num-swap=2,swap-orchestrator=True,keys-count=800000

1) Fail over 2 nodes, including the orchestrator
2) Rebalance them out
3) Add them back and rebalance in, without cleaning them (a rough REST-level sketch of these steps follows).
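
For reference, here is a minimal sketch of these steps driven through the cluster REST API, not the actual swaprebalance test harness. It assumes default Administrator credentials on port 8091, an abbreviated four-node cluster, and the standard /controller/failOver, /controller/addNode and /controller/rebalance endpoints; rebalance is asynchronous, so a real test would also poll /pools/default/rebalanceProgress.

-module(mb5279_repro).
-export([run/0]).

%% REST entry point on a node that stays in the cluster
-define(ENTRY, "http://10.1.3.75:8091").

post(Path, Form) ->
    Auth = "Basic " ++ base64:encode_to_string("Administrator:password"),
    {ok, {{_, Code, _}, _, Body}} =
        httpc:request(post,
                      {?ENTRY ++ Path, [{"Authorization", Auth}],
                       "application/x-www-form-urlencoded", Form},
                      [], []),
    {Code, Body}.

run() ->
    inets:start(),
    %% 1) fail over two nodes, including the orchestrator (.74)
    post("/controller/failOver", "otpNode=ns_1@10.1.3.74"),
    post("/controller/failOver", "otpNode=ns_1@10.1.3.77"),
    %% 2) rebalance the failed-over nodes out
    post("/controller/rebalance",
         "knownNodes=ns_1@10.1.3.74,ns_1@10.1.3.75,ns_1@10.1.3.76,ns_1@10.1.3.77"
         ++ "&ejectedNodes=ns_1@10.1.3.74,ns_1@10.1.3.77"),
    %% 3) add them back WITHOUT cleaning their data directories, then rebalance
    %%    in; this is where the stuck bucket shutdown made rebalance fail
    post("/controller/addNode",
         "hostname=10.1.3.74&user=Administrator&password=password"),
    post("/controller/addNode",
         "hostname=10.1.3.77&user=Administrator&password=password"),
    post("/controller/rebalance",
         "knownNodes=ns_1@10.1.3.74,ns_1@10.1.3.75,ns_1@10.1.3.76,ns_1@10.1.3.77"
         ++ "&ejectedNodes="),
    ok.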

[rebalance:error] [2012-05-10 8:48:59] [ns_1@10.1.3.74:<0.27930.16>:ns_rebalancer:do_wait_buckets_shutdown:170] Failed to wait deletion of some buckets on some nodes: [{'ns_1@10.1.3.77',
                                                         {'EXIT',
                                                          {old_buckets_shutdown_wait_failed,
                                                           ["bucket-2",
                                                            "bucket-1",
                                                            "bucket-0",
                                                            "bucket-4",
                                                            "bucket-3"]}}}]

[user:info] [2012-05-10 8:48:59] [ns_1@10.1.3.74:<0.27893.16>:ns_orchestrator:handle_info:246] Rebalance exited with reason {buckets_shutdown_wait_failed,
                              [{'ns_1@10.1.3.77',
                                {'EXIT',
                                 {old_buckets_shutdown_wait_failed,
                                  ["bucket-2","bucket-1","bucket-0",
                                   "bucket-4","bucket-3"]}}}]}

 Comments   
Comment by Aleksey Kondratenko [ 10/May/12 ]
Found that shutting down bucket-2 indeed took minutes, even though (according to the logs) we passed the force flag:


[user:info] [2012-05-10 8:45:32] [ns_1@10.1.3.77:'ns_memcached-bucket-0':ns_memcached:terminate:296] Shutting down bucket "bucket-0" on 'ns_1@10.1.3.77' for deletion
... (no other messages from this erlang process in between) ...
[ns_server:debug] [2012-05-10 8:49:38] [ns_1@10.1.3.77:'ns_memcached-bucket-0':ns_storage_conf:delete_bucket_db_directory:384] Result of deleting db directory: "/opt/couchbase/var/lib/couchbase/data/bucket-0-data": ok

That's what the rebalance failure actually mentioned.

I've seen a memcached crash on this .77 node shortly before we started bucket deletion. So my guess is that warmup prevents the force shutdown from being quick. I.e.:


[error_logger:info] [2012-05-10 8:45:04] [ns_1@10.1.3.77:error_logger:ale_error_logger_handler:log_report:72]
=========================PROGRESS REPORT=========================
          supervisor: {local,ns_port_sup}
             started: [{pid,<0.29206.11>},
                       {name,
                           {memcached,"/opt/couchbase/bin/memcached",
                               ["-X",
                                "/opt/couchbase/lib/memcached/stdin_term_handler.so",
                                "-l","0.0.0.0:11210,0.0.0.0:11209:1000","-E",
                                "/opt/couchbase/lib/memcached/bucket_engine.so",
                                "-B","binary","-r","-c","10000","-e",
                                "admin=_admin;default_bucket_name=default;auto_create=false",
                                []],
                               [{env,
                                    [{"EVENT_NOSELECT","1"},
                                     {"MEMCACHED_TOP_KEYS","100"},
                                     {"ISASL_PWFILE",
                                      "/opt/couchbase/var/lib/couchbase/data/isasl.pw"},
                                     {"ISASL_DB_CHECK_TIME","1"}]},
                                use_stdio,stderr_to_stdout,exit_status,
                                port_server_send_eol,stream]}},
                       {mfargs,
                           {supervisor_cushion,start_link,
                               [memcached,5000,ns_port_server,start_link,
                                [memcached,"/opt/couchbase/bin/memcached",
                                 ["-X",
                                  "/opt/couchbase/lib/memcached/stdin_term_handler.so",
                                  "-l","0.0.0.0:11210,0.0.0.0:11209:1000",
                                  "-E",
                                  "/opt/couchbase/lib/memcached/bucket_engine.so",
                                  "-B","binary","-r","-c","10000","-e",
                                  "admin=_admin;default_bucket_name=default;auto_create=false",
                                  []],
                                 [{env,
                                      [{"EVENT_NOSELECT","1"},
                                       {"MEMCACHED_TOP_KEYS","100"},
                                       {"ISASL_PWFILE",
                                        "/opt/couchbase/var/lib/couchbase/data/isasl.pw"},
                                       {"ISASL_DB_CHECK_TIME","1"}]},
                                  use_stdio,stderr_to_stdout,exit_status,
                                  port_server_send_eol,stream]]]}},
                       {restart_type,permanent},
                       {shutdown,10000},
                       {child_type,worker}]

and prior to that:


[error_logger:error] [2012-05-10 8:45:03] [ns_1@10.1.3.77:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
  crasher:
    initial call: ns_port_server:init/1
    pid: <0.15086.9>
    registered_name: []
    exception exit: {abnormal,134}
      in function gen_server:terminate/6
    ancestors: [<0.15085.9>,ns_port_sup,ns_server_sup,ns_server_cluster_sup,
                  <0.60.0>]
    messages: [{'EXIT',#Port<0.174933>,normal}]
    links: [<0.15085.9>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 121393
    stack_size: 24
    reductions: 1661845
  neighbours:


and even more prior to that:


[ns_server:info] [2012-05-10 8:44:34] [ns_1@10.1.3.77:<0.15086.9>:ns_port_server:log:161] memcached<0.15086.9>: [warn] Epoll ADD(5) on fd 247 failed. Old events were 0; read change was 1 (add); write change was 1 (add): Bad file descriptor
memcached<0.15086.9>: Failed to add connection to libevent: Bad file descriptor
memcached<0.15086.9>: memcached: bucket_engine.c:997: handle_disconnect: Assertion `es' failed.
Comment by Aleksey Kondratenko [ 10/May/12 ]
Given all that, and especially that this is most likely not a regression (i.e. warmup likely caused bucket deletion to be delayed on 1.8.0 as well), assigning to Frank so that we can consider whether this is really a blocker.
Comment by Dipti Borkar [ 15/May/12 ]
From Farshid: this doesn't seem like a regression, but it is a common use case that we should fix in 1.8.1. We need a fix for this.
Comment by Dipti Borkar [ 15/May/12 ]
Perhaps the behavior here needs to be changed? Don't allow "rebalance" to start if a node is warming up? Or is there something else going on under the covers?
Comment by Aleksey Kondratenko [ 15/May/12 ]
Something else.

We failed over the node.

This means we requested deletion of the bucket on this node. This deletion may take some time, and normally we don't care. But we don't expect it to take ages.

What happens next is that the node is added back while this deletion is still in progress, and we refuse because we're going to re-create this bucket and need to complete its deletion first.

This is something we have long wanted from ep-engine: we want bucket deletion to be reasonably quick. Always.

If you still think we need the logic "don't start rebalance unless you can be reasonably sure it will succeed", we can work on it. It doesn't look like a trivial task, though.
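
For illustration, a minimal Erlang sketch of the kind of wait ns_server performs before rebalance; this is not the actual ns_rebalancer:do_wait_buckets_shutdown code. The caller supplies a predicate that reports whether a bucket has finished shutting down on the node, and on timeout the loop returns the same old_buckets_shutdown_wait_failed shape seen in the log above.

wait_buckets_shutdown(Node, Buckets, IsShutdownDone, TimeoutMs) ->
    Deadline = erlang:monotonic_time(millisecond) + TimeoutMs,
    wait_loop(Node, Buckets, IsShutdownDone, Deadline).

wait_loop(_Node, [], _IsShutdownDone, _Deadline) ->
    ok;
wait_loop(Node, Buckets, IsShutdownDone, Deadline) ->
    %% keep only the buckets whose shutdown has not completed yet
    Remaining = [B || B <- Buckets, not IsShutdownDone(Node, B)],
    case Remaining of
        [] ->
            ok;
        _ ->
            case erlang:monotonic_time(millisecond) >= Deadline of
                true ->
                    %% same shape as the failure reported by the rebalance above
                    {error, {old_buckets_shutdown_wait_failed, Remaining}};
                false ->
                    timer:sleep(500),
                    wait_loop(Node, Remaining, IsShutdownDone, Deadline)
            end
    end.

In the failing run above, the predicate would have kept returning false for all five buckets on 'ns_1@10.1.3.77' until the timeout expired, which is exactly the failure reason the rebalance reported.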
Comment by Aleksey Kondratenko [ 15/May/12 ]
Correction: "don't allow the user to start rebalance unless you can be reasonably sure it will succeed". Let me note that the current behavior of performing all checks at the start of rebalance (after the user has requested it) is not very different. The difference is some potential confusion on the part of the user if rebalance fails immediately.
Comment by Dipti Borkar [ 15/May/12 ]
Is it easy, then, to make it more obvious why the rebalance failed?
Comment by Aleksey Kondratenko [ 15/May/12 ]
Isn't the following good enough?

Rebalance exited with reason {buckets_shutdown_wait_failed,
                              [{'ns_1@10.1.3.77',
                                {'EXIT',
                                 {old_buckets_shutdown_wait_failed,
                                  ["bucket-2","bucket-1","bucket-0",
                                   "bucket-4","bucket-3"]}}}]}
Comment by Dipti Borkar [ 15/May/12 ]
Thanks Aliaksey. I understand this issue better now. Let's see what can be done in ep_engine.
Comment by Farshid Ghods (Inactive) [ 15/May/12 ]
ns-server should send the bucket-deletion request to the memcached that is warming up, and ep-engine should abort the warmup process and continue with bucket deletion (see the sketch below).
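
ep-engine's warmup is C++ code, but the requested behavior is easy to state: the warmup loop should check for a pending shutdown/deletion request between batches and bail out instead of loading everything first. A toy sketch of that shape, written in Erlang only to stay consistent with the rest of this thread; all names here are made up, none of this is ep-engine code.

warmup(BucketName, Batches) ->
    %% spawn a warmup worker that can be told to abort between batches
    spawn(fun () -> warmup_loop(BucketName, Batches) end).

warmup_loop(BucketName, []) ->
    io:format("warmup of ~s complete~n", [BucketName]);
warmup_loop(BucketName, [Batch | Rest]) ->
    receive
        {abort_warmup, From} ->
            %% a deletion request arrived: stop loading so the bucket can
            %% shut down immediately instead of after full warmup
            From ! {warmup_aborted, BucketName}
    after 0 ->
        load_batch(Batch),
        warmup_loop(BucketName, Rest)
    end.

load_batch(_Batch) ->
    %% stand-in for reading a chunk of items from disk into memory
    timer:sleep(100).

A deletion path would then send Pid ! {abort_warmup, self()} and proceed as soon as {warmup_aborted, _} arrives, instead of waiting for the full load.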
Comment by Dipti Borkar [ 17/May/12 ]
Chiyoung, can you estimate the level of change needed to support this? How risky are these changes?
Comment by Aleksey Kondratenko [ 17/May/12 ]
BTW, do we have reliable evidence that bucket deletion was delayed because of warmup in this particular case? I.e. I know warmup was _likely_ in progress, but that's all I know about this bucket deletion.
Comment by Farshid Ghods (Inactive) [ 17/May/12 ]
Yes, we have confirmed that the warmup process was going on and that bucket deletion caused the whole thing to hang.

If it helps, we can get a gdb backtrace when this is happening, but given that this was seen many times by QE and customers, I think we can safely assume that's the case.
Comment by Chiyoung Seo [ 25/May/12 ]
Can ns_server move the database files or delete them directly before the failed-over node is added back?
Comment by Aleksey Kondratenko [ 25/May/12 ]
We do delete the files, but only after bucket shutdown is complete.
Comment by Aleksey Kondratenko [ 25/May/12 ]
We have a patch for 1.8 that will make memcached exit immediately, without shutting down buckets, when failover happens. That could be an easy treatment for this problem.
Comment by Chiyoung Seo [ 25/May/12 ]
I'm a little confused now. Why is the failed-over node warming up again?

Anyway, if ns-server already has a patch for this issue, can I close this bug?
Comment by Aleksey Kondratenko [ 25/May/12 ]
memcached crashed before the node was failed over. Then ns_server restarted memcached and re-created the buckets, and they started warming up.

Then the node was failed over. ns_server deletes all buckets (with the force flag set to true), but that takes a long time, seemingly because of warmup. And that prevents rebalancing back in, because before starting rebalance we now wait for pending bucket deletions to complete.

The patch I mentioned is against memcached. The idea is that when we detect failover, we ask memcached to just exit without any cleanup.

I don't know why, but Farshid and Dipti wanted ep-engine to be fixed. Given that warmup is very different between 2.0 and 1.8.1, I question whether we really need that, btw.
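
The eventual fix (see the Gerrit links and commit messages later in this thread) works at the ns_server/memcached boundary: on failover, write a shutdown command to memcached's stdin (which stdin_term_handler.so watches) so the process exits right away instead of finishing a graceful per-bucket shutdown. A rough Erlang sketch of that idea; the command string, the 5-second timeout and the surrounding bookkeeping are assumptions, not the actual ns_port_server code.

kill_memcached_quickly(Port) when is_port(Port) ->
    %% send a shutdown line to memcached's stdin so it exits without
    %% per-bucket cleanup (the exact command string is an assumption)
    port_command(Port, <<"shutdown\n">>),
    receive
        {Port, {exit_status, Status}} ->
            %% requires the port to have been opened with exit_status,
            %% which ns_port_sup does (see the PROGRESS REPORT above)
            {ok, Status}
    after 5000 ->
        %% if memcached still hasn't exited, drop the port and move on
        port_close(Port),
        {error, timeout}
    end.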
Comment by Farshid Ghods (Inactive) [ 25/May/12 ]
>>The idea is when we detect failover we ask memcached to just exit without any cleanup.

Is this like kill -9?
Comment by Aleksey Kondratenko [ 25/May/12 ]
Yes, but it works on Windows too.
Comment by Farshid Ghods (Inactive) [ 25/May/12 ]
OK, so if this is a force kill, then we don't have to fix this in ep-engine.

For this specific issue, all we care about is the end-user experience and whether the user is able to add the node back to the cluster without any issues. Your patch takes care of that, so we can close this bug.
Comment by Aleksey Kondratenko [ 25/May/12 ]
Patches backported and uploaded to Gerrit. I can rubber-stamp the ns_server patches, but not the memcached ones.
Comment by Aleksey Kondratenko [ 25/May/12 ]
http://review.couchbase.org/16432
http://review.couchbase.org/16436
http://review.couchbase.org/16437
http://review.couchbase.org/16438
Comment by Aleksey Kondratenko [ 29/May/12 ]
Should be fixed now, with the memcached and ns_server side changes merged.
Comment by Thuan Nguyen [ 29/May/12 ]
Integrated in github-ns-server-2-0 #357 (See [http://qa.hq.northscale.net/job/github-ns-server-2-0/357/])
    MB-5279:bp: allow port_servers to send arbitrary data to port (Revision ab2b8ecf77658fda7266a9eaed1b474f29741e43)
MB-5279:bp:made supervisor_cushion exit with same reason as child (Revision 5ba3d6e1634c613ff3b6c1678f3eb04449434834)
MB-5279:bp: ask memcached to die real quick on failover (Revision 2f14d3a4107b6c7fccc4c479832048bb259f469c)
MB-5279: send shutdown command instead of empty line to memcached (Revision 6118863758b88da374868d2adf7a890096067394)

     Result = SUCCESS
Aliaksey Kandratsenka :
Files :
* src/ns_port_server.erl

Aliaksey Kandratsenka :
Files :
* src/supervisor_cushion.erl

Aliaksey Kandratsenka :
Files :
* src/ns_port_sup.erl
* src/ns_server_sup.erl

Aliaksey Kandratsenka :
Files :
* src/ns_port_server.erl