Unable to disable autofailover

jbarton · January 19, 2017, 10:34am

Hi,

After a recent outage caused by the failover of a single node, we’re looking to temporarily disable the auto-failover option.
However, whichever method I use (curl or couchbase-cli), on any of the nodes, I’m unable to disable the auto-failover setting, or even amend its timeout.

curl:

curl -v -u admin:password http://localhost:8091/settings/autoFailover -d 'enabled=false'

* About to connect() to localhost port 8091 (#0)
*   Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 8091 (#0)
* Server auth using Basic with user 'admin'
> POST /settings/autoFailover HTTP/1.1
> Authorization: Basic abc123
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.3.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: localhost:8091
> Accept: */*
> Content-Length: 13
> Content-Type: application/x-www-form-urlencoded
>
< HTTP/1.1 500 Internal Server Error
< Server: Couchbase Server
< Pragma: no-cache
< Date: Thu, 19 Jan 2017 10:26:20 GMT
< Content-Type: application/json
< Content-Length: 44
< Cache-Control: no-cache
<
* Connection #0 to host localhost left intact
* Closing connection #0
["Unexpected server error, request logged."]
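While the POST fails, a GET on the same endpoint can at least show what the cluster currently has stored. The sketch below assumes the response is a JSON object with an "enabled" key (the 3.x REST docs show enabled, timeout and count); the helper names and the admin:password credentials are placeholders, and whether the GET still succeeds while auto_failover is down is untested here.

```shell
#!/bin/sh
# Sketch: inspect the stored auto-failover settings before retrying the POST.
# Assumes a response shaped like {"enabled":true,"timeout":120,"count":0};
# the function names below are hypothetical helpers, not Couchbase tools.

# Read a /settings/autoFailover JSON response on stdin and print the
# value of its "enabled" key ("true" or "false").
af_enabled_from_json() {
  # Drop spaces so {"enabled": false} and {"enabled":false} both match.
  tr -d ' ' | sed -n 's/.*"enabled":\([a-z]*\).*/\1/p'
}

# GET the current settings; this request does not change anything.
get_af_settings() {
  # $1: host:port, $2: user, $3: password
  curl -s -u "$2:$3" "http://$1/settings/autoFailover"
}

# Usage against a live node (not run here):
#   get_af_settings localhost:8091 admin password | af_enabled_from_json
```

Only the JSON parsing can be checked offline; get_af_settings needs a reachable node.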

couchbase-cli:

ERROR: unable to set auto failover settings (500) Internal Server Error
[u'Unexpected server error, request logged.']

Both methods produce the following entry in the error log:

[ns_server:error,2017-01-19T10:15:22.675,ns_1@node1.cbcluster.com:<0.31120.483>:menelaus_web:loop:170]Server error during processing: ["web request failed",
    {path,"/settings/autoFailover"},
    {type,exit},
    {what,
     {noproc,
      {gen_server,call,
       [{global,auto_failover},
        disable_auto_failover]}}},
    {trace,
     [{gen_server,call,2,
       [{file,"gen_server.erl"},{line,180}]},
      {menelaus_web,
       handle_settings_auto_failover_post,1,
       [{file,"src/menelaus_web.erl"},
        {line,1870}]},
      {request_throttler,do_request,3,
       [{file,"src/request_throttler.erl"},
        {line,59}]},
      {menelaus_web,loop,2,
       [{file,"src/menelaus_web.erl"},
        {line,149}]},
      {mochiweb_http,headers,5,
       [{file,
         "/home/buildbot/buildbot_slave/centos-6-x64-301-builder/build/build/couchdb/src/mochiweb/mochiweb_http.erl"},
        {line,94}]},
      {proc_lib,init_p_do_apply,3,
       [{file,"proc_lib.erl"},{line,239}]}]}]

Any suggestions as to the cause of this issue (or, even better, a fix)? Happy to provide further logs if required.

Cluster
5 server nodes
5 buckets (single replica)
~100 ops/sec on 3 of these buckets
Version: 3.0.1 Community Edition (build-1444)

Plans are underway to upgrade to v4.5.

jbarton · January 26, 2017, 10:55am

Is anyone able to help with this? We’re still unable to disable auto-failover.

drigby · January 26, 2017, 11:15am

I’m not sure what’s going on; it seems the REST API isn’t very happy.

I assume you’ve tried the UI to change this?

jbarton · January 26, 2017, 11:27am

Hi drigby,
Thanks for the reply. Yes, I’ve tried the UI on each of the nodes, and it fails with the same error:

[ns_server:error,2017-01-26T11:26:08.483,ns_1@nod4.mydomain.com:<0.150.3009>:menelaus_web:loop:170]Server error during processing: ["web request failed",
    {path,"/settings/autoFailover"},
    {type,exit},
    {what,
     {noproc,
      {gen_server,call,
       [{global,auto_failover},
        disable_auto_failover]}}},
    {trace,
     [{gen_server,call,2,
       [{file,"gen_server.erl"},{line,180}]},
      {menelaus_web,
       handle_settings_auto_failover_post,1,
       [{file,"src/menelaus_web.erl"},
        {line,1870}]},
      {request_throttler,do_request,3,
       [{file,"src/request_throttler.erl"},
        {line,59}]},
      {menelaus_web,loop,2,
       [{file,"src/menelaus_web.erl"},
        {line,149}]},
      {mochiweb_http,headers,5,
       [{file,
         "/home/buildbot/buildbot_slave/centos-6-x64-301-builder/build/build/couchdb/src/mochiweb/mochiweb_http.erl"},
        {line,94}]},
      {proc_lib,init_p_do_apply,3,
       [{file,"proc_lib.erl"},{line,239}]}]}]

drigby · January 26, 2017, 12:01pm

Hmm. Is there anything in the other log files relating to auto_failover? It does sound like something in the cluster manager (aka ns_server) isn’t happy and cannot accept the setting change. Are all your nodes up?

jbarton · January 26, 2017, 12:20pm

The only references to “auto_failover” are in error.log, debug.log and info.log, and they all relate to the errors above (or to my attempts to reset the count).
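The same search across error.log, debug.log and info.log can be scripted. A minimal sketch that counts, per file, the lines containing {global,auto_failover}, the globally registered process the traces above could not reach. The function name is hypothetical, and /opt/couchbase/var/lib/couchbase/logs is assumed as the default log directory for a Linux package install.

```shell
#!/bin/sh
# Sketch: per-file count of lines mentioning the failing call target.
# count_af_calls is a hypothetical helper; the default directory is an
# assumption for a Linux package install and may differ on yours.

count_af_calls() {
  # $1: directory containing error.log, debug.log and info.log
  dir=${1:-/opt/couchbase/var/lib/couchbase/logs}
  for f in error.log debug.log info.log; do
    if [ -f "$dir/$f" ]; then
      # -F matches the braces literally; -c prints the number of matching lines.
      n=$(grep -cF '{global,auto_failover}' "$dir/$f")
      echo "$f: $n"
    else
      echo "$f: missing"
    fi
  done
}

# Usage (not run here):
#   count_af_calls /opt/couchbase/var/lib/couchbase/logs
```

A count that keeps growing in error.log with each attempt points at the same noproc failure rather than a separate problem.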

All the nodes are up (a rebalance is currently required, but only since today).

ianmccloy · February 27, 2017, 7:16pm

jbarton, it’s possible this is related to a bug in Erlang; see https://issues.couchbase.com/browse/MB-7282

The workaround is to run the following:
wget --user=Administrator --password=asdasd --post-data='rpc:call(mb_master:master_node(), erlang, apply, [fun () -> erlang:exit(erlang:whereis(mb_master), kill) end, []]).' http://localhost:8091/diag/eval
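The noproc in the traces means the gen_server:call found no process registered globally as auto_failover. As I understand it, the /diag/eval expression kills mb_master on the current master node, which forces a new master election, after which the master-only processes (I believe auto_failover is one of them in 3.x) should be started again. A rough sketch of the whole sequence, using curl instead of wget and then re-issuing the original enabled=false request, is below. The function names, the 10-second pause and the credentials are assumptions, not part of the workaround itself.

```shell
#!/bin/sh
# Sketch: try to disable auto-failover; on failure apply the mb_master
# workaround via /diag/eval and retry once. Function names, the pause
# length and the arguments are hypothetical.

# Erlang expression from the workaround: kill mb_master on the master node.
AF_WORKAROUND_EXPR='rpc:call(mb_master:master_node(), erlang, apply, [fun () -> erlang:exit(erlang:whereis(mb_master), kill) end, []]).'

# Succeed only for a 2xx HTTP status code.
http_ok() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# POST a body to a path and print only the HTTP status code.
post_status() {
  # $1: host:port, $2: user, $3: password, $4: path, $5: request body
  curl -s -o /dev/null -w '%{http_code}' -u "$2:$3" --data "$5" "http://$1$4"
}

disable_af_with_workaround() {
  # $1: host:port, $2: user, $3: password
  code=$(post_status "$1" "$2" "$3" /settings/autoFailover 'enabled=false')
  if http_ok "$code"; then
    echo "auto-failover disabled"; return 0
  fi
  echo "first attempt returned $code; applying mb_master workaround"
  post_status "$1" "$2" "$3" /diag/eval "$AF_WORKAROUND_EXPR" >/dev/null
  sleep 10   # arbitrary pause to give the new master time to start
  code=$(post_status "$1" "$2" "$3" /settings/autoFailover 'enabled=false')
  if http_ok "$code"; then
    echo "auto-failover disabled"; return 0
  fi
  echo "still failing with $code"; return 1
}

# Usage against a live cluster (not run here):
#   disable_af_with_workaround localhost:8091 Administrator password
```

Killing mb_master briefly interrupts the master-only services, so this is best tried during a quiet period.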

Related topics

Topic | Category | Replies | Views | Activity
Error when enabling / disabling auto failover | Couchbase Server | 1 | 1262 | February 27, 2017
AutoFailover Error | Couchbase Server | 0 | 2092 | June 10, 2014
Auto failover on 1 of 4 nodes in a cluster - weird behaviour | Couchbase Server | 4 | 1497 | May 25, 2017
Stop server didn't incur to failover | Couchbase Server | 0 | 1621 | March 27, 2014
Auto-Failover not working with 5.1.1? | Couchbase Server | 3 | 1262 | May 13, 2019