We are running Couchbase Community Edition. An indexer issue happened on one of our production nodes the other day, and we only noticed it when free disk space dropped to 0 many hours later. After increasing the disk space, we had to remove the node from the cluster, rebalance, bring the node back in, rebalance again, and recreate all the indexes on that node. However, since that time cbbackupmgr has been failing. The error is:
Error backing up cluster: internal server error executing 'GET' request to '/api/v1/bucket/halix2/backup':
{
"code": "error",
"error": "Fail to retrieve local metadata from url 172.31.21.193:9102.",
"result": {}
}
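For reference, the backup is invoked with something along these lines (archive path, repo name, and credentials below are placeholders, not the real values):
cbbackupmgr backup --archive /backups/archive --repo prod_repo --cluster couchbase://172.31.21.193 --username backup_user --password '*****'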
So it appears we haven’t fully recovered from what happened, but we are unsure what to try next. Any suggestions?
The indexer issue had also happened 5 days earlier, twice in a short time, but recovered on its own. Here is a sampling of that from the error log:
[ns_server:error,2023-09-06T19:24:31.082Z,ns_1@172.31.21.193:service_agent-index<0.12227.0>:service_agent:handle_info:285]Lost json rpc connection for service index, reason shutdown. Terminating.
[ns_server:error,2023-09-06T19:24:31.082Z,ns_1@172.31.21.193:service_agent-index<0.12227.0>:service_agent:terminate:306]Terminating abnormally
[ns_server:error,2023-09-06T22:22:50.997Z,ns_1@172.31.21.193:service_agent-index<0.9069.333>:service_agent:handle_info:285]Lost json rpc connection for service index, reason shutdown. Terminating.
[ns_server:error,2023-09-06T22:22:50.997Z,ns_1@172.31.21.193:service_agent-index<0.9069.333>:service_agent:terminate:306]Terminating abnormally
[ns_server:error,2023-09-11T22:03:51.512Z,ns_1@172.31.21.193:service_agent-index<0.15039.356>:service_agent:handle_info:285]Lost json rpc connection for service index, reason shutdown. Terminating.
[ns_server:error,2023-09-11T22:03:51.512Z,ns_1@172.31.21.193:service_agent-index<0.15039.356>:service_agent:terminate:306]Terminating abnormally
[ns_server:error,2023-09-11T22:03:51.512Z,ns_1@172.31.21.193:service_status_keeper_worker<0.1079.0>:rest_utils:get_json:57]Request to (indexer) getIndexStatus with headers [{"If-None-Match", "21058684869cb89c"}] failed: {error,{econnrefused, [{lhttpc_client,send_request,1,[{file, "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"}, {line, 220}]},{lhttpc_client,execute,9,[{file,"/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},{line, 169}]}, {lhttpc_client, request,9,[{file,"/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},{line,93}]}]}}
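Since both the backup error and the log above point at the index service admin port (9102) on that node, would probing that endpoint directly be a reasonable next step? Assuming the default port and admin credentials, something like:
curl -u Administrator:password http://172.31.21.193:9102/getIndexStatus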
The first sign of running out of disk space was:
[user:error,2023-09-12T15:26:08.155Z,ns_1@172.31.21.193:<0.764.1433>:compaction_daemon:ensure_can_db_compact:1028]Cannot compact database `halixpos/4`: the estimated necessary disk space is about 8137554 bytes but the currently available disk space is 0 bytes.
So the indexer started having issues and then ran out of disk space 17 hours later. The server had 60 GB of disk and was only using about 40% of it at the time. And after our recovery efforts to get it up and running again, disk usage dropped back to that level without us deleting anything.
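Is watching the index storage directory the right way to confirm whether the indexer was behind the disk growth? Assuming the default data path on Linux, something like:
du -sh /opt/couchbase/var/lib/couchbase/data/@2i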
We need to get the backups functioning again, so any debugging ideas would be greatly appreciated. There’s almost no mention of “Fail to retrieve local metadata” on the web in relation to cbbackupmgr.
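We can also run cbcollect_info on the affected node and share indexer.log (or any other specific log file) if that would help, e.g.:
/opt/couchbase/bin/cbcollect_info /tmp/collect-172.31.21.193.zip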
Thank you,
~Brandon Williams