How the Cluster Map Works Internally in Case of a Node Failure

Hi Team,

Before CB 7.1, if an Index+Query node fails due to an infrastructure issue, it will not be failed over automatically.
How, then, does the cluster map get updated about this failed node?
Basically, I just want to understand how the cluster map gets updated when a node fails, and what information the cluster map stores about the nodes.


Could you clarify a bit more what you mean by “failed due to any infra issue, then it will not automatic failover”?

In general, for a failed node in the cluster: if you are gathering cluster information from the /pools/default endpoint (ref: Viewing Cluster Details | Couchbase Docs), any node that has been failed over will have its membership updated to “inactiveFailed”.

Example response (excerpt):

"nodes": [
    {
        "clusterMembership": "inactiveFailed",
        ...
    }
]
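A minimal sketch of reading that field in Python, assuming the /pools/default response body has already been fetched as a JSON string and that each entry in "nodes" carries "hostname" and "clusterMembership" keys. The sample payload and hostnames are hypothetical, trimmed for illustration, and not captured from a real cluster.

```python
import json
from collections import defaultdict

# Hypothetical, trimmed-down /pools/default response used only for illustration.
SAMPLE_RESPONSE = """
{
  "nodes": [
    {"hostname": "10.0.0.1:8091", "clusterMembership": "active"},
    {"hostname": "10.0.0.2:8091", "clusterMembership": "inactiveFailed"}
  ]
}
"""


def nodes_by_membership(body: str) -> dict[str, list[str]]:
    """Group node hostnames by their clusterMembership value."""
    groups: dict[str, list[str]] = defaultdict(list)
    for node in json.loads(body).get("nodes", []):
        membership = node.get("clusterMembership", "unknown")
        groups[membership].append(node.get("hostname", "?"))
    return dict(groups)


if __name__ == "__main__":
    for membership, hosts in nodes_by_membership(SAMPLE_RESPONSE).items():
        print(membership, hosts)
```

Per the reply above, a node shows up under "inactiveFailed" once it has been failed over.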

Before CB 7.1, automatic failover is not supported for Index nodes.
So in CB 6.6, if my Index node fails due to some infrastructure issue, how does the cluster map get updated in that situation?


The Cluster Management documentation for 6.x is here: Manage Nodes and Clusters | Couchbase Docs Archive

Thanks @mreiche. I have gone through the documentation, and what I understood is that ns_server is the master process in the cluster. When I simulate a node (Index+Query) failure, I do not see any message in ns_couchdb.log about the failure of that node. Is this the correct log to check when a node fails, or should I look at other logs?
I just want to see how, when a node fails, the cluster manager becomes aware of it and the cluster map gets adjusted accordingly.


Check the cluster map to see when it changes.
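One way to do this, sketched here as an assumption rather than a documented procedure, is to poll /pools/default periodically and compare each node's clusterMembership and status between two snapshots. The code below does only the comparison step; the snapshots are dicts parsed from the JSON response, and the hostnames are hypothetical.

```python
def node_state(snapshot: dict) -> dict[str, tuple[str, str]]:
    """Map each hostname to its (clusterMembership, status) pair."""
    return {
        n["hostname"]: (n.get("clusterMembership", "?"), n.get("status", "?"))
        for n in snapshot.get("nodes", [])
    }


def diff_cluster_maps(before: dict, after: dict) -> list[str]:
    """Describe per-node changes between two /pools/default snapshots."""
    old, new = node_state(before), node_state(after)
    changes = []
    for host in sorted(old.keys() | new.keys()):
        if host not in new:
            changes.append(f"{host}: removed")
        elif host not in old:
            changes.append(f"{host}: added {new[host]}")
        elif old[host] != new[host]:
            changes.append(f"{host}: {old[host]} -> {new[host]}")
    return changes


if __name__ == "__main__":
    # Hypothetical snapshots taken before and after a node goes down.
    before = {"nodes": [{"hostname": "10.0.0.2:8091", "clusterMembership": "active", "status": "healthy"}]}
    after = {"nodes": [{"hostname": "10.0.0.2:8091", "clusterMembership": "active", "status": "unhealthy"}]}
    print(diff_cluster_maps(before, after))
```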

Could you please let me know which log or command I can use to check the cluster map details?
I tried searching on Google but found nothing helpful. I checked ns_couchdb.log, but there was no clue there either.


The cluster map is available at http://hostname:8091/pools/default
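For scripting, a minimal Python sketch of fetching that endpoint with the standard library is below. It assumes the REST API requires HTTP basic auth with an administrator account; the host name, username, and password you pass in are placeholders, and port 8091 follows the URL above.

```python
import base64
import json
import urllib.request


def build_request(host: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request for /pools/default using HTTP basic auth."""
    url = f"http://{host}:8091/pools/default"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_cluster_map(host: str, username: str, password: str, timeout: float = 10.0) -> dict:
    """Fetch and parse /pools/default; raises urllib.error.URLError on failure."""
    request = build_request(host, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Example call with hypothetical credentials: fetch_cluster_map("hostname", "Administrator", "password")["nodes"] gives the per-node list discussed earlier in this thread.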

It’s possible that cluster map changes are logged, but I don’t know which file they would be in; maybe couchdb.log or memcached.log. The way I find things in the log files is:

grep -i -l "guess-at-string" *

Then look in the files that have hits. For the cluster map, something like:

grep -i -l "clustermap" *

That didn’t turn up anything, so keep trying.
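If grep isn't available (for example on a Windows host), the same grep -i -l idea can be sketched in Python: scan every file in a log directory and list the names of those containing a case-insensitive match. Which directory to pass is up to you; on Linux installs the logs are typically under /opt/couchbase/var/lib/couchbase/logs, but check your own installation.

```python
from pathlib import Path


def files_containing(directory: str, needle: str) -> list[str]:
    """Return names of files in `directory` whose text contains `needle`,
    ignoring case (similar in spirit to `grep -i -l needle *`)."""
    needle_lower = needle.lower()
    matches = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # like `grep ... *`, only look at regular files
        # errors="ignore" skips undecodable bytes, since logs may not be valid UTF-8.
        with path.open("r", encoding="utf-8", errors="ignore") as handle:
            if any(needle_lower in line.lower() for line in handle):
                matches.append(path.name)
    return matches
```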

By the way, the SDKs fetch the configuration from the cluster and use it accordingly, so you don’t have to.

Hi @mreiche, I tried searching all the log files but had no luck. When I call the above endpoint with curl, the status shows “unhealthy” for the failed Index+Query node. However, if cbq was already connected to the IP of the failed node before the failure, we can still run queries without any issue.
So I am trying to understand why the cluster map did not trigger an error even though one Index+Query node failed.
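A sketch of spotting exactly that state from a /pools/default response body, assuming each node reports "status" and "clusterMembership": it lists nodes whose status is anything other than "healthy" while their membership is still "active", i.e. nodes that look down but have not been failed over. The sample data in the usage is hypothetical.

```python
import json


def down_but_not_failed_over(body: str) -> list[str]:
    """Hostnames whose status is not "healthy" (e.g. "unhealthy") while their
    clusterMembership is still "active", i.e. not yet failed over."""
    nodes = json.loads(body).get("nodes", [])
    return [
        n.get("hostname", "?")
        for n in nodes
        if n.get("status") != "healthy" and n.get("clusterMembership") == "active"
    ]


if __name__ == "__main__":
    # Hypothetical response: one Index+Query node is down but still an active member.
    sample = json.dumps({"nodes": [
        {"hostname": "10.0.0.1:8091", "status": "healthy", "clusterMembership": "active"},
        {"hostname": "10.0.0.2:8091", "status": "unhealthy", "clusterMembership": "active"},
    ]})
    print(down_but_not_failed_over(sample))
```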


@shivansh_rustagi, by “failed due to any infra issue” I mean that the node (VM) running the Index service goes down.

As we’re aware, Index service auto-failover isn’t supported in version 6.6. In the event of a failure, the failure itself is the initial change, and the failover (if supported) would be the subsequent change. Setting failover aside for now, we’re investigating whether failures are communicated to clients through an updated cluster map. If so, subsequent N1QL queries should not fail, because the updated cluster map would ensure that requests aren’t routed to the failed node.
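To make that reasoning concrete, here is a hypothetical client-side filter; it is not how any Couchbase SDK or cbq is actually implemented. Given a parsed /pools/default snapshot, it keeps only nodes that run the Query service (listed as "n1ql" in each node's "services" array), are active members, and report status "healthy". A client that filtered this way would stop picking a node as soon as it is reported unhealthy, even before failover; whether the real clients behave like this is exactly the open question here.

```python
def query_endpoints(cluster_map: dict) -> list[str]:
    """Hypothetical filter: hostnames that run the Query ("n1ql") service,
    are active cluster members, and report status "healthy"."""
    return [
        n["hostname"]
        for n in cluster_map.get("nodes", [])
        if "n1ql" in n.get("services", [])
        and n.get("clusterMembership") == "active"
        and n.get("status") == "healthy"
    ]
```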


Be aware that index and query are separate services.
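Because Index and Query are separate services, a combined Index+Query node simply lists both in its "services" array in /pools/default (the Query service appears as "n1ql"), while status and membership are reported once per node. The hypothetical helper below inverts the node list into a per-service view, which makes it easier to see which services lose capacity when one node goes down.

```python
from collections import defaultdict


def hosts_by_service(cluster_map: dict) -> dict[str, list[str]]:
    """Invert the node list into service -> sorted hostnames (e.g. "index", "n1ql", "kv")."""
    by_service: dict[str, list[str]] = defaultdict(list)
    for node in cluster_map.get("nodes", []):
        for service in node.get("services", []):
            by_service[service].append(node["hostname"])
    return {svc: sorted(hosts) for svc, hosts in by_service.items()}
```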