How the Cluster Map Works Internally in Case of Node Failure

Debasis_Mallick · February 27, 2024, 1:14pm

Hi Team,

Before CB 7.1, if an Index+Query node fails due to an infrastructure issue, it is not failed over automatically.
How, then, does the cluster map get updated for this failed node?
Basically, I want to understand how the cluster map gets updated when a node fails, and what information the cluster map stores about the nodes.

Thanks,
Debasis

mreiche · February 27, 2024, 9:04pm

shivansh_rustagi · February 28, 2024, 8:59am

Hi,
could you clarify what you mean by “failed due to any infra issue, then it will not automatic failover”?

In general for any failed node in the cluster:
If you are gathering the cluster information from the /pools/default endpoint (ref: Viewing Cluster Details | Couchbase Docs), any failed over node will have its membership updated to “inactiveFailed”.

Example response:

....
"nodes": [
        {
            ...
            "clusterMembership": "inactiveFailed",
....
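
To make the check above concrete, here is a minimal Python sketch that picks the failed-over nodes out of a parsed /pools/default response. The sample document is hypothetical and heavily trimmed; the only field taken from the response above is "clusterMembership", and "hostname" is assumed to be present on each node entry.

```python
import json


def failed_over_nodes(pools_default: dict) -> list[str]:
    """Return hostnames of nodes whose clusterMembership is 'inactiveFailed'."""
    return [
        node.get("hostname", "<unknown>")
        for node in pools_default.get("nodes", [])
        if node.get("clusterMembership") == "inactiveFailed"
    ]


# Hypothetical, trimmed-down response body, for illustration only.
sample = json.loads("""
{"nodes": [
  {"hostname": "10.0.0.1:8091", "clusterMembership": "active"},
  {"hostname": "10.0.0.2:8091", "clusterMembership": "inactiveFailed"}
]}
""")

print(failed_over_nodes(sample))
```

Note that this only reports nodes that have actually been failed over. A node that is down but has not been failed over would not show up in this list.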

Debasis_Mallick · February 29, 2024, 6:24am

Before CB 7.1, automatic failover of Index nodes is not supported.
So on CB 6.6, if my index node fails due to some infrastructure issue, how does the cluster map get updated in that situation?

Thanks,
Debasis

mreiche · February 29, 2024, 7:52pm

The Cluster Management documentation for 6.x is here: Manage Nodes and Clusters | Couchbase Docs Archive

Debasis_Mallick · March 1, 2024, 12:43pm

Thanks @mreiche. I have gone through the documentation, and what I understood is that ns_server is the master process in the cluster. When I cause a node (Index+Query) to fail, I do not see any message in ns_couchdb.log about the failure of that node. Is this the correct log to check when a node fails, or should I look at other logs?
I just want to see how the cluster manager becomes aware of a node failure and how the cluster map is adjusted accordingly.

Thanks,
Debasis

mreiche · March 1, 2024, 5:15pm

Check the cluster map to see when the cluster map is changed.

Debasis_Mallick · March 6, 2024, 10:38am

Could you please let me know which log or command I can use to check the cluster map details?
I tried searching on Google, but it did not help. I also checked ns_couchdb.log, but found no clue there either.

Thanks,
Debasis

mreiche · March 6, 2024, 6:56pm

cluster map is available at http://hostname:8091/pools/default
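
One way to see when the cluster map changes is to fetch that URL twice and compare the per-node fields. The sketch below is only an illustration: it assumes the endpoint requires HTTP basic auth with administrator credentials, and that each node entry carries "hostname", "status" and "clusterMembership". The function names and sample snapshots are hypothetical.

```python
import base64
import json
import urllib.request


def fetch_cluster_map(host: str, user: str, password: str) -> dict:
    """GET http://<host>:8091/pools/default using HTTP basic auth."""
    req = urllib.request.Request(f"http://{host}:8091/pools/default")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def node_states(cluster_map: dict) -> dict:
    """Map hostname -> (status, clusterMembership) for every listed node."""
    return {
        n.get("hostname"): (n.get("status"), n.get("clusterMembership"))
        for n in cluster_map.get("nodes", [])
    }


def diff_states(before: dict, after: dict) -> dict:
    """Return hostname -> (old, new) for every node whose state changed."""
    changes = {}
    for host in before.keys() | after.keys():
        old, new = before.get(host), after.get(host)
        if old != new:
            changes[host] = (old, new)
    return changes


# Hypothetical snapshots taken before and after a node goes down.
before = node_states({"nodes": [
    {"hostname": "10.0.0.2:8091", "status": "healthy", "clusterMembership": "active"},
]})
after = node_states({"nodes": [
    {"hostname": "10.0.0.2:8091", "status": "unhealthy", "clusterMembership": "active"},
]})
print(diff_states(before, after))
```

Against a real cluster you would build the snapshots with node_states(fetch_cluster_map(host, user, password)) at two points in time instead of using the in-memory samples.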

It’s possible that cluster map changes are logged, but I don’t know which file they would be in. Maybe couchdb.log or memcached.log. The way I find things in the log files is

grep -i -l "guess-at-string" *

And then look in the files that have hits. For cluster map - something like

grep -i -l "clustermap" *

That didn’t turn up anything, so keep trying.
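
If you would rather script the search, the same "list the files that contain this string, ignoring case" idea can be sketched in Python. The directory path in the usage comment is the usual Linux log location and is an assumption, as is the search string; adjust both for your install.

```python
from pathlib import Path


def files_containing(log_dir: str, needle: str) -> list[str]:
    """Return names of regular files in log_dir that contain needle, ignoring case."""
    needle = needle.lower()
    hits = []
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log has non-text bytes.
        with path.open("r", errors="replace") as fh:
            if any(needle in line.lower() for line in fh):
                hits.append(path.name)
    return hits


# Example usage (path and search string are assumptions):
# print(files_containing("/opt/couchbase/var/lib/couchbase/logs", "clustermap"))
```

Like grep -l, this only tells you which files to open; it does not say which log records cluster map changes.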

btw - The SDKs get configurations from the cluster and use it accordingly - so you don’t have to.

Debasis_Mallick · March 11, 2024, 6:15am

Hi @mreiche, I tried searching all the log files, but no luck. When I query the URL above with curl, the status shows “unhealthy” for the failed Index+Query node. However, if cbq was already connected to the IP of the failed node before the failure, we can still run queries without any issue.
So I am trying to understand why the cluster map did not trigger an error even though one Index+Query node failed.

Thanks,
Debasis

pccb · March 12, 2024, 7:52am

@shivansh_rustagi , “failed due to any infra issue” means the node (VM) running the index service goes down.

As we’re aware, index service autofailover isn’t supported in version 6.6. In the event of a failure, the failure itself is the first change, and failover (if supported) would be the second change. Setting failover aside for now, we’re investigating whether failures are communicated to clients through an updated cluster map. If so, subsequent N1QL queries should not fail, because the updated cluster map would ensure that requests aren’t routed to the failed node.
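
To illustrate the two separate changes, here is a rough Python sketch that classifies a node entry from /pools/default as failed, failed over, or fine. It relies only on the "status" and "clusterMembership" values seen earlier in this thread (“unhealthy” and “inactiveFailed”); the labels and function name are hypothetical, and this says nothing about how the SDKs actually route requests.

```python
def classify(node: dict) -> str:
    """Label a /pools/default node entry by failure vs. failover state."""
    membership = node.get("clusterMembership")
    if membership == "inactiveFailed":
        return "failed over"            # second change: failover has happened
    if membership != "active":
        return "not an active member"
    if node.get("status") != "healthy":
        return "unhealthy, not failed over"  # first change only: node is down
    return "ok"


# Hypothetical node entries, for illustration only.
print(classify({"status": "unhealthy", "clusterMembership": "active"}))
print(classify({"status": "unhealthy", "clusterMembership": "inactiveFailed"}))
```

On 6.6, where the index service is not failed over automatically, a downed index node would be expected to stay in the first state until someone fails it over manually.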

Thanks

mreiche · March 12, 2024, 3:02pm

Be aware that index and query are separate services.

system · June 10, 2024, 3:02pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
