Data nodes on the cluster need frequent rebalancing

We have set up a cluster with the following configuration:

  • Data Nodes: 3, with a bucket replica configuration of 2.
  • Other Nodes: 2 nodes that manage query, index, and eventing.

Although we have dedicated query nodes, we rely primarily on kvops (Document and SubDocument operations) for data access, which the Couchbase documentation describes as the fastest approach. However, we’ve observed that memory utilization on a data node often spikes to 85%, triggering a graceful failover. To restore the cluster to its normal state, we typically need to perform a rebalance.
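For context, this is roughly the kind of KV and sub-document access we mean. A minimal sketch with the Python SDK (connection string, credentials, bucket name, and document shape are placeholders, not our real setup):

```python
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions
import couchbase.subdocument as SD

# Connect to the cluster (hostname, credentials, and bucket are placeholders).
cluster = Cluster(
    "couchbase://cb-node-1",
    ClusterOptions(PasswordAuthenticator("app_user", "app_password")),
)
cluster.wait_until_ready(timedelta(seconds=10))
collection = cluster.bucket("app-bucket").default_collection()

# Full-document KV operations: write and read an entire document.
collection.upsert("user::1001", {"name": "Alice", "visits": 1})
doc = collection.get("user::1001").content_as[dict]

# Sub-document operations: touch only one field instead of the whole document.
collection.mutate_in("user::1001", [SD.upsert("visits", 2)])
name = collection.lookup_in("user::1001", [SD.get("name")]).content_as[str](0)
```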

I have a couple of questions:

  1. Is there a configuration setting that can automatically manage and clear the memory to maintain a healthy data node?
  2. Can the cluster remain operational during rebalancing, even though it sometimes results in failures?

It’s probably best to open an issue with support to identify what’s going on there and get it resolved.

  1. Is there a configuration setting that can automatically manage and clear the memory to maintain a healthy data node?

I don’t know of any. That would likely depend on what is causing the memory spike.
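One thing that may help narrow down the cause is watching per-bucket memory usage against its quota via the cluster REST API on port 8091. A rough sketch (host, credentials, and the 85% threshold are placeholders):

```python
import requests

# Fetch basic per-bucket stats from the cluster manager REST API.
resp = requests.get(
    "http://cb-node-1:8091/pools/default/buckets",
    auth=("admin_user", "admin_password"),
    timeout=10,
)
resp.raise_for_status()

for bucket in resp.json():
    pct = bucket["basicStats"]["quotaPercentUsed"]
    print(f'{bucket["name"]}: {pct:.1f}% of memory quota used')
    if pct > 85:
        # Matches the level at which you reported seeing graceful failovers.
        print(f'  warning: {bucket["name"]} is near the threshold you described')
```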

  2. Can the cluster remain operational during rebalancing, even though it sometimes results in failures?

It should remain operational. The SDK may need to retry operations, for instance if the active copy of a document is on a different node than expected, or when a node is removed from the cluster. But these errors should be transient, and the SDK should be able to recover on its own by retrying.
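As a rough illustration of what that retry can look like on the application side with the Python SDK (the helper name, attempt count, and backoff are placeholders I chose, not an SDK feature):

```python
import time

from couchbase.exceptions import (
    AmbiguousTimeoutException,
    UnambiguousTimeoutException,
)

def get_with_retry(collection, key, attempts=5, backoff=0.5):
    """Retry a KV read across transient errors, e.g. while a rebalance moves vbuckets."""
    for attempt in range(attempts):
        try:
            return collection.get(key)
        except (AmbiguousTimeoutException, UnambiguousTimeoutException):
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(backoff * (2 ** attempt))  # simple exponential backoff
```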

We’ve observed that after removing a node, all applications continue to run. However, after the rebalance completes, the application nodes start throwing unambiguous timeout errors, and we have to restart the application service for it to be fully restored. Is there a parameter we need to set in the client SDK (we are using the Python SDK 4.x) that will automatically recover the client connection after a rebalance?

I found an open issue, PYCBC-1523, that sounds like it could be related to what you describe. If you need more assistance, please open a support case.

It sounds like this is the issue you should address first.
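Until that issue is resolved, one possible application-side workaround (not an SDK setting, and assuming your app can tolerate a brief reconnect) is to rebuild the Cluster object when timeouts persist after a rebalance, rather than restarting the whole service. A hedged sketch with placeholder connection details:

```python
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.exceptions import UnambiguousTimeoutException
from couchbase.options import ClusterOptions

def connect():
    # Connection string and credentials are placeholders.
    cluster = Cluster(
        "couchbase://cb-node-1",
        ClusterOptions(PasswordAuthenticator("app_user", "app_password")),
    )
    cluster.wait_until_ready(timedelta(seconds=10))
    return cluster

cluster = connect()
collection = cluster.bucket("app-bucket").default_collection()

def get_with_reconnect(key):
    global cluster, collection
    try:
        return collection.get(key)
    except UnambiguousTimeoutException:
        # If timeouts persist after a rebalance, rebuild the connection
        # instead of restarting the whole application service.
        cluster.close()
        cluster = connect()
        collection = cluster.bucket("app-bucket").default_collection()
        return collection.get(key)
```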
