A cluster node stops responding after >1h of workload

uthng · August 14, 2017, 1:31pm

Hi all,

I have a cluster of 3 nodes running Community Edition 4.5.1, and I use cbworkloadgen to test it. I observe that after about 1 hour of running, the node CB1 (the one specified in the cbworkloadgen command) stops responding and cbworkloadgen exits. The other nodes of the cluster (CB2 & CB3) are still accessible. After several minutes, CB1 is accessible again. In the web console of CB2, CB1 still shows green during its “down” period… It looks as if CB1 is flooded or busy doing something else and cannot respond anymore after 1h of intensive load…

Can anyone explain this, please? Is there any tuning I should do?

Best regards,

NT

solutionguy · August 14, 2017, 5:33pm

Hi,

Can you share what the vitals of your servers look like during the load test? Also, what are the specs of the servers, etc.? We’ll need more information to be of any help.

Regards,
Abbas
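
For reference, one way to capture when a node stops answering during the load test is a small probe loop like the sketch below. It is only an illustration: the address, the credentials and the RUN_MONITOR switch are placeholders, and it assumes curl is installed and that the node's REST API answers on /pools/default. Running something like vmstat or top on the node at the same time would give the CPU, memory and disk picture for the same timestamps.

```shell
#!/bin/sh
# Hypothetical defaults: replace with your node address and admin credentials.
NODE="${NODE:-serverip:8091}"
CB_USER="${CB_USER:-Administrator}"
CB_PASS="${CB_PASS:-password}"

# Print the HTTP status code returned by the node's REST API.
# curl prints 000 when it gets no reply within the 5-second limit.
probe_node() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
    -u "$CB_USER:$CB_PASS" "http://$1/pools/default" || true
}

# Map an HTTP status code to UP (any 2xx) or DOWN (anything else,
# including an empty value or 000 for "no reply").
classify() {
  case "$1" in
    2??) echo "UP" ;;
    *)   echo "DOWN" ;;
  esac
}

# Log one line every 10 seconds; enable with RUN_MONITOR=1.
if [ "${RUN_MONITOR:-0}" = "1" ]; then
  while true; do
    code=$(probe_node "$NODE")
    echo "$(date '+%Y-%m-%d %H:%M:%S') $NODE http=$code $(classify "$code")"
    sleep 10
  done
fi
```

Started with RUN_MONITOR=1 from a separate machine, the log shows the exact window in which the node was unreachable, which can then be lined up with the server-side metrics.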

uthng · August 14, 2017, 7:31pm

Hi,

What do you mean by the vitals of my servers during the load test? I attach some screenshots of the bucket. All 3 servers have 4 CPUs, 32GB RAM and a 500GB data disk. I also ran the same test on another (standalone) server with 16 CPUs, 128GB RAM and a 500GB data disk, and the result is the same. After running for a while, the server stops responding for several minutes…

uthng · August 14, 2017, 7:41pm

For the test, I use the following cbworkloadgen command:

cbworkloadgen -n serverip:8091 -r .1 --max-items=500000 -s 100 -t 4 -l

Thanks for your help.

NT

uthng · August 14, 2017, 9:55pm

By the way, I have enabled auto failover with a 60s timeout, but the outage is not detected and auto failover is not triggered!
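
To double-check what the cluster thinks the auto failover settings are, something like the sketch below could be run against any node. It is a rough illustration only: the address, credentials and RUN_CHECK switch are placeholders, and it assumes that GET /settings/autoFailover returns a small flat JSON object with enabled, timeout and count fields. The sed-based parsing is a shortcut for that flat shape; jq would be more robust if it is available. Note also that if CB1 stays green in the console during the outage, the cluster manager may simply not consider it down, in which case auto failover would not fire whatever the timeout is.

```shell
#!/bin/sh
# Hypothetical defaults: replace with your node address and admin credentials.
NODE="${NODE:-serverip:8091}"
CB_USER="${CB_USER:-Administrator}"
CB_PASS="${CB_PASS:-password}"

# Fetch the current auto failover settings as JSON.
get_autofailover() {
  curl -s --max-time 5 -u "$CB_USER:$CB_PASS" \
    "http://$1/settings/autoFailover"
}

# Extract one scalar field from a flat JSON object.
# Usage: json_field <name> <json>
json_field() {
  printf '%s' "$2" | sed -n "s/.*\"$1\":\([^,}]*\).*/\1/p"
}

# Print the three fields; enable with RUN_CHECK=1.
if [ "${RUN_CHECK:-0}" = "1" ]; then
  settings=$(get_autofailover "$NODE")
  echo "enabled=$(json_field enabled "$settings")"
  echo "timeout=$(json_field timeout "$settings")"
  echo "count=$(json_field count "$settings")"
fi
```

If count is not 0, it may be worth checking whether an earlier automatic failover has already used up the quota, since that can prevent a new one until the counter is reset.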

drigby · August 15, 2017, 9:43am

I’d recommend using pillowfight from the C SDK tools if you want to do any serious performance testing.

It’s capable of much higher throughput, and has many more options for configuring your workload - number of threads, size / distribution of accesses, etc.
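
As a starting point, a cbc-pillowfight invocation that roughly mirrors the cbworkloadgen command earlier in the thread (4 threads, 500000 items, 100-byte values, about 10% sets) might look like the sketch below. The host name, the default bucket and the DRY_RUN switch are assumptions for illustration, and the flags are worth confirming against cbc-pillowfight --help for the installed libcouchbase version.

```shell
#!/bin/sh
# Hypothetical defaults: replace with your node and bucket.
CB_HOST="${CB_HOST:-serverip}"
CB_BUCKET="${CB_BUCKET:-default}"

# Build the cbc-pillowfight command line.
#   -t 4           four worker threads
#   -I 500000      number of distinct items
#   -m 100 -M 100  minimum and maximum value size in bytes
#   -r 10          percentage of operations that are sets
# With DRY_RUN=1 (the default) the command is only printed;
# set DRY_RUN=0 to execute it.
run_pillowfight() {
  set -- cbc-pillowfight -U "couchbase://$CB_HOST/$CB_BUCKET" \
    -t 4 -I 500000 -m 100 -M 100 -r 10
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

run_pillowfight
```

From there the thread count and value sizes can be raised step by step to see at what load the node starts to stall.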
