A cluster node stops responding after more than 1 hour of workload

Hi all,

I have a cluster of 3 nodes running Community 4.5.1, and I use cbworkloadgen to test it. I observe that after about 1 hour of running, the node CB1 (the one specified in the cbworkloadgen command) stops responding and cbworkloadgen exits. The other nodes of the cluster (CB2 & CB3) remain accessible. After several minutes, CB1 is accessible again. In the web console of CB2, CB1 is still shown as green during its “down” period… It looks as if CB1 is flooded, or busy doing something else, and cannot respond anymore after 1 hour of intensive load…

Can anyone explain this, please? Is there any setting I should tweak?

Best regards,

NT

Hi,

Can you share what the vitals of your servers (CPU, memory, disk I/O) look like during the load test? Also, what are the specs of the servers, etc.? We’ll need more information to be of any help.
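One way to capture such vitals while the test runs is to poll the cluster REST API on port 8091 and log a few numbers per node. The sketch below is only an illustration: it assumes the `/pools/default` response lists nodes with `hostname`, `status` and a `systemStats` object containing `cpu_utilization_rate`, `mem_total` and `mem_free`, and the function names are made up for this example. Check the field names against your own server's output before relying on it.

```python
import base64
import json
import urllib.request


def fetch_pools_default(host, user, password, port=8091, timeout=5.0):
    """GET /pools/default from one node and return the parsed JSON.

    Uses HTTP basic auth with the administrator credentials.
    """
    req = urllib.request.Request(f"http://{host}:{port}/pools/default")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def summarize_vitals(pools_default):
    """Reduce a /pools/default document to one small dict per node.

    Field names are assumptions about the response layout; missing
    fields come back as None instead of raising.
    """
    rows = []
    for node in pools_default.get("nodes", []):
        stats = node.get("systemStats", {})
        mem_total = stats.get("mem_total")
        mem_free = stats.get("mem_free")
        mem_used_pct = None
        if mem_total and mem_free is not None:
            mem_used_pct = round(100.0 * (mem_total - mem_free) / mem_total, 1)
        rows.append({
            "hostname": node.get("hostname"),
            "status": node.get("status"),
            "cpu_pct": stats.get("cpu_utilization_rate"),
            "mem_used_pct": mem_used_pct,
        })
    return rows
```

Calling `summarize_vitals(fetch_pools_default("CB2", user, password))` every few seconds from a node that stays reachable, and appending the rows to a file, would show how CB1 looks to the rest of the cluster just before and during the outage.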

Regards,
Abbas

Hi,

What do you mean by the vitals of my servers during the load test? I attach some screenshots of the bucket. All 3 servers have 4 CPUs, 32 GB RAM and a 500 GB data disk. I also ran the same test on another (standalone) server with 16 CPUs, 128 GB RAM and a 500 GB data disk, and the result is the same: after running for a while, the server stops responding for several minutes…

For the test, I use the following cbworkloadgen command:

cbworkloadgen -n serverip:8091 -r .1 --max-items=500000 -s 100 -t 4 -l

Thanks for your help.

NT

By the way, I have enabled auto-failover with a 60 s timeout, but the failure is not detected and auto-failover is not triggered!

I’d recommend using pillowfight from the C SDK tools if you want to do any serious performance testing.

It’s capable of much higher throughput, and has many more options for configuring your workload - number of threads, size / distribution of accesses, etc.
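As a rough starting point, the cbworkloadgen workload from earlier in the thread might translate to a pillowfight invocation along these lines. The mapping is my own assumption, not something confirmed here: `-U` for the connection string, `-I` for the number of items, `-m`/`-M` for minimum and maximum value size, `-t` for threads and `-r` for the percentage of sets. The bucket name `default` is also a guess. Please verify every flag against `cbc-pillowfight --help` for your installed version.

```python
import shlex


def pillowfight_command(host, bucket="default", items=500000,
                        value_size=100, threads=4, set_ratio=0.1):
    """Build an argv list for cbc-pillowfight mirroring the cbworkloadgen run.

    set_ratio is a fraction (0.1 means 10% sets); the flag is assumed
    to take a percentage, so it is converted here.
    """
    set_pct = int(round(set_ratio * 100))
    return [
        "cbc-pillowfight",
        "-U", f"couchbase://{host}/{bucket}",  # connection string
        "-I", str(items),                      # number of items
        "-m", str(value_size),                 # minimum value size in bytes
        "-M", str(value_size),                 # maximum value size in bytes
        "-t", str(threads),                    # worker threads
        "-r", str(set_pct),                    # percentage of sets
    ]


# Print the command line so it can be reviewed before running it.
print(shlex.join(pillowfight_command("serverip")))
```

Building the command as a list keeps each option visible and makes it easy to vary one parameter at a time (for example the thread count) between runs.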