View response blocks every 120 seconds

misttar · January 15, 2015, 2:33am

Like clockwork, every 120 seconds our view performance tanks.

If I run a quick loop that sends view requests over and over, they return in 40ms to 150ms. Then, for about 10s (say from 1:02:12pm to 1:02:24pm), the server stops responding, after which all the pending calls return at once with response times of 10000ms or more.

I first noticed this while using the couchbase-client for Java, but I went ahead and wrote a quick Ruby script that hits the view REST API directly, using the same queries.
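For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a latency probe. It is written in Python, not the Ruby script mentioned above, and it is not the original script. The host, bucket, design document, and view names in `VIEW_URL` are placeholders, and `probe`, `fetch_view`, and the 1000ms "slow" threshold are names and values assumed for illustration only.

```python
import time
import urllib.request

# Placeholder endpoint: host, bucket, design doc, and view names are hypothetical.
# Port 8092 is the view REST port discussed in this thread.
VIEW_URL = "http://localhost:8092/my_bucket/_design/my_ddoc/_view/my_view?limit=1&stale=ok"


def fetch_view(url=VIEW_URL, timeout=30.0):
    """Issue one view request and discard the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()


def probe(request_fn, count, pause_s=0.0, slow_ms=1000.0, clock=time.monotonic):
    """Call request_fn `count` times.

    Returns a list of (wall_clock_start, latency_ms, is_slow) tuples.
    """
    samples = []
    for _ in range(count):
        started_wall = time.time()
        start = clock()
        try:
            request_fn()
        except OSError:
            # Timeouts and refused connections are still recorded as samples.
            pass
        latency_ms = (clock() - start) * 1000.0
        samples.append((started_wall, latency_ms, latency_ms >= slow_ms))
        if pause_s:
            time.sleep(pause_s)
    return samples
```

Something like `probe(fetch_view, count=600, pause_s=0.5)` would cover roughly five minutes, enough to span two of the 120s cycles. Printing each sample's wall-clock time next to its latency makes the stalls easy to line up against server logs.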

When the latency happens, disk I/O climbs a bit and the beam.smp process maxes out a core of the VM.

I have tried searching through the logs and don’t see anything that happens every 120s that is obvious to correlate to this.

Where could I look to find whatever is happening every 120s that is causing the CPU spike and most likely the view slowness?
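Before searching logs for a matching timer, it can help to confirm the period from recorded request latencies. The sketch below is one way to do that, assuming the samples are available as `(timestamp_s, latency_ms)` pairs in time order. The function names, the 1000ms threshold, and the 30s grouping gap are illustrative choices, not anything taken from Couchbase.

```python
from statistics import median


def stall_starts(samples, slow_ms=1000.0, gap_s=30.0):
    """Return the start time of each stall episode.

    samples: iterable of (timestamp_s, latency_ms) pairs in time order.
    A slow sample within gap_s of the previous slow sample is treated
    as part of the same episode.
    """
    starts = []
    last_slow = None
    for ts, latency_ms in samples:
        if latency_ms < slow_ms:
            continue
        if last_slow is None or ts - last_slow > gap_s:
            starts.append(ts)
        last_slow = ts
    return starts


def median_period(starts):
    """Median spacing between consecutive episode starts, or None if fewer than two."""
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return median(gaps) if gaps else None
```

A median close to 120 across several episodes would support the idea of a fixed server-side timer rather than something tied to client traffic.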

Update:
Looks like this might be related to the stats_archiver stuff, per: Connection timeouts during statistics

There isn’t a response on that thread on how to diagnose and/or disable/tune the stats_archiver. But it sounds like something is making the server be “underpowered” even though there is barely any traffic.

cihangirb · January 15, 2015, 5:55am

Hi @misttar, a few questions for you:

version of couchbase server
HW config and cluster topology
data size, item count and view details (#ddocs and views)
workload details - mutations vs reads vs queries /sec

thanks
-cihan

misttar · January 15, 2015, 6:00pm

Couch version: 2.5.1 enterprise edition (build-1083)
3 nodes, Amazon EC2 m3.medium, CentOS
Data size: ~50 MB of data, ~10k documents, spread across 3 data buckets.
~12 views, 4 design docs between the 3 data buckets
Workload: fewer than 5 reads per second; writes are even lower, a few an hour.

As you can see, our workload is almost non-existent right now, and the slowdown doesn’t correlate with any external access to the server (there is no spike in reads/writes, etc).

We expect the workload to increase 100x from what it is as soon as we resolve this issue.

But we can’t do that while this behavior is happening.

cihangirb · January 28, 2015, 4:51pm

Apologies for the delayed response. Can you also share the query parameters you are passing to port 8092? Is it a query on one specific view that is always delayed, or will a random query on any view experience the delay?

misttar · January 28, 2015, 5:44pm

So we figured out what was causing it.

The stats_archiver runs every 120s to store historical information about the couchbase nodes/servers/buckets/etc. The load that the stats_archiver generated was enough to max out the single vCPU that we had allocated per node in EC2 (m3.medium).

We resized our nodes to c3.xlarge (4 vCPU), and the extra vCPUs allow the load to be distributed better, so it no longer causes delays and long request times (our average response time is now 10-20ms).

cihangirb · January 28, 2015, 5:58pm

Thanks @misttar, sorry we were not fast enough.
-cihan
