I’m running a 3-node cluster where I’m seeing lots of timeout exceptions:
java.lang.RuntimeException: java.util.concurrent.TimeoutException
at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:75)
at com.couchbase.client.java.CouchbaseBucket.get(CouchbaseBucket.java:128)
at com.couchbase.client.java.CouchbaseBucket.get(CouchbaseBucket.java:123)
While investigating the cause, I found two things that may be different from what they should be:
- There is one node whose disk write queue only seems to grow (currently 623K items) and never drains.
- On this node the projector process appears to be running, but nothing is listening on port 9999 (“netstat -ntpl | grep 9999” returns no result):
ps aux | grep projector
498 18784 0.0 0.0 474172 2880 ? Sl 2015 10:51 /opt/couchbase/bin/projector -kvaddrs=127.0.0.1:11210 -adminport=:9999 127.0.0.1:8091
We are using Couchbase 4.0.0-4051 and Java SDK 2.2.4.
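For reference, the reads that time out are plain blocking gets, roughly like the sketch below (the node addresses, bucket name, and key are placeholders, not our real ones). Raising the key/value timeout globally on the environment or per call only delays the exception rather than fixing the growing write queue, but I’m including it for completeness:

import java.util.concurrent.TimeUnit;
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.env.CouchbaseEnvironment;
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;

public class TimeoutExample {
    public static void main(String[] args) {
        // Raise the default key/value timeout (2.5s) to 10s, in milliseconds.
        CouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder()
                .kvTimeout(10000)
                .build();

        Cluster cluster = CouchbaseCluster.create(env, "node1", "node2", "node3");
        Bucket bucket = cluster.openBucket("myBucket");

        // Per-call override: wait up to 10 seconds for this single get.
        JsonDocument doc = bucket.get("some-key", 10, TimeUnit.SECONDS);

        cluster.disconnect();
        env.shutdown();
    }
}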
From what I have found so far, backoff should kick in once the disk write queue reaches 1M items. And if I understand correctly, removing or manually failing over the node would mean losing the items still in the disk write queue, since they are not replicated/persisted yet.
Does anyone have recommendations on how I can avoid losing the data in the disk write queue? I am considering removing the node (once I am sure no data will be lost), since I want to upgrade to 4.1 anyway.
If the problem is related to the projector not listening: is there a way to (re)start the projector without losing data and without removing the node?
Any help would be greatly appreciated.