Sudden "Node expected to receive data is inactive" exceptions

Hi,

We're running a cluster of 4 nodes (Xeon E3-1245, 32 GB RAM, 2 x 3 TB SATA HDD) with Couchbase 1.8.1 on CentOS 6.4. Six clients access the database via Couchbase Java client 1.1.7 with spymemcached 2.9.0. All machines are in the same rack on a 1 Gb network.

From time to time, a random client starts logging "Node expected to receive data is inactive. This could be due to a failure within the cluster. Will check for updated configuration. Key without a configured node is: ..." errors, and access times for that client at least double. The Couchbase cluster itself seems OK: the other clients keep working with no problems and show their usual latency. Restarting the affected client helps.

Did anyone experience the same problem?

Thanks,
Kirill

Can you please retry with 1.4.1?

1 Answer


Hi,

Just to verify, can you run this with the 1.1.9 SDK and see if it still happens? We have made some more changes around that whole area.

If that doesn't fix it, I guess debug logs will be needed to help further. Some background in the meantime: a node is marked as inactive when lots of operations have failed or timed out and the node is being reconnected.

- Did you see timeouts around/before that?
- Do you have some network infrastructure in between that drops/delays those operations? Check port 11210.
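One way to rule out the network path is to time a plain TCP connect from the affected client machine to each node on port 11210. This is only a minimal sketch: the `PortProbe` class and its `connectMillis` helper are made up for illustration and are not part of the Couchbase SDK, and the node host names are passed in as arguments.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    /**
     * Opens a TCP connection to host:port and returns how long the
     * connect took in milliseconds, or -1 if it failed or timed out.
     */
    public static long connectMillis(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable hosts.
            return -1;
        }
    }

    public static void main(String[] args) {
        // Each argument is the host name or IP of one cluster node.
        for (String host : args) {
            long ms = connectMillis(host, 11210, 1000);
            System.out.println(host + ":11210 -> "
                    + (ms < 0 ? "unreachable within 1 s" : ms + " ms"));
        }
    }
}
```

Running it in a loop while the problem is occurring, from both a healthy and an affected client, should show whether connects to 11210 are slow or dropped for one machine only. It does not exercise authentication or the memcached protocol itself, so a clean result does not fully clear the network.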

I'm pretty sure this is "something on the wire", but I can't tell from here what it is. More specific information would help, or at least INFO-level logging (DEBUG would be better).
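For reference, the 1.x Java client logs through spymemcached, which can be routed to java.util.logging by setting the `net.spy.log.LoggerImpl` system property before the client is created. A minimal sketch, assuming JDK logging is acceptable in your application; the `ClientLogging` class name is just an illustration, and the logger names are taken from the package names that appear in the client's log output:

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ClientLogging {
    // Strong references, so the configured loggers are not garbage collected.
    private static final Logger COUCHBASE = Logger.getLogger("com.couchbase.client");
    private static final Logger SPY = Logger.getLogger("net.spy.memcached");

    /** Call once, before the CouchbaseClient is constructed. */
    public static void enable(Level level) {
        // Tell spymemcached to log via java.util.logging.
        System.setProperty("net.spy.log.LoggerImpl",
                "net.spy.memcached.compat.log.SunLogger");

        Handler handler = new ConsoleHandler();
        handler.setLevel(level);
        for (Logger logger : new Logger[] {COUCHBASE, SPY}) {
            logger.setLevel(level);
            logger.addHandler(handler);
            // Avoid duplicate lines from the root logger's own handler.
            logger.setUseParentHandlers(false);
        }
    }
}
```

Calling `ClientLogging.enable(Level.FINEST)` at startup should capture everything the client emits; `Level.INFO` gives the less verbose variant. Expect a lot of output at the finest level, so it is probably best enabled on a single affected client only.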

Thanks Mike, I will try 1.1.9 first and get back to you here if it happens again.

Kirill

Hi,

I'm experiencing a similar issue. The log messages are:

2014-04-27 10:53:06,199 WARN [com.couchbase.client.CouchbaseConnection] Node expected to receive data is inactive. This could be due to a failure within the cluster. Will check for updated configuration. Key without a configured node is: Tuj8tzgPoRKqqa-tSIhZW3g.
2014-04-27 10:53:06,205 WARN [net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl] Operation canceled because authentication or reconnection and authentication has taken more than one second to complete.
2014-04-27 10:53:06,205 WARN [com.couchbase.client.CouchbaseConnection] Node expected to receive data is inactive. This could be due to a failure within the cluster. Will check for updated configuration. Key without a configured node is: TwpYn60lJQK6ON9Svb_618A.
2014-04-27 10:53:06,208 WARN [net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl] Discarding partially completed op: Cmd: 1 Opaque: 297386805 Key: T7sDSdzhaQCGm82MO4hnOcg Cas: 0 Exp: 259200 Flags: 0 Data Length: 74
2014-04-27 10:53:06,208 WARN [net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl] Discarding partially completed op: Cmd: 1 Opaque: 297386828 Key: TN9Hto3LbRvuwnIjN6Rf7HQ Cas: 0 Exp: 259200 Flags: 0 Data Length: 74
2014-04-27 10:53:06,208 WARN [com.couchbase.client.CouchbaseConnection] Closing, and reopening {QA sa=sis01.tst.sn.blackarrow.tv/10.250.6.17:11210, #Rops=0, #Wops=0, #iq=3, topRop=null, topWop=null, toWrite=0, interested=4}, attempt 0.

I'm using the Java Client 1.3.1.
The log says "authentication has taken more than one second to complete". Does that mean I need to increase a timeout setting? If so, which one?

Thanks.