Sudden "Node expected to receive data is inactive" exceptions
We're using a cluster of 4 nodes (Xeon E3-1245, 32 GB RAM, 2 × 3 TB SATA HDD) running Couchbase 1.8.1 on CentOS 6.4. We have 6 clients accessing the database via the Couchbase Java client 1.1.7 (spymemcached 2.9.0), and all machines are in the same 1 Gbit network rack.
From time to time we start getting "Node expected to receive data is inactive. This could be due to a failure within the cluster. Will check for updated configuration. Key without a configured node is: ..." errors in the logs of a random client, and access times for that client double or worse. The Couchbase cluster itself looks fine, as the other clients keep working without problems and show their usual latency. Restarting the affected client helps.
Has anyone experienced the same problem?
Just to verify: can you run this with the 1.1.9 SDK and see if it still happens? We've made some more changes around that whole area recently.
If not, to help further I guess debug logs are needed. Some background: the node is marked as inactive if there are lots of failed or timed-out operations and the node is being reconnected.
- Did you see timeouts around/before that?
- Do you have some network infrastructure in between that drops/delays those operations? Check port 11210.
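To rule out the network path, it's worth verifying from each client machine that the data port (11210) is actually reachable with a reasonable timeout. A minimal sketch using plain `java.net.Socket` (the hostname here is a placeholder, not from the original thread):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Quick TCP reachability check for the Couchbase data port (11210). */
public class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // refused, timed out, or unroutable -- all count as unreachable
            return false;
        }
    }

    public static void main(String[] args) {
        // "node1.example.com" is a hypothetical node address; substitute your own
        String host = args.length > 0 ? args[0] : "node1.example.com";
        System.out.println(host + ":11210 reachable = "
                + isReachable(host, 11210, 2000));
    }
}
```

Running this from each client box against each node can show whether something in between is dropping or delaying connections on that port.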
I'm pretty sure this is "something on the wire", but I can't tell from here what it is. If you could get more specific info, that would help; at least INFO-level logging (DEBUG would be better).
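For the logs: the 1.1.x Java client logs through spymemcached's logging facade, which can be redirected to Log4J with the `net.spy.log.LoggerImpl` system property. A minimal log4j.properties sketch (logger names and pattern are my assumptions about a typical setup, not from the original thread):

```properties
# log4j.properties -- turn on DEBUG for the client libraries
# Run the JVM with:
#   -Dnet.spy.log.LoggerImpl=net.spy.memcached.compat.log.Log4JLogger
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] %m%n

# DEBUG on the Couchbase client and the underlying spymemcached layer
log4j.logger.com.couchbase.client=DEBUG
log4j.logger.net.spy.memcached=DEBUG
```

That should capture the reconnect and timeout activity around the moment the node gets marked inactive.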