We have been using couchnode for a couple of months now and we really like it, but a couple of days ago we ran into an issue while adding a new node to an existing Couchbase cluster, and we would like to better understand how couchnode handles the unavailability of one or more nodes in a Couchbase cluster.
The issue occurred when we added a new node to Couchbase but forgot to allow DNS resolution for this new node from the application servers using couchnode.
As soon as the rebalancing started, some keys were moved to the new Couchbase node, but the application servers were not able to get/set them because they could not resolve the hostname of the new node (we identify Couchbase nodes by DNS name rather than by IP address).
We understand, of course, that there is no magic and that couchnode/libcouchbase cannot resolve a hostname that has not been declared, but the unfortunate thing is that the gets and sets didn’t complain about anything:
- The “Connection” class didn’t emit any error
- The couchbase .get, .set, etc. calls didn’t pass any error to their callbacks
We know that operations in couchnode are queued until the connection is ready, but what does this really mean?
- Does “connection is ready” mean that couchnode can reach http://first_available_host:8091?
- Does it check for the actual availability of each cluster node on each bucket port?
- Do the “connectionTimeout” and/or “operationTimeout” options passed to “new Connection” include DNS resolution?
- Is there any limit on the number of operations couchnode can queue before throwing an error? (Can we change this queue depth? What happens when the maximum depth is reached?)
- Is there any way to watch this queue depth?
- If auto-failover is set to false in Couchbase, what happens when a Couchbase node goes down?
-> Are we supposed to receive errors, or will the operations for this node be queued, and if so, until when or until what event?
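To make the queue-depth questions concrete, here is a minimal sketch of counting in-flight operations on the application side. It does not read couchnode's internal queue; "trackPending" and "fakeGet" are hypothetical names introduced only for illustration, and the sketch assumes the wrapped operation takes a Node-style callback as its last argument.

```javascript
// Hypothetical helper, not part of couchnode: wraps a callback-style
// operation and counts the calls whose callback has not fired yet.
function trackPending(op) {
  var pending = 0;

  function wrapped() {
    var args = Array.prototype.slice.call(arguments);
    var callback = args.pop(); // assumption: last argument is the callback
    var done = false;
    pending += 1;
    args.push(function () {
      if (!done) {             // guard against a callback invoked twice
        done = true;
        pending -= 1;
      }
      callback.apply(this, arguments);
    });
    return op.apply(this, args);
  }

  // Number of operations started but not yet called back
  wrapped.pending = function () { return pending; };
  return wrapped;
}

// Stand-in for a client operation such as db.get(key, callback):
// it parks the callback instead of answering, like a stalled request.
var parked = [];
function fakeGet(key, callback) { parked.push(callback); }

var get = trackPending(fakeGet);
get('user:1', function (err, result) {});
get('user:2', function (err, result) {});
console.log(get.pending()); // 2

parked.shift()(null, { value: 'a' }); // first operation completes
console.log(get.pending()); // 1
```

With a real connection object (called db here, as an assumption), the same wrapper could presumably be applied as trackPending(db.get.bind(db)). A counter that keeps growing would at least reveal operations that never call back, which is the situation described above.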
We are using couchnode 1.2.0 and Couchbase 2.1.1 CE.
Last question: does Couchbase 2.5 change anything about this “connection is ready” behaviour?
I am referring to the “Optimized connection management” introduced in 2.5, which now connects to port 11211 instead of 8091.
Thank you for any hints you could provide.