We are getting the following error when using the SDK to insert a document:
“The operation has timed out.”
And the following error when trying to get a document:
“The SDK was disconnected from :11210 before the operation was processed. This may be a temporary error while the SDK re-establishes a connection.”
In both cases, the methods do not block or wait before returning this result; it happens almost instantaneously. The behavior also does not appear to be the same for all nodes in the cluster.
The configuration in the app.config is as follows:
<add uri="http://node01:8091" />
<add uri="http://node02:8091" />
<add uri="http://node03:8091" />
<connectionPool name="custom" maxSize="10" minSize="5" />
Is there anything specific we need to do to the cluster/firewall to enable SDK access?
If I only have node3 in the server list, it works. If I comment out node3 and try to use only node2, I get the operation timeout, and the SDK reports the IP address of node3.
Here are a couple of helpful pointers:
- Refer to this document for ports that should be open on the nodes: https://developer.couchbase.com/documentation/server/current/install/install-ports.html
- Note that the nodes must be accessible using the name/IP that’s shown for the node in the Couchbase console. The SDK uses the names in the config only for the initial connection to one node (selected randomly). After that, it gets the configuration from the cluster and uses the node names that configuration contains.
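A toy model of bootstrapping shows why the name/IP shown in the console matters. The sketch below is hypothetical Python, not the SDK's actual code. `bootstrap` and `unreachable_nodes` are made-up helpers, `fetch_config` stands in for the request the client makes to a seed node, and the `10.0.0.x` addresses are invented. It illustrates that even with only node02 in the config, the client ends up depending on every address in the cluster map, including node3's.

```python
import random
from typing import Callable, List, Optional

# Toy model of client bootstrapping (hypothetical, not the SDK's code).
# fetch_config(seed) stands in for asking a seed node for the cluster map;
# it returns the list of node addresses, or None if the seed is unreachable.


def bootstrap(seeds: List[str],
              fetch_config: Callable[[str], Optional[List[str]]],
              rng: Optional[random.Random] = None) -> List[str]:
    if rng is None:
        rng = random.Random()
    order = list(seeds)
    rng.shuffle(order)  # seed nodes are tried in random order
    for seed in order:
        cluster_nodes = fetch_config(seed)
        if cluster_nodes is not None:
            # From here on the client uses the addresses in the cluster map,
            # not the names from app.config.
            return cluster_nodes
    raise ConnectionError("no seed node returned a cluster configuration")


def unreachable_nodes(cluster_nodes: List[str],
                      reachable: Callable[[str], bool]) -> List[str]:
    """Addresses from the cluster map that this client cannot reach."""
    return [node for node in cluster_nodes if not reachable(node)]


# Example: only node02 is configured, but the map still includes node3's address.
CLUSTER_MAP = ["10.0.0.1:8091", "10.0.0.2:8091", "10.0.0.3:8091"]
nodes = bootstrap(["node02"], lambda seed: CLUSTER_MAP if seed == "node02" else None)
blocked = unreachable_nodes(nodes, lambda addr: not addr.startswith("10.0.0.3"))
print(blocked)  # ['10.0.0.3:8091']
```

Couchbase spreads keys across nodes, and each vBucket is active on one node. As a result, only operations on keys that map to an unreachable node would fail. That would be consistent with the behavior differing between nodes.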
In addition to @btburnett3’s suggestions, to help diagnose whether something is blocking ports, you could try a new experimental project that @brett19 has been working on to make it easier to diagnose environmental problems. Its working name at the moment is SDK doctor. It may not find anything, but it will check for common connectivity problems, such as some DNS issues.
You’ll find pre-built binaries on the release page.
Usually the summary at the end is pretty easy to interpret. If you need help with interpretation, please feel free to post it or a link to it here!
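A plain TCP connect from the application machine to each node is often enough to spot a firewall problem. The two ports that appear in this thread are 8091 (the bootstrap URIs in app.config) and 11210 (the port in the disconnect error). The following is a minimal Python sketch of such a check. It does not replace SDK doctor or the ports document. The node names are taken from the config above and are placeholders for the names or IPs shown in the console.

```python
import socket
from typing import Dict, Iterable, Tuple

# Node names from the app.config above; replace them with the names/IPs
# shown for each node in the Couchbase console.
NODES = ["node01", "node02", "node03"]
# 8091: the bootstrap URIs in app.config; 11210: the port in the disconnect error.
PORTS = [8091, 11210]


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Name-resolution failures, refused connections and timeouts
        # all surface as subclasses of OSError.
        return False


def check_nodes(nodes: Iterable[str], ports: Iterable[int],
                timeout: float = 2.0) -> Dict[Tuple[str, int], bool]:
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    ports = list(ports)
    return {(host, port): port_open(host, port, timeout)
            for host in nodes for port in ports}
```

Calling `check_nodes(NODES, PORTS)` returns a dictionary mapping each `(host, port)` pair to `True` or `False`. Any `False` entry points at a name, route, or firewall rule worth investigating.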
@btburnett3: I updated the server entries in the app.config to use the names as they are displayed in the admin console, but there was no change in behavior.
@ingenthr: I ran SDK doctor from my machine and it didn’t report any errors; the output is pretty straightforward.
Here is the stack trace for IBucket.InsertAsync:
at System.Environment.GetStackTrace(Exception e, Boolean needFileInfo)
at Couchbase.IO.SendTimeoutExpiredException..ctor(String message)
at Couchbase.IO.AsyncState.Complete(Byte response)
at System.Threading.Tasks.Task.ExecutionContextCallback(Object obj)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
at System.Threading.Tasks.Task.ExecuteEntry(Boolean bPreventDoubleExecution)
And for IBucket.GetAsync:
Anything else I should be looking for? Is there any detailed logging that I can enable in the SDK?
Have you had any success getting over the problem you described above?
If not, could you please try using IP addresses instead of DNS names in the server list? It may be something to do with DNS resolution and/or port access. As Brant indicated above, the list of servers in the config is only used to make an initial connection. Once a connection is made, the cluster returns a complete list of hostnames for the client to use.
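The following Python sketch shows which hostnames the cluster actually hands back to the client, and whether this machine can resolve them. It makes two assumptions. First, the cluster's REST endpoint `/pools/default` on port 8091 returns a JSON object whose `nodes` entries each carry a `hostname` such as `"10.0.0.3:8091"`. Second, Administrator credentials are accepted via HTTP basic auth; whether credentials are required depends on the server version. The helper names are hypothetical.

```python
import base64
import json
import socket
import urllib.request
from typing import Dict, List

# Assumed response shape of GET http://<node>:8091/pools/default:
#   {"nodes": [{"hostname": "10.0.0.3:8091", ...}, ...], ...}


def fetch_pools_default(seed: str, user: str, password: str,
                        timeout: float = 5.0) -> dict:
    """Fetch the cluster map from one node over the REST API (basic auth)."""
    req = urllib.request.Request(f"http://{seed}:8091/pools/default")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def advertised_hosts(pools_default: dict) -> List[str]:
    """Host part of each node's advertised "hostname" (the port is stripped)."""
    return [node["hostname"].rsplit(":", 1)[0]
            for node in pools_default.get("nodes", [])]


def resolve_all(hosts: List[str]) -> Dict[str, List[str]]:
    """Resolve each host from this machine; an empty list means it failed."""
    result: Dict[str, List[str]] = {}
    for host in hosts:
        try:
            infos = socket.getaddrinfo(host, None)
            result[host] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            result[host] = []
    return result
```

A typical call would be `resolve_all(advertised_hosts(fetch_pools_default("node01", user, password)))`. An empty list for a host means it did not resolve from the machine running the SDK, which is the situation the pointer about console names warns about.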
Also, are you using some virtualisation to create your cluster nodes, e.g. Docker? I’ve hit snags in the past getting ports working properly between nodes.
No progress with this issue. It is currently happening in several of our environments.
I just tried using the IP addresses for each node, and the behavior is the same. Only node1 seems to work. A packet capture shows activity between the machine where the SDK is running and nodes 1 and 2, but no activity with node 3.
Can you enable logging and post the output? Include just a single operation so that the file isn’t too big; we most definitely want to capture bootstrapping.
This has been identified as a bug in the SDK by CB support (https://issues.couchbase.com/browse/NCBC-1383) and I was told that it is planned to be fixed in release 2.4.5.
In the meantime, the suggestion was to downgrade the NuGet package to 2.3.11, which fixes this problem for now.
Thank you all for your responses!