I have many processes that are spawned to do work. Each one creates a Couchbase client with a minPoolSize of 6, and the 4-node cluster has about 1,000 total connections to it. I’m using the .NET 1.3.11 client.
Recently I have been getting more and more "Failed to initialize the pool" errors. The call to CouchbaseClient actually returns after ~25 seconds without throwing an exception, but subsequent attempts to read from the CouchbaseClient object return "unable to locate node" errors. The Couchbase logging shows the following:
Error Failed to initialize the pool. System.Net.Sockets.SocketException (0x80004005): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.200.200.101:11210
at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
at Couchbase.SocketPool.Create()
at Couchbase.SocketPool.PreAllocate(Int32 capacity)
at Couchbase.SocketPool..ctor(IMemcachedNode node, ISocketPoolConfiguration config, ISaslAuthenticationProvider provider)
at Couchbase.CouchbaseNode..ctor(IPEndPoint endpoint, Uri couchApiBase, ICouchbaseClientConfiguration config, ISaslAuthenticationProvider provider)
at Couchbase.CouchbasePool.CreateNode(IPEndPoint endpoint, ISaslAuthenticationProvider auth, Dictionary`2 nodeInfo)
at Couchbase.CouchbasePool.InitVBucket(ClusterConfig config, ISaslAuthenticationProvider auth)
at Couchbase.CouchbasePool.ReconfigurePool(ClusterConfig config)
Currently the process dies and retries when it hits these errors, but the 30+ second delay and the flood of entries in the error logs are a problem. Is there any way to further troubleshoot what could be causing these connection failures? The cluster is clearly up the whole time, since the next attempt succeeds and other processes that maintain a connection don’t get any errors.
I’m not sure if it’s related, but I am also getting a lot of errors on calls to Get, which I retry. They look like this:
System.ApplicationException: Failed to retrieve key: Exception reading response - 10.200.200.102:11210 ---> System.IO.IOException: Unable to read data from the transport connection: A blocking operation was interrupted by a call to WSACancelBlockingCall. ---> System.Net.Sockets.SocketException: A blocking operation was interrupted by a call to WSACancelBlockingCall
at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
--- End of inner exception stack trace ---
at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
at Couchbase.CouchbasePooledSocket.Read(Byte[] buffer, Int32 offset, Int32 count)
at Enyim.Caching.Memcached.Protocol.Binary.BinaryResponse.Read(IPooledSocket socket)
at Enyim.Caching.Memcached.Protocol.Binary.MultiGetOperation.ReadResponse(IPooledSocket socket)
at Couchbase.CouchbaseNode.Execute(IOperation op)
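One way to narrow down the connection failures above, independent of the .NET client, would be to time raw TCP connects to the data port (11210) from the same machine that spawns the worker processes. The following is only a minimal sketch in Python, not something taken from the Couchbase client. `probe` and `summarize` are hypothetical helper names, and the attempt count and 5-second timeout are arbitrary assumptions.

```python
import socket
import time


def probe(host, port, attempts=5, timeout=5.0):
    """Time raw TCP connects to host:port.

    Returns one (succeeded, elapsed_seconds, error_text) tuple per attempt.
    The defaults are assumptions, not values taken from the Couchbase client.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # Open the connection and close it again immediately.
            with socket.create_connection((host, port), timeout=timeout):
                results.append((True, time.monotonic() - start, None))
        except OSError as exc:
            # Covers timeouts, refused connections and unreachable hosts.
            results.append((False, time.monotonic() - start, str(exc)))
    return results


def summarize(results):
    """Reduce probe results to attempt count, failure count and slowest attempt."""
    failures = sum(1 for ok, _, _ in results if not ok)
    slowest = max((elapsed for _, elapsed, _ in results), default=0.0)
    return {"attempts": len(results), "failures": failures, "slowest_s": slowest}
```

Calling `summarize(probe("10.200.200.101", 11210))` periodically from an affected host, and likewise for 10.200.200.102, would show whether plain connects also stall or fail intermittently. If they do, that would point toward the network or the server's connection handling rather than the client's pool initialization. If they never do, the client configuration would be the more likely place to look.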