Couchbase .NET Client Gateway Timeouts with bad recovery
We're evaluating Couchbase 1.8 with the .NET client as well as the .NET session provider (in a .NET 4.0 MVC3 web application). We have noticed that the .NET client opens and holds open a connection to the CB servers on 8091 with the intent of reading messages from that server (see the MessageStreamListener class). Our network gateway after a period of time (30 min) identifies this as a inactive/timed out connection, and force closes it. The logs of the event look like the following (sanitized for security):
[2012-07-10 00:15:11.366 CST] [DEBUG] HB: Trying 'http://[test_server_1]:8091/pools' [Couchbase.MessageStreamListener]
[2012-07-10 00:15:11.366 CST] [DEBUG] HB: Node 'http://[test_server_1]:8091/pools' is OK [Couchbase.MessageStreamListener]
[2012-07-10 00:15:13.929 CST] [DEBUG] ReadMessage failed with exception: [Couchbase.MessageStreamListener]
System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
--- End of inner exception stack trace ---
at System.Net.ConnectStream.Read(Byte[] buffer, Int32 offset, Int32 size)
at System.IO.StreamReader.ReadBuffer()
at System.IO.StreamReader.ReadLine()
at Couchbase.MessageStreamListener.ReadMessages(Uri heartBeatUrl, Uri configUrl) in c:\dev\Build\couchbase-net-client\src\Couchbase\MessageStreamListener.cs:line 400
at Couchbase.MessageStreamListener.ProcessPool() in c:\dev\Build\couchbase-net-client\src\Couchbase\MessageStreamListener.cs:line 245
[2012-07-10 00:15:13.929 CST] [DEBUG] Counter is 0, sleeping for 00:00:02 then retrying. [Couchbase.MessageStreamListener]
[2012-07-10 00:15:15.930 CST] [DEBUG] Last message was the same as current, ignoring. [Couchbase.BucketConfigListener]
[2012-07-10 00:15:15.945 CST] [DEBUG] Last message was the same as current, ignoring. [Couchbase.BucketConfigListener]
[2012-07-10 00:15:15.945 CST] [DEBUG] Last message was the same as current, ignoring. [Couchbase.BucketConfigListener]
[2012-07-10 00:15:15.945 CST] [DEBUG] Processing message: {"name":"[test_bucket]","bucketType":"membase","authType":"sasl","saslPassword":"","proxyPort":0,"uri":"/pools/default/buckets/[test_bucket]","streamingUri":"/pools/default/bucketsStreaming/[test_bucket]","flushCacheUri":"/pools/default/buckets/[test_bucket]/controller/doFlush","nodes":[{"replication":0.0,"clusterMembership":"active","status":"healthy","hostname":"[test_hostname]:8091","clusterCompatibility":1,"version":"1.8.0r-55-g80f24f2-enterprise","os":"windows","ports":{"proxy":11211,"direct":11210}}],"stats":{"uri":"/pools/default/buckets/[test_bucket]/stats","directoryURI":"/pools/default/buckets/[test_bucket]/statsDirectory","nodeStatsListURI":"/pools/default/buckets/[test_bucket]/nodes"},"nodeLocator":"vbucket","vBucketServerMap":{"hashAlgorithm":"CRC","numReplicas":1,"serverList":["[test_hostname]:11210"],"vBucketMap":[[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1]]}} [Couchbase.MessageStreamListener]
[2012-07-10 00:15:25.949 CST] [DEBUG] HB: Pinging current node 'http://[test_server_1]:8091/pools' to check if it's still alive. [Couchbase.MessageStreamListener]
[2012-07-10 00:15:25.949 CST] [DEBUG] HB: Trying 'http://[test_server_1]:8091/pools' [Couchbase.MessageStreamListener]
[2012-07-10 00:15:25.949 CST] [DEBUG] HB: Node 'http://[test_server_1]:8091/pools' is OK [Couchbase.MessageStreamListener]
This is a single web server, single couchbase server configuration running 1.8, but we've verified it in a multi-couchbase server instance with 4 nodes. Is there any configuration that could instrument the provider to close/recover more gracefully on its own? Is there any way to make it keep the heartbeat open so it does not look idle? I've looked through the Couchbase project to verify what the code is doing and it's pretty clearly designed to open and hold open a connection.
The problem here is we're using the session provider and requests that occur during this narrow recovery window appear to the provider to have no session data, and the user is logged out. Please advise if there is anything that can be done at the Couchbase provider level.
Thanks for the response. We are using the latest client and heartbeat is enabled by default at 10 sec intervals. I can see from the Couchbase client logs that we are hitting the server at 10 second intervals:
[2012-07-10 09:13:23.544 CST] [DEBUG] HB: Pinging current node 'http://[test_server]:8091/pools' to check if it's still alive. [Couchbase.MessageStreamListener]
[2012-07-10 09:13:23.544 CST] [DEBUG] HB: Trying 'http://[test_server]:8091/pools' [Couchbase.MessageStreamListener]
[2012-07-10 09:13:23.544 CST] [DEBUG] HB: Node 'http://[test_server]:8091/pools' is OK [Couchbase.MessageStreamListener]
[2012-07-10 09:13:33.545 CST] [DEBUG] HB: Pinging current node 'http://[test_server]:8091/pools' to check if it's still alive. [Couchbase.MessageStreamListener]
[2012-07-10 09:13:33.545 CST] [DEBUG] HB: Trying 'http://[test_server]:8091/pools' [Couchbase.MessageStreamListener]
[2012-07-10 09:13:33.545 CST] [DEBUG] HB: Node 'http://[test_server]:8091/pools' is OK [Couchbase.MessageStreamListener]
I can add the heartbeat config (might not be bad to have it in place for posterity should we want to tweak it later) but I think this is working.
Specifically looking at the implementation of the Couchbase binary, we have 2 things going on:
The MessageStreamListener.ReadMessages which creates a web request to the pools endpoint and holds it open reading messages, and a HeartBeat object which has a worker that creates additional heartbeat requests at configurable intervals and logs them. From our network gateway perspective these are 2 separate activities with different sessions, so the heartbeat does not successfully keep the gateway from closing the first held-open connection in ReadMessages. ReadMessages also seems to only pull the initial VBucket list when first loaded, with no additional messages (messages are prefixed in the logs by a "Processing message:" text block). This lack of data crossing the wire for that session may be what's causing our gateway to close it. If there was even a periodic "no change" response that might help. Alternately, if we could use something bidirectional/not http where we could actively ask the CB server if there are any messages that might also help tell our gateway to not interfere, and would arguably supplant the need for the separate HeartBeat calls.
I know this seems like more of a network topology/rules/setup issue than a Couchbase issue, but I was hoping there might be some setting/configuration that I wasn't aware of that could be made to accommodate this.
We have verified that the 10 second heartbeat to the first server listed in the configuration is working without the section being specified. However, in our case, it appears that our firewall between the CB clients and the CB nodes is forcably closing connections on port 11210 (and not 8091 which I believe you are seeing). Here is what we are seeing in our debug logs.
On first connection we see a proper connection pool created:
====
2012-07-12 22:24:53,660 [1] DEBUG Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl [(null)] - Pool has been inited for 10.1.1.59:11210 with 10 sockets
2012-07-12 22:24:53,676 [1] DEBUG Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl [(null)] - Acquiring stream from pool. 10.1.1.59:11210
2012-07-12 22:24:53,676 [1] DEBUG Enyim.Caching.Memcached.PooledSocket [(null)] - Socket a5c79726-d9df-494d-8928-b4a87875d8a7 was reset
2012-07-12 22:24:53,676 [1] DEBUG Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl [(null)] - Socket was reset. a5c79726-d9df-494d-8928-b4a87875d8a7
2012-07-12 22:24:53,692 [1] DEBUG Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl [(null)] - Releasing socket a5c79726-d9df-494d-8928-b4a87875d8a7
2012-07-12 22:24:53,692 [1] DEBUG Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl [(null)] - Are we alive? True
====
Next when retrieving a key it properly finds a connection and retrieves the data:
====
2012-07-12 22:24:54,971 [1] DEBUG Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl [(null)] - Acquiring stream from pool. 10.1.1.59:11210
2012-07-12 22:24:54,971 [1] DEBUG Enyim.Caching.Memcached.PooledSocket [(null)] - Socket a5c79726-d9df-494d-8928-b4a87875d8a7 was reset
2012-07-12 22:24:54,971 [1] DEBUG Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl [(null)] - Socket was reset. a5c79726-d9df-494d-8928-b4a87875d8a7
2012-07-12 22:24:54,986 [1] DEBUG Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl [(null)] - Releasing socket a5c79726-d9df-494d-8928-b4a87875d8a7
2012-07-12 22:24:54,986 [1] DEBUG Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl [(null)] - Are we alive? True
====
However, if we let our application idle for an hour and try to retrieve the key again, we get the following error:
====
2012-07-13 01:09:07,383 [1] DEBUG Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl [(null)] - Acquiring stream from pool. 10.1.1.59:11210
2012-07-13 01:09:07,383 [1] DEBUG Enyim.Caching.Memcached.PooledSocket [(null)] - Socket a5c79726-d9df-494d-8928-b4a87875d8a7 was reset
2012-07-13 01:09:07,398 [1] DEBUG Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl [(null)] - Socket was reset. a5c79726-d9df-494d-8928-b4a87875d8a7
2012-07-13 01:09:07,398 [1] ERROR Enyim.Caching.Memcached.MemcachedNode [(null)] - System.IO.IOException: Failed to write to the socket '10.1.1.59:11210'. Error: ConnectionReset
at Enyim.Caching.Memcached.ThrowHelper.ThrowSocketWriteError(IPEndPoint endpoint, SocketError error)
at Enyim.Caching.Memcached.PooledSocket.Write(IList`1 buffers)
at Enyim.Caching.Memcached.MemcachedNode.ExecuteOperation(IOperation op)
2012-07-13 01:09:07,445 [1] DEBUG Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl [(null)] - Releasing socket a5c79726-d9df-494d-8928-b4a87875d8a7
2012-07-13 01:09:07,445 [1] DEBUG Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl [(null)] - Are we alive? True
2012-07-13 01:09:07,461 [1] DEBUG Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl [(null)] - Mark as dead was requested for 10.1.1.59:11210
2012-07-13 01:09:07,476 [1] DEBUG Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl [(null)] - FailurePolicy.ShouldFail(): True
2012-07-13 01:09:07,476 [1] WARN Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl [(null)] - Marking node 10.1.1.59:11210 as dead
2012-07-13 01:09:07,492 [1] DEBUG Couchbase.CouchbasePool [(null)] - Node 10.1.1.59:11210 is dead.
2012-07-13 01:09:07,492 [1] DEBUG Couchbase.CouchbasePool [(null)] - We have a standard config, so we'll recreate the node locator.
2012-07-13 01:09:07,507 [1] DEBUG Couchbase.CouchbasePool [(null)] - Initializing the locator with the list of working nodes.
2012-07-13 01:09:07,507 [1] DEBUG Couchbase.CouchbasePool [(null)] - Replaced the internal state.
2012-07-13 01:09:07,523 [1] DEBUG Couchbase.CouchbasePool [(null)] - Starting the recovery timer.
2012-07-13 01:09:07,523 [1] DEBUG Couchbase.CouchbasePool [(null)] - Timer started.
2012-07-13 01:09:07,539 [1] DEBUG Couchbase.CouchbasePool [(null)] - Fail handler is finished.
2012-07-13 01:09:17,538 [12] DEBUG Couchbase.CouchbasePool [(null)] - Checking the dead servers.
2012-07-13 01:09:17,538 [12] DEBUG Couchbase.CouchbasePool [(null)] - Alive: 10.1.1.57:11210
2012-07-13 01:09:17,554 [12] DEBUG Couchbase.CouchbasePool [(null)] - Alive: 10.1.1.58:11210
2012-07-13 01:09:17,554 [12] DEBUG Couchbase.CouchbasePool [(null)] - Dead: 10.1.1.59:11210
2012-07-13 01:09:17,570 [12] DEBUG Couchbase.CouchbasePool [(null)] - Ping ok.
2012-07-13 01:09:17,570 [12] DEBUG Couchbase.CouchbasePool [(null)] - We have a standard config, so we'll recreate the node locator.
2012-07-13 01:09:17,585 [12] DEBUG Couchbase.CouchbasePool [(null)] - Initializing the locator with the list of working nodes.
2012-07-13 01:09:17,601 [12] DEBUG Couchbase.CouchbasePool [(null)] - Replaced the internal state.
2012-07-13 01:09:17,601 [12] DEBUG Couchbase.CouchbasePool [(null)] - deadCount == 0, stopping the timer.
====
So my real question is what is the point of the heartbeat if it only goes to the first server in the configuration listing and it only goes over port 8091 (and not 11210)?
In any case, I understand this is an issue caused by our network topology, but it is highly common for firewalls to close connections that have been idle for many minutes or hours which keeps your network load under control. The fact that the Couchbase .NET client does not retry the connection to the node seems silly in my opinion.
Are we really the only two Couchbase community members that have seen this issue? Sorry for the sour post but we've thought this issue was our software for the past month and we come to find out it's due to non-graceful connection handling by the Couchbase .NET client is frustrating.
I also have this problem. Bad Gateway is shown when we try to use views in beer sample. Sometimes it shows data, and sometimes it's bad gateway error. Code:
var view = client.GetView("beer", "by_name").StartKey("A").EndKey("B").Limit(50);
foreach (var row in view) {
var item = client.GetJson<Model.Beer>(row.ItemId);
this.Sample += item.Name + "<br/>";
}
I'm not sure if we are having the same issue (see my post http://www.couchbase.com/forums/thread/intermittent-node-failure-events-...), but after reading your post you got me thinking that we might be running into the sames issues. In our case we have a firewall between our clients and node cluster.
The Couchbase manual does warn that some firewalls and proxy software can forcible close idle connections: http://www.couchbase.com/docs/couchbase-manual-1.8/couchbase-bestpractic...
In doing some research, it looks like in version 1.1.5 (current version is 1.1.6) of the .NET client implemented a heartbeat configuration section: https://www.couchbase.com/docs/couchbase-sdk-net-1.1/couchbase-sdk-net-r...
We are doing some more testing in the morning and will report what we find, but maybe this will help you out. We are not sure if the heartbeat is turned on by default or what, so we will figure that out in the morning.