Existing connection forcibly closed by remote host

gchermennov · October 21, 2016, 7:23am

At the company where I work we are having problems with the Couchbase SDK for .NET. We currently use version 2.2.1 with Couchbase Server 4.5.
I don’t believe we have any unusual configuration for Couchbase Server.
Couchbase buckets are replicated to Elasticsearch.
Couchbase is deployed in a cluster, host machines use Windows Server 2012.
We have a .NET application that works with Couchbase (backend services and ASP.NET frontend).
We use .NET 4.5.2 when building and running the project, and on the server it’s .NET 4.0 updated to 4.5.2. My dev machine shows I have .NET 4.0 updated to 4.6.2 (according to the dotnetversiondetector app). I don’t know if that’s relevant.
This problem usually appears in the ASP.NET project.
On a developer machine in debug mode, everything works just fine - we can connect, get/update the documents, etc.
However, when we deploy the app to a test server, there is an exception that says “System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host”. The thing is, we make several round trips to the database, and the exception usually occurs after we have made several queries.
The most frequent source of issues is when we attempt to get pretty heavy documents (some are 800-850 KB or more) by their list of ids.
I figured increasing waitTimeout on the client might help - but it didn’t, still the same error.
I thought it might be a problem with Release builds, but then I tried debug binaries on the server and the problem was still there.
We restarted the IIS pool, the whole IIS server, and the VM that hosts the app, without any results.
Sometimes there is another error: "Couchbase.IO.RemoteHostTimeoutException: The connection has timed out while an operation was in flight. The default is 15000ms"
After I increased the timeout, the first error appeared.
We didn’t have any changes in network configuration recently.
I can try updating a client to a recent version (ours is almost a year old), but that’s the only option I see left.

I enabled logging in the code that queries Couchbase to display IOperationResult state - on my dev machine logs look like this:
DEBUG assert_success - op.success is True, op.status is Success
DEBUG assert_success - op.success is True, op.status is Success
DEBUG assert_success - op.success is True, op.status is Success
on the server, they look like:
DEBUG assert_success - op.success is True, op.status is Success
DEBUG assert_success - op.success is True, op.status is Success
DEBUG assert_success - op.success is True, op.status is Success
DEBUG assert_success - op.success is True, op.status is Success
DEBUG assert_success - op.success is True, op.status is Success
System.Net.Http.HttpRequestMessageExtensions.DisposeRequestResources DEBUG Disposing
DEBUG assert_success - op.success is False, op.status is ClientFailure
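
A ClientFailure status on its own says little, so it may help to also log the message and exception carried by each failed result. The following is only a sketch, assuming the 2.x SDK's multi-key `Get<T>(IList<string>)` overload; `ResultLogger` and `GetAndLog` are hypothetical names, not the code that produced the logs above:

```csharp
using System;
using System.Collections.Generic;
using Couchbase;
using Couchbase.Core;

public static class ResultLogger
{
    // Hypothetical helper: fetches documents by key and logs details for each failed result.
    public static void GetAndLog<T>(IBucket bucket, IList<string> ids)
    {
        // The multi-key overload returns one result per requested key.
        IDictionary<string, IOperationResult<T>> results = bucket.Get<T>(ids);

        foreach (var pair in results)
        {
            var op = pair.Value;
            if (op.Success)
            {
                continue;
            }

            // Status is the response status; Message and Exception usually
            // carry the underlying cause (for example a SocketException).
            Console.WriteLine("key={0} status={1} message={2}", pair.Key, op.Status, op.Message);
            if (op.Exception != null)
            {
                Console.WriteLine(op.Exception);
            }
        }
    }
}
```

Replace `Console.WriteLine` with whatever logger the application already uses.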

Any ideas?

btburnett3 · October 21, 2016, 1:05pm

@gchermennov

This isn’t really my area of specialty, but I can point you at a couple of things to look at.

Couchbase isn’t really designed to run on Windows for production, just development. In production they strongly recommend Linux.
I’d also look at your firewall settings. Normally the connection pool will sit at a minimum number of connections to the server. As it gets load, it can scale up the number of connections to the maximum size configured. It sounds like it might be trying to open more connections under load, and failing to open those additional connections.
I’ve also seen issues like this if you are not using a singleton to connect to Couchbase. You should only connect one copy of each Bucket, and keep the object in memory for the application lifetime. The ClusterHelper class is designed to help with this.
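
A minimal sketch of the singleton approach, assuming the 2.x SDK's ClusterHelper. The class name `CouchbaseBootstrap`, the server URI, and the bucket name "default" are placeholders for illustration, not values from this thread:

```csharp
using System;
using System.Collections.Generic;
using Couchbase;
using Couchbase.Configuration.Client;
using Couchbase.Core;

public static class CouchbaseBootstrap
{
    // Call once at application start (e.g. Application_Start in Global.asax).
    public static void Start()
    {
        var config = new ClientConfiguration
        {
            // Placeholder address: list your own cluster nodes here.
            Servers = new List<Uri> { new Uri("http://localhost:8091/") }
        };
        ClusterHelper.Initialize(config);
    }

    // ClusterHelper keeps the opened bucket for reuse, so callers share
    // one instance instead of opening a new Bucket per request.
    public static IBucket GetBucket()
    {
        return ClusterHelper.GetBucket("default"); // placeholder bucket name
    }

    // Call once at application shutdown (e.g. Application_End).
    public static void Stop()
    {
        ClusterHelper.Close();
    }
}
```

Request-handling code would then call `CouchbaseBootstrap.GetBucket()` and would not dispose the returned bucket.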

Brant

gchermennov · October 21, 2016, 2:34pm

So you don’t think this is because we have an old client library, or because there’s a slight mismatch in .NET versions between the dev and web server machines?

Noted. We don’t have a big cluster: 4 machines, recently expanded to 6. We have run without significant problems for more than a year (except when upgrading between versions), with the machines under load almost 24x7 (we run calculations that involve read/write operations).
Do you mean the Windows firewall on the web server? The firewall doesn’t limit the traffic; it can only close/open ports for particular programs/protocols.
I thought about this too. Currently we instantiate Bucket objects as needed, so there may be a resource leak. I will switch to a singleton and report back.

pvarley · October 21, 2016, 3:46pm

@btburnett3 This is not correct; Couchbase Server is supported on Windows. We have many users that have production workloads running successfully on Windows environments.

jmorris · October 21, 2016, 4:09pm

@gchermennov -

Also, I would upgrade the client to the latest stable release if you can (2.3.8 at the time of this post). A lot of bugs have been fixed in the dozen or so releases since 2.2.1!

-Jeff

btburnett3 · October 21, 2016, 5:22pm

@pvarley My apologies, that wasn’t the understanding I had previously. Maybe it used to be the case with older versions, but isn’t anymore. I know on my dev machine I used to have problems upgrading between versions and was told that Windows was dev only so I needed to uninstall and reinstall because upgrades weren’t well supported. Thanks for letting me know!

rmendoza · October 28, 2016, 8:45pm

I am also experiencing this issue with Couchbase Server 4.5.0-2601 Enterprise Edition (build-2601) and the latest .NET SDK (2.3.8).

My connections (simple get calls) work for 30-40 minutes just fine, then I begin to see this error:

System.Net.Sockets.SocketException (0x80004005): An existing connection was forcibly closed by the remote host
 at Couchbase.IO.Connection.Send(Byte[] buffer)
 at Couchbase.IO.Services.PooledIOService.Execute[T](IOperation`1 operation, IConnection connection)
 at Couchbase.Authentication.SASL.ScramShaMechanism.Authenticate(IConnection connection, String username, String password)
 at Couchbase.IO.Services.PooledIOService.Authenticate(IConnection connection)
 at Couchbase.IO.Services.PooledIOService.Execute[T](IOperation`1 operation)

This happens almost always 30-40 minutes after an app pool recycle, and can be resolved by doing another app pool recycle. I am also using the ClusterHelper to manage the bucket connections. In addition, this is only happening with one bucket. Other controllers/bucket interactions remain working during this time.

gchermennov · October 30, 2016, 1:37pm

Ryan, maybe you can isolate this? What kind of actions do you perform right before you get this exception?
I get a list of pretty heavy documents (800+ KB each) by their list of ids.

jmorris · October 31, 2016, 10:16pm

@rmendoza -

What do the server logs indicate? Generally this error is caused when (no surprises here) the connection is terminated by the server, but I believe it could also be anything between the server and app machine that closes the connection.

-Jeff

gchermennov · November 1, 2016, 11:01am

Will try to get you the logs.
By the way, how do I know which node’s logs to look into?

jmorris · November 1, 2016, 4:26pm

It would be the node that the SDK cannot maintain a connection with. You should be able to deduce it by turning up the verbosity of the logging on the client and then looking through the logs.

-Jeff

gchermennov · November 1, 2016, 6:12pm

OK, I’ll dig into the client logs. I had this config somewhere.

rmendoza · November 7, 2016, 6:59pm

@jmorris I don’t think this is a real connectivity issue. It doesn’t affect any other buckets that have FAR more traffic to them. I was able to resolve this by removing one node from the cluster (random choice) and bringing it back in to force a rebalance.

gchermennov · November 13, 2016, 3:17pm

@rmendoza - that’s a one-off fix (e.g. if I have 10 buckets, your solution will have to be applied to each of them); I’d prefer a permanent one. I’m back in the office tomorrow and will investigate further.

emilevr · September 14, 2017, 5:51pm

Hi there.

Did you manage to get to the bottom of this issue? We’re seeing the same issue and it doesn’t look like a real network error either. We do create two cluster objects in our process (to the same cluster) and on each cluster instance open a single bucket (same bucket in Couchbase). We do not use the ClusterHelper class.

Thanks.
Emile

jmorris · September 14, 2017, 6:23pm

@emilevr -

To help, you’ll need to provide more information:

A Wireshark capture taken when the error occurs
Client logs taken when the error occurs

Generally, when you see an IO error with the message “Existing connection forcibly closed by remote host”, you suspect something between the app server and the cluster (including the cluster itself). This could be a load balancer (LB) timing out idle connections, server configuration, or a number of other things.
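
If an intermediate device dropping idle connections is the suspect, one thing to try is shortening the TCP keep-alive time so that pooled connections never look idle. This is a sketch only, assuming an SDK 2.x release whose ClientConfiguration exposes the keep-alive settings (worth verifying for the version in use); `KeepAliveConfig`, the server URI, and the timing values are illustrative assumptions:

```csharp
using System;
using System.Collections.Generic;
using Couchbase.Configuration.Client;

public static class KeepAliveConfig
{
    // Hypothetical factory: builds a client configuration with aggressive keep-alives.
    public static ClientConfiguration Create()
    {
        return new ClientConfiguration
        {
            // Placeholder address: list your own cluster nodes here.
            Servers = new List<Uri> { new Uri("http://localhost:8091/") },

            // Send keep-alive probes on otherwise idle connections.
            EnableTcpKeepAlives = true,

            // Milliseconds of idle time before the first probe. The value
            // should be shorter than the idle timeout of any LB or firewall
            // on the path; 60 seconds here is an arbitrary example.
            TcpKeepAliveTime = 60 * 1000,

            // Milliseconds between probes when one goes unacknowledged.
            TcpKeepAliveInterval = 1000
        };
    }
}
```

The resulting configuration would be passed to `ClusterHelper.Initialize` or to the `Cluster` constructor. If the errors stop, that points at an idle timeout somewhere on the network path instead of at the cluster.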

-Jeff
