Sync_gateway: IPv6 open connections keep climbing

I have sync_gateway configured behind an nginx reverse proxy that terminates SSL.

On sync_gateway we are finding that the number of open TCP connections using IPv6 keeps climbing. Found via…

lsof -p 10 | grep -i 'established' | grep -i IPv6 | wc -l

Has anyone experienced this? Should it be accepting IPv6 connections at all?

You can expect one long-lasting TCP connection per active client, used for the “changes feed” that pushes changes to the client. This may be using either HTTP or WebSockets, depending on the client implementation. These connections close when the clients quit or go offline.
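If it helps to see what one of these connections looks like, a client-held changes feed can be simulated from the command line. This is a sketch assuming Sync Gateway’s default public port 4984 and a database named mydb:

curl -N "http://localhost:4984/mydb/_changes?feed=continuous&heartbeat=30000"

The -N flag turns off curl’s own buffering; the connection then stays open indefinitely, with the server sending an empty line roughly every 30 seconds (the heartbeat value is in milliseconds) to keep it alive.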

IPv6 vs IPv4 is irrelevant to this. I’d expect to see an increasing number of IPv6 connections, since I’m told that mobile carriers are aggressively switching their networks over to IPv6 to cope with IPv4 address exhaustion.

What version of Sync Gateway are you running? There was an enhancement in 1.2.0 to release half-closed connections more aggressively (the case where a client holding the long-lasting connection Jens describes disconnects without cleanly closing it). If you’re not on at least 1.2.0, that may be related.

I am using Sync Gateway 1.2.1

I restarted Sync Gateway about 24 hours ago, and the number of open TCP connections has been climbing constantly ever since. An initial analysis suggests that the number of IPv4 connections is roughly constant, while the number of IPv6 connections is increasing linearly.

We have roughly 50-100 active users at any given time. On starting the server we saw about 200 open connections within the first 30 minutes; we’re now at 1,200, and ~90% of these are IPv6.
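For anyone else measuring this, here is a sketch of how the counts can be split by address family (the pgrep pattern for finding the pid is an assumption about how your process is named):

PID=$(pgrep -f sync_gateway)                       # adjust the pattern to your process name
lsof -nP -a -p "$PID" -i4 | grep -c ESTABLISHED    # open IPv4 connections
lsof -nP -a -p "$PID" -i6 | grep -c ESTABLISHED    # open IPv6 connections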

I have been restarting sync_gateway every few days to reset the connections. When I looked into this again today, the output of lsof had, strangely, changed; I am now seeing lines like this…

sync_gate 24829 root 924u sock 0,7 0t0 15615271 can't identify protocol
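As I understand it, lsof prints that message when it can no longer match the socket’s inode against the entries in /proc/net/tcp and friends, which usually indicates a socket that is half torn down. Counting them over time shows whether they are accumulating (a sketch; the pid lookup is an assumption):

lsof -p $(pgrep -f sync_gateway) | grep -c "can't identify protocol"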

I found a somewhat related issue in another product: they hit this same error with their server, it was related to websockets, and it only happened when nginx was in front of the server…

Any suggestions on how I can investigate this further?

Have you already reviewed the settings described here?
http://developer.couchbase.com/documentation/mobile/current/develop/guides/sync-gateway/nginx/configuring-nginx-for-sync-gateway/index.html
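(For reference, the overall shape of the configuration that page describes is roughly the following; the upstream name and timeout value here are illustrative rather than quoted from the page:)

upstream sync_gateway {
    server 127.0.0.1:4984;                   # default Sync Gateway public port
}

# inside the SSL-terminating server block:
location / {
    proxy_pass http://sync_gateway;
    proxy_http_version 1.1;                  # required for WebSocket upgrades
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_read_timeout 360s;                 # must outlive the changes-feed heartbeat interval
}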

Yes I have; I am set up with exactly those settings.

Our traffic slowed a little in the last week, which meant we didn’t hit the connection limit before the number of open connections peaked and plateaued. The closest round number to the connection age at which it plateaued is 400,000 seconds (about 4.6 days), so now I need to see if I can figure out which timeout this corresponds to.
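The places I plan to look, for anyone hunting a similar mystery timeout (a sketch, assuming a stock Linux/nginx setup):

# any timeout directives in the nginx configuration
grep -R "timeout" /etc/nginx/

# kernel TCP keepalive settings (time and interval are in seconds)
sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes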

Hello, I work with combinatorial. We solved this problem by disabling proxy buffering in our NGINX proxy. There appears to be some complicated interaction between the proxy and Sync Gateway in which buffered responses force the TCP connections to remain in the ESTABLISHED state. We added the following line to our NGINX configuration…

proxy_buffering off;
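For context, the directive sits inside the location block that proxies to Sync Gateway, roughly like this (the upstream name is from our config; yours may differ):

location / {
    proxy_pass http://sync_gateway;
    proxy_buffering off;                     # stop NGINX buffering streamed responses such as the changes feed
}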

Our number of open TCP connections has gone from climbing constantly to around 3,000 and plateauing, to averaging around 250 and fluctuating up and down as expected.

The restart of Sync Gateway appears to have been caused by high memory usage on the server it was running on; there was a strong correlation between our memory-usage graph and our open-TCP-connections graph.

I could not tell from our graphs whether the TCP connections that remained open were from the clients to the proxy or from the proxy to Sync Gateway, but given that Sync Gateway’s memory usage was constantly increasing, it is likely that the connections from nginx to Sync Gateway were the ones remaining open. Is it possible that the heartbeat was somehow being buffered by NGINX, and that this was what forced the TCP connections to remain open?
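One way to tell the two legs apart, for anyone checking their own setup (a sketch, assuming Sync Gateway on its default public port 4984 and nginx terminating SSL on 443):

# on the Sync Gateway host: established connections on the Sync Gateway listener
ss -tn state established '( sport = :4984 )'

# on the nginx host: established client-facing connections on the SSL port
ss -tn state established '( sport = :443 )'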

@nkhumphreys Thanks very much for the update - that’s useful information. My first thought was similar to yours - that Sync Gateway’s heartbeats were being buffered and keeping the connection open. The other possibility is that the nginx buffering prevents the CloseNotifier being triggered, so Sync Gateway isn’t able to identify the connection as a half-closed connection.

I’ll get a note about this setting added to the Sync Gateway/nginx documentation page, and will see if we can repro this in our internal test infrastructure.