Sync_gateway: IPv6 open connections keep climbing

I have sync_gateway configured behind an nginx reverse proxy that terminates SSL.

On sync_gateway we are finding that the number of open TCP connections using IPv6 keeps climbing. Found via…

lsof -p 10 | grep -i 'established' | grep -i IPv6 | wc -l

Has anyone experienced this? Should it be accepting IPv6 connections at all?

You can expect one long-lasting TCP connection per active client, used for the “changes feed” that pushes changes to the client. This may be using either HTTP or WebSockets, depending on the client implementation. These connections close when the clients quit or go offline.
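If it helps to see what one of these connections looks like, a client-held changes feed can be simulated from the command line. This is a sketch assuming Sync Gateway’s default public port 4984 and a database named mydb:

curl -N "http://localhost:4984/mydb/_changes?feed=continuous&heartbeat=30000"

The -N flag turns off curl’s own buffering; the connection then stays open indefinitely, with the server sending an empty line roughly every 30 seconds (the heartbeat value is in milliseconds) to keep it alive.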

IPv6 vs IPv4 is irrelevant to this. I’d expect to see an increasing number of IPv6 connections, since I’m told that mobile carriers are aggressively switching their networks over to IPv6 to cope with IPv4 address exhaustion.

What version of Sync Gateway are you running? There was an enhancement in 1.2.0 to release half-closed connections more aggressively (the case where a client holding the long-lasting connection Jens describes disconnects without cleanly closing it). If you’re not on at least 1.2.0, that may be related.

I am using Sync Gateway 1.2.1

I restarted Sync Gateway about 24 hours ago, and the number of open TCP connections has been climbing constantly ever since. An initial analysis suggests that the number of IPv4 connections is roughly constant, while the number of IPv6 connections is increasing linearly.

We have roughly 50-100 active users at any given time. On starting the server we saw about 200 open connections within the first 30 minutes; we’re now at 1,200, and ~90% of these are IPv6.
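For anyone else measuring this, here is a sketch of how the counts can be split by address family (the pgrep pattern for finding the pid is an assumption about how your process is named):

PID=$(pgrep -f sync_gateway)                       # adjust the pattern to your process name
lsof -nP -a -p "$PID" -i4 | grep -c ESTABLISHED    # open IPv4 connections
lsof -nP -a -p "$PID" -i6 | grep -c ESTABLISHED    # open IPv6 connections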

I have been restarting sync_gateway every few days to reset the connections. When I looked into this again today, the output of lsof had, strangely, changed; I am now seeing lines like this…

sync_gate 24829 root 924u sock 0,7 0t0 15615271 can't identify protocol
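As I understand it, lsof prints that message when it can no longer match the socket’s inode against the entries in /proc/net/tcp and friends, which usually indicates a socket that is half torn down. Counting them over time shows whether they are accumulating (a sketch; the pid lookup is an assumption):

lsof -p $(pgrep -f sync_gateway) | grep -c "can't identify protocol"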

I found a somewhat related issue in another product: they hit this same error with their server, it was related to websockets, and it only happened when nginx was in front of the server…

Any suggestions on how I can investigate this further?

Have you already reviewed the settings described here?
http://developer.couchbase.com/documentation/mobile/current/develop/guides/sync-gateway/nginx/configuring-nginx-for-sync-gateway/index.html
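(For reference, the overall shape of the configuration that page describes is roughly the following; the upstream name and timeout value here are illustrative rather than quoted from the page:)

upstream sync_gateway {
    server 127.0.0.1:4984;                   # default Sync Gateway public port
}

# inside the SSL-terminating server block:
location / {
    proxy_pass http://sync_gateway;
    proxy_http_version 1.1;                  # required for WebSocket upgrades
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_read_timeout 360s;                 # must outlive the changes-feed heartbeat interval
}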

Yes I have; I am set up with exactly those settings.

Our traffic slowed a little in the last week, which meant we didn’t hit the connection limit before the number of open connections peaked and plateaued. The closest round number to the connection age at which it plateaued is 400,000 seconds (about 4.6 days), so now I need to see if I can figure out which timeout this corresponds to.
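The places I plan to look, for anyone hunting a similar mystery timeout (a sketch, assuming a stock Linux/nginx setup):

# any timeout directives in the nginx configuration
grep -R "timeout" /etc/nginx/

# kernel TCP keepalive settings (time and interval are in seconds)
sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes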

Hello, I work with combinatorial. We solved this problem by disabling proxy buffering in our NGINX proxy. There appears to be some complicated interaction between the proxy and Sync Gateway in which buffered responses force the TCP connections to remain in the ESTABLISHED state. We added the following line to our NGINX configuration…

proxy_buffering off;
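For context, the directive sits inside the location block that proxies to Sync Gateway, roughly like this (the upstream name is from our config; yours may differ):

location / {
    proxy_pass http://sync_gateway;
    proxy_buffering off;                     # stop NGINX buffering streamed responses such as the changes feed
}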

Our number of open TCP connections has gone from climbing constantly to around 3,000 and plateauing, to averaging around 250 and fluctuating up and down as expected.

The restart of Sync Gateway appears to have been caused by high memory usage on the server it was running on; there was a strong correlation between our memory-usage graph and our open-TCP-connections graph.

I could not tell from our graphs whether the TCP connections that remained open were from the clients to the proxy or from the proxy to Sync Gateway, but given that Sync Gateway’s memory usage was constantly increasing, it is likely that the connections from nginx to Sync Gateway were the ones remaining open. Is it possible that the heartbeat was somehow being buffered by NGINX, and that this was what forced the TCP connections to remain open?
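One way to tell the two legs apart, for anyone checking their own setup (a sketch, assuming Sync Gateway on its default public port 4984 and nginx terminating SSL on 443):

# on the Sync Gateway host: established connections on the Sync Gateway listener
ss -tn state established '( sport = :4984 )'

# on the nginx host: established client-facing connections on the SSL port
ss -tn state established '( sport = :443 )'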

@nkhumphreys Thanks very much for the update - that’s useful information. My first thought was similar to yours - that Sync Gateway’s heartbeats were being buffered and keeping the connection open. The other possibility is that the nginx buffering prevents the CloseNotifier being triggered, so Sync Gateway isn’t able to identify the connection as a half-closed connection.

I’ll get a note about this setting added to the Sync Gateway/nginx documentation page, and will see if we can repro this in our internal test infrastructure.