Is there a way to identify that sync_gateway is not running?

I am trying to find a way to detect when sync_gateway suddenly stops running due to the following error

The goal is to restart sync_gateway automatically, or to notify a maintenance person to restart it manually.

In that case, Sync Gateway hasn’t stopped; it just isn’t able to open new server-side sockets. I think a better fix is to figure out why you are running out of server-side sockets (is your ulimit set too low? do you have clients that aren’t properly closing connections?), and then make sure that you have 2-5X as many server-side sockets available as you think you will need at the high watermark.
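As a rough illustration of the 2-5X headroom rule above, here is a minimal Python sketch. The function names are made up for this example, and treating one client connection as one descriptor is a simplifying assumption (a continuous replication may hold more than one socket). Note that `resource.getrlimit` reports the limit of the process that calls it, so the check is only meaningful when run as the same user and under the same service environment as Sync Gateway.

```python
import math
import resource  # Unix only


def required_descriptors(peak_connections: int, headroom_factor: float = 2.0) -> int:
    """Descriptors to provision for a given peak, with 2-5X headroom."""
    if peak_connections < 0:
        raise ValueError("peak_connections must not be negative")
    if not 2.0 <= headroom_factor <= 5.0:
        raise ValueError("headroom_factor is expected to be between 2 and 5")
    return math.ceil(peak_connections * headroom_factor)


def limit_is_sufficient(soft_limit: int, peak_connections: int,
                        headroom_factor: float = 2.0) -> bool:
    """True when soft_limit covers the expected peak plus headroom."""
    if soft_limit == resource.RLIM_INFINITY:
        return True  # no cap on open files
    return soft_limit >= required_descriptors(peak_connections, headroom_factor)


def current_open_file_limits() -> tuple:
    """(soft, hard) open-file limits of *this* Python process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


# Example usage (the peak of 100 connections is an assumed figure):
#   soft, hard = current_open_file_limits()
#   print(soft, hard, limit_is_sufficient(soft, peak_connections=100, headroom_factor=5.0))
```

On Linux, the limits of an already running process can also be read from `/proc/<pid>/limits`, which avoids guessing which environment the service was started in.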

Hi @traun, we haven’t modified the server-side sockets value. It is set to the default, which is 5000 I guess. The problem is that there are around 50 clients which go online and offline very frequently (the number of clients is likely to grow in the near future), and continuous sync is enabled on all the clients (iPads). If there are any best practices we can follow in this situation, please let me know.

@ajaykoppisetty can you say more about the use case?

How high do you expect the numbers to go? Do the clients know they’re going offline (i.e. could they shut down replication before going offline)?

One very crude approach would be to have something monitor the log output and restart SG when you see the notification. (You can just kill it and start it over. Not ideal, but it should work.)
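A minimal sketch of that crude approach in Python, under stated assumptions: the log path, the error text to look for, and the restart command are all placeholders you would supply for your own deployment, and none of the names below come from Sync Gateway itself. It tails the log, restarts on the first matching line, then waits before watching again so a burst of errors does not cause a restart loop.

```python
import subprocess
import time
from typing import Iterable, Iterator, List, Optional


def first_matching_line(lines: Iterable[str], pattern: str) -> Optional[str]:
    """Return the first line containing pattern, or None if the input ends."""
    for line in lines:
        if pattern in line:
            return line.rstrip("\n")
    return None


def follow(path: str, poll_seconds: float = 1.0) -> Iterator[str]:
    """Yield lines appended to path after this call, similar to `tail -f`."""
    with open(path, "r", errors="replace") as handle:
        handle.seek(0, 2)  # jump to the end so old lines are ignored
        while True:
            line = handle.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)


def watch_and_restart(log_path: str, pattern: str, restart_command: List[str],
                      cooldown_seconds: float = 60.0) -> None:
    """Run restart_command whenever pattern shows up in the log. Never returns."""
    while True:
        # Re-open the log each round: lines written during the cooldown are skipped.
        first_matching_line(follow(log_path), pattern)
        subprocess.run(restart_command, check=False)
        time.sleep(cooldown_seconds)


# Example usage (every value here is a hypothetical placeholder):
#   watch_and_restart("/path/to/sync_gateway.log", "<error text from your log>",
#                     ["<your>", "<restart>", "<command>"])
```

This does not handle log rotation while it is waiting for a match, and a line that is only partly written when it is read may be missed, so treat it as a stopgap rather than proper monitoring.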

Another (as Traun mentioned) would be to boost the max socket count.

If you’re not worried about poor connectivity you could also look at changing the heartbeat and timeouts on SG so the sockets get shut down more quickly. This would be if the clients have good network connectivity whenever they connect. Otherwise you might run into trouble with SG deciding connections are dead too often.

Hod

Hi @hod.greeley,

Thank you for the suggestions, but with help from support we found the root cause of the issue: the ulimit on the server was very low. We have raised the ulimit on the server, and since then sync_gateway has not stopped. To be on the safe side we will go with the crude approach for now, and it would be really good to have an alert from Couchbase when sync goes down :slight_smile:

Regarding the heartbeat, we really haven’t experimented with it. When the clients connect, the connection will be really good, as the network is set up only for that set of devices. The number of devices may go up to 100, but it is hard to identify when a device really goes offline. (We stop the replication when we get the offline notification from the device.)
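For anyone who wants a basic "sync is down" alert in the meantime, one option is to poll Sync Gateway over HTTP from outside the process. Below is a minimal Python sketch, assuming the public REST API is reachable at a URL you know (port 4984 is the usual default, but that is an assumption about your deployment). The function names are made up for this example, and how the alert is delivered is left to the caller. Because the probe opens a new connection each time, it should also catch the case where the process is still running but cannot accept new sockets.

```python
import http.client
import urllib.error
import urllib.request
from typing import Sequence


def sync_gateway_responds(url: str, timeout_seconds: float = 5.0) -> bool:
    """True if an HTTP GET on url returns a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Refused, timed out, reset, or a non-2xx answer: treat all as "down".
        return False


def consecutive_failures(history: Sequence[bool]) -> int:
    """Count failed probes at the end of history (True means the probe succeeded)."""
    count = 0
    for ok in reversed(history):
        if ok:
            break
        count += 1
    return count


def should_alert(history: Sequence[bool], threshold: int = 3) -> bool:
    """Alert only after `threshold` failures in a row, to avoid flapping."""
    return consecutive_failures(history) >= threshold


# Example usage (URL and threshold are assumptions about your setup):
#   history.append(sync_gateway_responds("http://localhost:4984/"))
#   if should_alert(history, threshold=3):
#       ...  # send the notification or trigger the restart here
```

Requiring a few failures in a row is a design choice for this sketch: a single slow response on a busy server should not page anyone.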