Different number of connections on one node of my cluster

Hi,
I have 8 server nodes and one of them has a problem (couchbase05, it crashes continuously)
Below are the connection counts per port for each node:

host / port   8091  8092  8093  9100  9102  9999  11209  11210
couchbase01     95   323  5868    16    31    28    322   1598
couchbase02    129   307  5872    16    32    23    322   1600
couchbase03     98   303  5879    16    32    29    322   1620
couchbase04    144   306  5882    16    25    21    322   1592
couchbase05     89   236   238     8    31    15    322   2723
couchbase06    105   317  5612    16    34    21    322   1609
couchbase07    111   314  5895    16    28    20    322   1533
couchbase08     99   361  5725    16    33    21    324   1582

Is it normal that couchbase05 has about double the number of connections on port 11210 and half the number of connections on port 9100 compared with the other nodes?
Thanks
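
A quick way to check which counts in the table above really stand out is to compare each value with the median for its port. The following is only a minimal sketch: the numbers are copied from the table, while the function name `find_outliers` and the 0.6x / 1.5x thresholds are arbitrary assumptions chosen for illustration, not anything defined by Couchbase.

```python
from statistics import median

# Ports and per-node connection counts, copied from the table in this post.
PORTS = [8091, 8092, 8093, 9100, 9102, 9999, 11209, 11210]
COUNTS = {
    "couchbase01": [95, 323, 5868, 16, 31, 28, 322, 1598],
    "couchbase02": [129, 307, 5872, 16, 32, 23, 322, 1600],
    "couchbase03": [98, 303, 5879, 16, 32, 29, 322, 1620],
    "couchbase04": [144, 306, 5882, 16, 25, 21, 322, 1592],
    "couchbase05": [89, 236, 238, 8, 31, 15, 322, 2723],
    "couchbase06": [105, 317, 5612, 16, 34, 21, 322, 1609],
    "couchbase07": [111, 314, 5895, 16, 28, 20, 322, 1533],
    "couchbase08": [99, 361, 5725, 16, 33, 21, 324, 1582],
}


def find_outliers(counts, ports, low=0.6, high=1.5):
    """Return (host, port, count, median) for every count outside
    [low * median, high * median] of its port column.

    The thresholds are illustrative assumptions, not Couchbase guidance.
    """
    outliers = []
    for i, port in enumerate(ports):
        column = [row[i] for row in counts.values()]
        mid = median(column)
        for host, row in counts.items():
            if mid and not (low * mid <= row[i] <= high * mid):
                outliers.append((host, port, row[i], mid))
    return outliers


if __name__ == "__main__":
    for host, port, count, mid in find_outliers(COUNTS, PORTS):
        print(f"{host} port {port}: {count} connections (median {mid})")
```

With these thresholds the only flagged entries belong to couchbase05, on ports 8093, 9100 and 11210, so port 8093 deviates on that node as well, not only the two ports mentioned in the question.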

Further investigation pointed to the problem: it seems that the node (couchbase05) does not contact the other nodes on port 9100.
Why?

I also found this in my log:

Service 'goxdcr' exited with status 1. Restarting. Messages: MetadataService 2017-02-23T12:41:46.072+01:00 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale. Was never updated yet., num_of_retry=2
MetadataService 2017-02-23T12:41:46.072+01:00 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale. Was never updated yet., num_of_retry=3
MetadataService 2017-02-23T12:41:46.072+01:00 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale. Was never updated yet., num_of_retry=4
RemoteClusterService 2017-02-23T12:41:46.072+01:00 [ERROR] Failed to get all entries, err=metakv failed for max number of retries = 5
[goport] 2017/02/23 12:41:46 /opt/couchbase/bin/goxdcr terminated: exit status 1

Could these problems be related?

Hi Alessandro -
What version are you running? There is at least one known issue that looks like what you’re experiencing, and it was fixed in 4.5.0: MB-16568.

You should also be able to fail over that node and re-add it and it should recover.

Just as an aside, you seem to have a lot of connections open on port 8093 - are you sure your applications are not leaking connections? Usually, the number of connections on 11210 per node would be about the same as the number of client objects, and that would also be about the same as the number of connections you have on port 8093.

Good luck,
-Will
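
To follow up on the possible connection leak on port 8093, it can help to see which client hosts hold the connections. The sketch below is one way to do that, assuming the node runs Linux and the text comes from `netstat -tn` (columns: Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The function name `connections_by_client` and the sample addresses are hypothetical and only serve the illustration.

```python
from collections import Counter


def connections_by_client(netstat_text, port):
    """Count ESTABLISHED TCP connections to a local port, grouped by remote host.

    Assumes Linux `netstat -tn` style lines; anything else is skipped.
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP connection line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "ESTABLISHED":
            continue
        # Split on the last colon so IPv6-style addresses still yield the port.
        local_port = fields[3].rsplit(":", 1)[-1]
        remote_host = fields[4].rsplit(":", 1)[0]
        if local_port == str(port):
            counts[remote_host] += 1
    return counts


# Hypothetical sample output; the addresses are documentation-range placeholders.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.0.2.5:8093          192.0.2.101:51234       ESTABLISHED
tcp        0      0 192.0.2.5:8093          192.0.2.101:51236       ESTABLISHED
tcp        0      0 192.0.2.5:8093          192.0.2.102:40022       ESTABLISHED
tcp        0      0 192.0.2.5:11210         192.0.2.101:51300       ESTABLISHED
tcp        0      0 192.0.2.5:8093          192.0.2.103:40100       TIME_WAIT
"""

if __name__ == "__main__":
    for host, n in connections_by_client(SAMPLE, 8093).most_common():
        print(f"{host}: {n} connections on 8093")
```

If one application host accounts for far more connections on 8093 than it has client objects, that host is a reasonable first place to look for a leak.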

Thank you for the reply
We are currently using version: 4.0.0-4051
We are considering upgrading to 4.5, what would be the best procedure for a cluster of 8 machines?
As for the active connections, we are updating our service to use fewer of them in order to lighten the load on the cluster.
Is the failover operation heavy in terms of resources and time?
thank you

On how to upgrade, I recommend taking a look at the information here: https://developer.couchbase.com/documentation/server/current/install/upgrade.html and also look at the page called “Upgrade Options”. We recently re-wrote that to be easier to understand, so I hope it helps you decide which one is right for you.
Failover should not be a heavy operation, but you need to rebalance to bring the nodes back in, and rebalance is a heavier operation. Depending on your production use case, an offline upgrade can be faster. Some of our big customers would rather take the downtime than perform rebalances while the system is online.

Best,
-Will

Hi,

I have an issue with too many open connections on port 11210 (30150 / 30000).

  1. How can I increase the limit on open connections?
  2. How can I clean up the existing connections?