Our production environment is a two-server cluster.
Each server runs a Tomcat application server that hosts our web applications, plus a Couchbase Server node. Our applications store key-value pairs on one of the two servers of the Couchbase cluster
(Couchbase is set up as Membase, and we use spymemcached to connect to it).
Yesterday we had a major crash in our environment, and both physical servers went down. After some hours we managed to start up one of the two physical servers. We then noticed that requests sent to the server that was still down could not be handled by Couchbase, so we performed a failover, which fixed the issue.
Today we managed to start the failed server, and we performed the following actions in the Couchbase cluster:
- Added the server back to the cluster
- Selected the Rebalance option
After 4 hours the rebalance process has still not completed. The Couchbase Cluster Overview shows a progress indication of 0.56% on the first server and 1.15% on the second,
and these values have not changed at all during those 4 hours.
In addition, all traffic is being handled by only one of the two Couchbase servers (the one that has been running since yesterday), and its RAM usage has reached 98%.
Do you think there is a problem with the Couchbase environment, especially since it suffered a hard crash when the servers went down?