Deleting a node on a cluster using replication causes problems which I can't recover from
Hi,
When I perform operations such as deleting a node on a cluster with 2 nodes and 1 replica,
it causes problems I can't recover from.
One of the nodes keeps exiting, and even an uninstall does not help;
only reformatting the machine does.
Any ideas why this happens?
Are there any known limitations here?
thanks, Michal
I have 2 Couchbase nodes running on CentOS.
I use one bucket with replication.
From what I saw, this happens only with replication!
At first I thought it had to do with me deleting the node in order to perform a restore.
But then it happened after I performed:
1. flush bucket
2. delete bucket
3. rebalance
The 2 nodes I had started exiting; then 1 node came back up, and the other kept exiting.
I tried:
4. restart of node
5. uninstall
But the uninstall does not remove the data, which I suspected was somehow corrupted.
In the meantime, I need to run a long-running test, and I can't, because every time this happens
I need to reformat the machine.
Sorry, I can't give more information.
Let me know if you have any ideas.
Some conclusions:
I ran another test:
1. flush data
2. restart nodes
What happened was that port 11210 was still occupied, which is why the nodes failed to come up.
I saw that 2 processes are not stopped when Couchbase is stopped:
/opt/couchbase/bin/memcached and /opt/couchbase/lib/erlang/erts-5.8.4/bin/epmd
Question: any ideas why they are not stopped?
So I killed both, and then the errors stopped.
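For anyone wanting to check for the same condition before restarting, here is a minimal sketch in Python. It only tests whether something is still accepting connections on port 11210 and lists processes whose command line mentions memcached or epmd. The function names port_in_use and find_processes are made up for this illustration, the process scan assumes a Linux /proc filesystem, and none of this is a Couchbase tool.

```python
import os
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        return s.connect_ex((host, port)) == 0


def find_processes(needles):
    """Return (pid, cmdline) pairs whose command line contains any needle.

    Relies on /proc, so it returns an empty list on non-Linux systems.
    """
    found = []
    if not os.path.isdir("/proc"):
        return found
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited or is not readable
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if any(n in cmdline for n in needles):
            found.append((int(entry), cmdline))
    return found


if __name__ == "__main__":
    print("port 11210 in use:", port_in_use(11210))
    # Paths taken from the post above; adjust if your install differs.
    for pid, cmd in find_processes(["/opt/couchbase/bin/memcached", "epmd"]):
        print(pid, cmd)
```

If the port is reported in use after Couchbase has been stopped, the listed PIDs are the leftover processes to look at.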
Now, though, the bucket seems to be working in the background, since the nodes have been in status 'pending' for a long time,
even though I flushed the data.
Question: so how can I delete data? Deleting a node can cause problems, flush does not actually remove the data, and restarting probably reloads data from disk somehow (even though I still don't see the item count rise).
I am sorry that the questions are muddled, but I am a little confused about what is going on.
thanks, Michal
Can you provide some more details? Which OS are you using? What were the exact steps that caused this issue? This shouldn't be happening.