Nodes offline under nominal load
I have a small Membase cluster I'm playing with.
The cluster has one Membase bucket ("default") configured with one replica and a per-node RAM quota of 128MB.
Node 1: 2 CPU / 12GB RAM (RHEL5 VM)
Node 2: 24 CPU / 24GB RAM (RHEL5)
Node 3: 24 CPU / 48GB RAM (RHEL6)
I have a small Ruby script that sets objects. My keys are predictably named "1" .. "n".
Frequently while setting objects, nodes seem to randomly go offline. For example, I inserted 2000 32KB objects and took these screenshots while doing it:
This behavior is very repeatable for me. Whether I insert 10,000 1KB objects, 1,000 8KB objects, or really any other variation, nodes eventually go offline.
In some cases, the server returns a "temporary failure". When I receive that ServerError exception, my client backs off for a few seconds and then tries again. According to the documentation, this behavior ("dear client, please leave me alone for a second") is expected; the nodes randomly going offline, however, seems unusual to me.
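The back-off described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual script: `TemporaryFailure` is a hypothetical stand-in for whatever ServerError the client library raises, `set_with_backoff` and its parameters are made-up names, and the doubling delay is one possible choice (a fixed pause of a few seconds would also match the description).

```ruby
# Hypothetical stand-in for the client's "temporary failure" ServerError.
class TemporaryFailure < StandardError; end

# Runs the given block and retries it when the server asks the client to
# back off. The pause doubles after each failed attempt (assumption), and
# the error is re-raised once max_attempts is reached.
# `sleeper` is injectable so the pause can be stubbed out in tests.
def set_with_backoff(max_attempts: 5, base_delay: 1.0, sleeper: ->(s) { sleep(s) })
  attempts = 0
  begin
    attempts += 1
    yield
  rescue TemporaryFailure
    raise if attempts >= max_attempts
    sleeper.call(base_delay * (2**(attempts - 1)))
    retry
  end
end

# Usage sketch (the `cache` object and `payload` are assumed, not shown):
#   (1..2000).each do |i|
#     set_with_backoff { cache.set(i.to_s, payload) }
#   end
```

Capping the number of attempts keeps a persistently overloaded node from stalling the insert loop forever, and logging each retry alongside a timestamp would make it easier to line up client-side failures with the moments the cluster reports a node as offline.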
When the nodes go offline, they do eventually come back up. The processes themselves never die; the cluster just transiently perceives the nodes as offline.
I guess my questions are...
1) Am I doing something wrong?
2) How can I investigate this more? (Are there more detailed log files I can review besides those logs in the web interface?)
Thanks in advance!