Diagnostics for memcached running hot (100% CPU) 24 hrs after last read/write op following a "Hard Out of Memory Error".
Greetings,
As I posted yesterday in http://couchbase.org/forums/thread/default-bucket-has-hard-out-memory-error, we experienced a "Hard Out of Memory Error". Since I was in evaluation mode, I simply stopped using the membase cluster.
Interestingly, 24 hours after all external activity stopped, memcached is still using a full CPU on the node that reported errors. The other two nodes in the cluster are basically idle.
Are there any diagnostics I can run on this memcached process before I pull this node out of the cluster?
Though this is a significant setback in our evaluation of membase, I am trying to turn this into a 'cluster care and feeding howto' opportunity!
Any suggestions for troubleshooting, best practices for recovering from the "Hard Out of Memory Error" situation, etc.?
There is more detail in the other thread, and I am happy to repeat it here or add more information as needed.
Thanks!
Is it the memcached process itself that's stuck in a tight loop?
It would be useful to capture mbstats output a couple of times, some interval apart: both the regular mbstats and mbstats timings.
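If it helps with comparing the two captures, here is a rough Python sketch that diffs two saved stats snapshots and shows only the counters that moved. It assumes each line of a snapshot ends with a stat name followed by a numeric value (for example "name: value" or "STAT name value"); that format, the helper names parse_stats and diff_stats, and the sample values are my assumptions for illustration, not something taken from the mbstats tool itself.

```python
def parse_stats(text):
    """Parse lines ending in '<name> <value>' or '<name>: <value>' into a dict.

    Assumed format: the last whitespace-separated token is the value and the
    token before it is the stat name. Non-numeric values are skipped.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # blank line or a lone token, nothing to parse
        name = parts[-2].rstrip(":")
        try:
            stats[name] = float(parts[-1])
        except ValueError:
            continue  # e.g. version strings or other non-numeric values
    return stats


def diff_stats(before, after):
    """Return {name: after - before} for stats present in both and changed."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


if __name__ == "__main__":
    # Made-up sample snapshots, only to show the shape of the input.
    first = "curr_items: 1000\nep_queue_size: 20\n"
    second = "curr_items: 1000\nep_queue_size: 35\n"
    changes = diff_stats(parse_stats(first), parse_stats(second))
    for name, delta in sorted(changes.items()):
        print(name, delta)
```

On a node with no external traffic, any counter that keeps changing between the two captures is a reasonable place to start looking for whatever is keeping the process busy.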