Confusion about the real meaning of some Couchbase stats
During load spikes with high IO wait that last a few minutes on the Couchbase nodes, I see that the following stats drop to ZERO:
Meanwhile, "total amount of operations" does not drop during these spikes, and I see no errors on the PHP memcached client side when writing to Couchbase. Note that the load is write-only (set/update): we are testing a real write workload with real production data, so no reads/gets happen yet, which means all of the "total amount of operations" are writes.
So what do these 6 stats really mean?
I thought they show the number of write (create/update) operations performed in RAM on bucket items. If writes to the bucket in RAM drop to ZERO, that should mean all write operations are failing, and I should see errors on the client side such as "unable to write to memcache".
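One way to cross-check the dashboard against the server's own counters is to sample the memcached `stats` command twice and compare the raw set counter. The following is only a minimal sketch, with several assumptions: that the node answers the memcached text protocol on the port the PHP client uses (11211 here is a placeholder), that the standard memcached counter `cmd_set` is exposed for the bucket, and that the host name passed to `fetch_stats` is filled in by hand. It does not explain what the six dashboard stats mean; it only shows whether the server-side set counter keeps increasing during a spike.

```python
import socket


def fetch_stats(host, port=11211, timeout=5.0):
    """Send the text-protocol `stats` command and return the raw reply.

    host and port are placeholders (assumptions), not values from this setup.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        buf = b""
        # The reply is a series of "STAT <name> <value>" lines ending with "END".
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return buf.decode("ascii", errors="replace")


def parse_stats(raw):
    """Turn 'STAT name value' lines into a dict of name -> value (strings)."""
    stats = {}
    for line in raw.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def rate_per_sec(before, after, key, interval_s):
    """Average per-second increase of a cumulative counter between two samples."""
    return (int(after[key]) - int(before[key])) / interval_s
```

Taking one sample before and one during a spike and calling `rate_per_sec(before, after, "cmd_set", interval)` gives the average set rate the server itself counted over that interval, which can then be compared with the number of successful writes logged on the client side.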
Note this is on EC2 medium instances.
Then I tried an EC2 small instance and observed the same behaviour, with one difference: during the high IO load spikes, while the single CPU in the small instance was at 100% wait, a small percentage of the write operations failed on the client side with "unable to write to memcache". So I thought that, since the CPU was 100% busy waiting on IO, adding more CPUs would prevent this and the 6 stats would not drop to ZERO. I switched back to medium instances and, surprisingly, client-side write operations no longer fail, but the stats still drop to zero.
PS. I also noticed that the IO writes are 100% random: no write merges happen at all at the disk level, which is remarkably rare. Is this how SQLite writes are done? AFAIK SQLite has a commit log where writes are done sequentially, so what I see is very strange.
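For anyone who wants to reproduce the "no merges" observation, here is a rough sketch of how the merged-write fraction can be computed from two snapshots of `/proc/diskstats`. It assumes the nodes run Linux and that the standard field layout applies (after the device name: reads completed, reads merged, sectors read, ms reading, writes completed, writes merged, ...). The device name `sda` in the usage note is a placeholder.

```python
def parse_diskstats(text):
    """Extract per-device cumulative write counters from /proc/diskstats text."""
    devices = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 11:
            # f[2] = device name, f[7] = writes completed, f[8] = writes merged
            devices[f[2]] = {"writes": int(f[7]), "writes_merged": int(f[8])}
    return devices


def merged_write_fraction(before, after, device):
    """Share of write requests merged between two samples.

    Computed as merged / (merged + completed), the same ratio iostat
    reports as %wrqm. Returns 0.0 when no writes happened at all.
    """
    completed = after[device]["writes"] - before[device]["writes"]
    merged = after[device]["writes_merged"] - before[device]["writes_merged"]
    total = completed + merged
    return merged / total if total else 0.0
```

Reading the file twice a few seconds apart (for example with `open("/proc/diskstats").read()`) and calling `merged_write_fraction(first, second, "sda")` returns 0.0 when writes were issued but none were merged, which matches what I am seeing.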