Could writes cause read timeouts?
12-node 1.7.1 cluster, 50 GB allocated per node, for a total of 600 GB.
Twice a day we load 160 million keys into a membase bucket (replication enabled) at ~50k writes/second, while reads run at roughly 2-15k/second. During the load, our spymemcached clients start timing out requests, but only against one server. What doesn't make sense is that reads from a different bucket also start timing out.

It isn't a network issue: traffic into each node is only ~20 Mbit/s on a gigabit network. On the node with the timeouts, memcached is only using about 200% CPU (the other 11 nodes sit around 70%). The servers have dual hex-core processors with hyperthreading, so we are nowhere near maxing that box out. Our dataset fits completely in memory, although for some odd reason the resident ratio shows only 97% even though the bucket is only 40% full.
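For scale, the numbers above work out to roughly the following. This is just a back-of-envelope sketch using the figures from the post; the doubling for replica traffic is an assumption that replication means one extra copy of each write.

```python
# Back-of-envelope math for the load described above.
# Inputs are taken from the post; the replica multiplier is an assumption.

KEYS_PER_LOAD = 160_000_000   # keys loaded twice a day
WRITE_RATE = 50_000           # writes/second across the cluster
NODES = 12

load_minutes = KEYS_PER_LOAD / WRITE_RATE / 60  # duration of one load
writes_per_node = WRITE_RATE / NODES            # active writes only
with_replica = writes_per_node * 2              # assuming one replica copy

print(f"load duration: ~{load_minutes:.0f} min")
print(f"writes/node:   ~{writes_per_node:.0f}/s active, "
      f"~{with_replica:.0f}/s including replica")
```

So each load runs for just under an hour, and even counting replica traffic each node only sees on the order of 8k writes/second, which is why we don't think raw throughput should be the bottleneck.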
Once writing of the active items finishes across the cluster (replica writes are still going), the issue stops.
Is this just an issue with vbuckets being migrated? If so, why does it affect other buckets when we are only writing into one? Can we adjust the number of read threads to improve performance? We could throttle the writes to see whether that alleviates the problem, but we shouldn't be anywhere near stressing the cluster.