Couchbase 4.1 gets stuck rebalancing under load

We’re testing Couchbase (CB) 4.1 Community Edition. The cluster had 6 nodes and we wanted to test adding 3 more. The rebalance got a varying percentage of the way through, between 72% and 96%, and then progress stalled and didn’t move for several minutes.

I stopped the rebalance, waited a few minutes and then ran the rebalance again.

Any ideas what would cause the pause, or which logs we could look at to see why streaming got stuck?
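To tell a real stall from slow progress, something like the following could be run against repeated progress samples. This is only a minimal sketch: it assumes the JSON comes from the REST endpoint `/pools/default/rebalanceProgress` on the admin port, and that the payload has a top-level `status` plus one `{"progress": ...}` entry per node. That shape should be checked against the 4.1 REST documentation. The function names and the window size are made up for illustration.

```python
import json


def overall_progress(payload):
    """Return mean per-node progress (0.0 to 1.0), or None if no rebalance is running.

    Assumed payload shape (verify against your server version):
    {"status": "running", "ns_1@host": {"progress": 0.72}, ...}
    """
    doc = json.loads(payload)
    if doc.get("status") != "running":
        return None
    # Every dict-valued entry with a "progress" key is treated as one node.
    per_node = [
        value["progress"]
        for value in doc.values()
        if isinstance(value, dict) and "progress" in value
    ]
    if not per_node:
        return 0.0
    return sum(per_node) / len(per_node)


def is_stalled(samples, window):
    """True if the last `window` progress samples are all identical."""
    if len(samples) < window:
        return False
    recent = samples[-window:]
    return max(recent) == min(recent)
```

Sampling once every 10 seconds with `window=30`, for example, would flag a pause of about five minutes. The timestamp of the first flat sample would then give a starting point for searching the server logs.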

Our other question is about what happens when we stop the rebalance: all nodes keep processing reads and writes, but we get a warning that some data is not replicated. When CB streams data, I’m guessing it waits until all the data for a vBucket is transferred before it assigns the new node as the owner of that data. Is that true?

I’m wondering what state the cluster is in after I stopped the rebalance. Is data missing from some nodes, so that reads would come back with no data even though the data might exist? Or is the cluster just unbalanced, in that different nodes own different numbers of vBuckets?
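One way we could inspect this ourselves is to count active vBuckets per node from the bucket’s vBucket map. The sketch below assumes the map has the shape I understand the REST bucket details (`/pools/default/buckets/<bucket>`) to return under `vBucketServerMap`: a `serverList` of nodes and a `vBucketMap` with one chain per vBucket, where the first index is the active copy, the rest are replicas, and `-1` means no node is assigned. Those details are assumptions to verify, and `summarize_vbucket_map` is a hypothetical helper name.

```python
from collections import Counter


def summarize_vbucket_map(server_map):
    """Count active vBuckets per node and unassigned replica slots.

    Assumed input shape (verify against your server version):
    {"serverList": ["host:11210", ...], "vBucketMap": [[active, replica, ...], ...]}
    """
    servers = server_map["serverList"]
    # Start every node at zero so nodes that own nothing still show up.
    active = Counter({server: 0 for server in servers})
    missing_replicas = 0
    for chain in server_map["vBucketMap"]:
        if chain[0] >= 0:
            active[servers[chain[0]]] += 1
        # Assumption: an index of -1 in a replica slot means "not assigned".
        missing_replicas += sum(1 for index in chain[1:] if index < 0)
    return dict(active), missing_replicas


if __name__ == "__main__":
    # Made-up example map, not real cluster output.
    example = {
        "serverList": ["a:11210", "b:11210", "c:11210"],
        "vBucketMap": [[0, 1], [0, -1], [1, 0], [1, -1]],
    }
    print(summarize_vbucket_map(example))
```

If the per-node counts are uneven but every vBucket has an active owner, that would point to an unbalanced cluster rather than missing data. A non-zero missing-replica count would be consistent with the “not replicated” warning.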

Other details: we had the cluster under reasonable load (50k ops/sec, CPU ~50%, disk utilization ~90%) to simulate a production workload. There wasn’t much data in the cluster, only a few GB. This is also running on AWS.