Rebalance is a central feature of Couchbase Server cluster management. It enables redistribution of services when nodes are added to or removed from a cluster. This feature can adjust data and indexes, plus query and event processing, among the nodes of a cluster to balance resource usage, including memory and CPU, evenly across nodes. Rebalance is an online operation (it runs concurrently with normal cluster workloads), so it is critical that it perform its job in a timely fashion without disrupting production applications.

Couchbase Server 7.1 brings numerous performance improvements and reduced resource usage to Rebalance for the Global Secondary Index Service via a new feature, Smart Batching Index Builds During Rebalance, or “Smart Batching.” Read on to learn more about these Rebalance improvements for the Index Service.

Index Rebalance

Rebalance for the Index Service redistributes whole indexes, replicas, and individual index partitions across all available Index nodes to balance memory and CPU demands evenly across nodes. (This article uses the shorthand “indexes” to refer collectively to indexes, replicas, and partitions.) Index Service Rebalance has two main phases:

  1. Planning
  2. Movement

In the Planning phase, the index Planner uses a simulated annealing algorithm to optimize the desired layout of indexes across nodes based on actual and estimated memory and CPU usage, while taking into account the cost of moving indexes to new homes on other nodes.
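As a hedged illustration of the general technique (not the Planner's actual code; the cost model, names, and weights here are invented for the sketch), simulated annealing explores candidate layouts, occasionally accepting a worse layout so the search can escape local minima:

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// Layout maps each index to the node that hosts it. All names below are
// illustrative; the real Planner's model is far more detailed.
type Layout map[string]string

var nodes = []string{"n1", "n2", "n3"}

// cost scores a candidate layout: imbalance of load across nodes (index
// counts here, standing in for memory/CPU estimates) plus a penalty for
// every index that must move away from its current node.
func cost(cand, current Layout) float64 {
	perNode := map[string]float64{}
	moves := 0.0
	for idx, node := range cand {
		perNode[node]++
		if current[idx] != node {
			moves++
		}
	}
	mean := float64(len(cand)) / float64(len(nodes))
	imbalance := 0.0
	for _, n := range nodes {
		d := perNode[n] - mean
		imbalance += d * d
	}
	return imbalance + 0.1*moves // balance weighted above movement cost
}

// neighbor returns a copy of l with one random index reassigned to a random node.
func neighbor(l Layout) Layout {
	out := Layout{}
	keys := make([]string, 0, len(l))
	for k, v := range l {
		out[k] = v
		keys = append(keys, k)
	}
	out[keys[rand.Intn(len(keys))]] = nodes[rand.Intn(len(nodes))]
	return out
}

// anneal usually accepts improvements, but sometimes accepts a worse layout
// (with probability exp(-delta/temp)) to escape local minima; the temperature
// decays over time so the search gradually settles.
func anneal(current Layout, iters int) Layout {
	cur, best := current, current
	temp := 1.0
	for i := 0; i < iters; i++ {
		cand := neighbor(cur)
		delta := cost(cand, current) - cost(cur, current)
		if delta < 0 || rand.Float64() < math.Exp(-delta/temp) {
			cur = cand
			if cost(cur, current) < cost(best, current) {
				best = cur
			}
		}
		temp *= 0.999 // cool down
	}
	return best
}

func main() {
	current := Layout{"idx1": "n1", "idx2": "n1", "idx3": "n1", "idx4": "n2"}
	fmt.Println(anneal(current, 5000)) // a more balanced layout
}
```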

The Movement phase commences after the Planner has produced a desired plan for the new index layout and is where the heavy lifting of moving indexes is performed. Note that indexes are not really “moved” in the sense of copying them from one node to another; rather their metadata is moved, and the indexes are then rebuilt at their new locations from streams of indexed data fields from the Data Service. Depending on the number and size of indexes that need to be moved, this phase can span a good deal of wall-clock time and has the potential to put heavy resource demands on the cluster.

The 7.1 Smart Batching feature reduces both the time taken and the amount of resources consumed compared to prior Couchbase Server versions. It also enables greater concurrency than was possible in prior releases if a use case prioritizes reducing running time above minimizing Rebalance resource usage.

Smart Batching

During Index Service Rebalance, the work of moving indexes from their current nodes to their newly planned destinations is done in batches of limited size to avoid too much impact on concurrently running production workloads. The default batch size is three, which is unchanged from prior releases, but see the “Note on Batch Size” section at the end of this article for how an administrator can increase this.

The Smart Batching feature brings numerous improvements to Rebalance for the Index Service:

  1. Increases Rebalance performance by
    • Increasing pipeline parallelism
    • Optionally increasing overall concurrency
  2. Reduces resource consumption via
    • Optimizing data stream sharing
  3. Reduces the impact on production workloads by
    • Repairing replicas before moving indexes
    • Reducing concurrency on more heavily loaded Index nodes

Let’s look briefly at each of these in turn.

Increasing pipeline parallelism 

Increasing pipeline parallelism is accomplished in several ways. First, Smart Batching overlaps batches of index moves. The earlier algorithm had to wait for an entire batch of moves to finish before starting the next batch, creating a “long pole” effect where the longest-running move in each batch delayed starting the next batch; so the overall level of concurrency oscillated up and down in pulses instead of remaining relatively stable. Smart Batching instead starts a new batch of index moves when only a subset of the prior batch has completed, keeping the overall level of concurrency closer to the desired maximum.

Second, Smart Batching overlaps index and partition drops with builds of other indexes. Index “movement” really consists of moving the metadata, building the index on the new node, and then dropping it from the original node. The old algorithm had to wait for all index builds in a batch to complete before starting any of the drops. Smart Batching overlaps the processes of dropping previously built indexes with building new ones.
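A minimal sketch of both ideas, assuming a fixed concurrency budget (Move, buildOnDest, and dropSource are illustrative stand-ins, not the indexer's actual code): a semaphore admits the next move as soon as any slot frees, and each index's drop is issued as soon as its own build completes, overlapping with builds still in flight.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Move is an illustrative unit of work: rebuild one index on its new node,
// then drop it from its old node.
type Move struct{ Index, Dest string }

func buildOnDest(m Move) { // heavy: stream data fields and rebuild the index
	time.Sleep(20 * time.Millisecond)
	fmt.Printf("built %s on %s\n", m.Index, m.Dest)
}

func dropSource(m Move) { // cheap: remove the index from its original node
	time.Sleep(5 * time.Millisecond)
	fmt.Printf("dropped %s from its old node\n", m.Index)
}

func main() {
	moves := []Move{{"idx1", "n2"}, {"idx2", "n3"}, {"idx3", "n1"}, {"idx4", "n2"}}

	// The semaphore admits a new move as soon as ANY running move finishes,
	// rather than waiting for a whole batch, so concurrency stays near the cap
	// instead of oscillating in pulses.
	sem := make(chan struct{}, 3) // e.g. the default batch size of three
	var wg sync.WaitGroup
	for _, m := range moves {
		sem <- struct{}{} // blocks only until some slot frees
		wg.Add(1)
		go func(m Move) {
			defer wg.Done()
			defer func() { <-sem }()
			buildOnDest(m)
			dropSource(m) // overlaps with builds of other indexes still running
		}(m)
	}
	wg.Wait()
}
```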

Finally, Smart Batching moves all deferred indexes concurrently instead of doing so in small batches. Deferred indexes are those that were created as deferred (WITH {"defer_build": true}) but for which no build command has yet been issued. Only the metadata for these indexes exists, which is very small (roughly the size of the corresponding CREATE INDEX N1QL statement), so they do not need to be built on their destination nodes and thus consume almost no cluster resources to move. Prior releases moved these in small batches, each of which had to complete before the next was started, consuming more wall-clock time.
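For reference, a deferred index comes from the two-step N1QL flow below (the index and keyspace names are hypothetical); between the two statements, only the index metadata exists:

```sql
-- Step 1: create the index definition only; no build is started.
CREATE INDEX idx_city ON hotels(city) WITH {"defer_build": true};

-- Step 2 (issued later, perhaps after several deferred creations):
-- build the deferred index. Until this runs, moving it is nearly free.
BUILD INDEX ON hotels(idx_city);
```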

Increasing overall concurrency

Smart Batching optionally increases overall concurrency by supporting larger maximum batch sizes, allowing more index moves to run at once. The default batch size of three is unchanged from prior releases; however, it is now possible to set the batch size to values greater than 10, which was not supported before. As in prior releases, the Index Service still creates at most 10 data streams at a time for index builds, to avoid overloading either Index or Data nodes, but with stream sharing (described in the next section), more than 10 indexes can now build concurrently over those streams.

Data stream sharing

Optimizing data stream sharing means constructing batches such that multiple index builds will share the same data stream as often as possible. Smart Batching prefers scheduling indexes that can share a stream in the same batch, which reduces the amount of cluster resources, especially CPU, consumed on both Index and Data nodes during the build process. Previously, the subset of indexes moved within a batch was essentially random.

Stream sharing is optimized by preferentially selecting indexes for a batch that will share a stream with indexes already selected for the batch; a code sketch follows the list below.

    • Partitions of the same index are the most preferred because they will not only share the same data stream but will also require exactly the same set of index keys, minimizing the volume of data fields that need to be streamed.
    • Indexes on the same collection are then preferred if there are not enough partitions of the same index to fill out a batch. These will also share a stream, but in general they will require different keys, so not all the data fields in the stream will be shared by all the indexes consuming it.
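A minimal sketch of this selection preference, under the assumption of invented types (IndexMove and pickBatch are not the indexer's actual code):

```go
package main

import "fmt"

// IndexMove identifies one index (or partition) scheduled to move.
type IndexMove struct {
	IndexName  string // base index name; partitions of one index share this
	Collection string // source collection the index is built on
}

// pickBatch fills a batch of size n, preferring candidates that can share a
// data stream with moves already in the batch: first partitions of the same
// index, then indexes on the same collection, then anything remaining.
func pickBatch(candidates []IndexMove, n int) []IndexMove {
	if len(candidates) == 0 || n <= 0 {
		return nil
	}
	batch := []IndexMove{candidates[0]}
	used := map[int]bool{0: true}
	score := func(c IndexMove) int {
		for _, b := range batch {
			if c.IndexName == b.IndexName {
				return 2 // same index: shared stream AND identical keys
			}
		}
		for _, b := range batch {
			if c.Collection == b.Collection {
				return 1 // same collection: shared stream, different keys
			}
		}
		return 0 // no sharing possible with this batch
	}
	for len(batch) < n && len(batch) < len(candidates) {
		bestIdx, bestScore := -1, -1
		for i, c := range candidates {
			if used[i] {
				continue
			}
			if s := score(c); s > bestScore {
				bestIdx, bestScore = i, s
			}
		}
		batch = append(batch, candidates[bestIdx])
		used[bestIdx] = true
	}
	return batch
}

func main() {
	moves := []IndexMove{
		{"idx_a", "coll1"}, {"idx_b", "coll2"}, {"idx_a", "coll1"}, {"idx_c", "coll1"},
	}
	// Picks the second idx_a partition first, then idx_c (same collection).
	fmt.Println(pickBatch(moves, 3))
}
```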

Repairing replicas

Repairing replicas before moving indexes means Smart Batching will rebuild index replicas that had been lost due to prior node failovers before scheduling any regular index moves. Since all replicas of a Global Secondary Index are available for both reads and writes, building the missing ones first will more quickly spread production workload across nodes that had become concentrated on fewer replicas due to some being lost. Before Smart Batching, there was no prioritization of replica repairs, so they would be mixed in with other index moves at random. Replica repairs are still batched like other moves, in the new overlapping way already described, because they still require index builds, which are the most resource-intensive aspect of Index Rebalance.
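As a rough sketch of the prioritization (task and IsRepair are invented names, not the indexer's actual structures), pending work could simply be stably sorted so repairs run first:

```go
package main

import (
	"fmt"
	"sort"
)

// task is an illustrative unit of Rebalance work.
type task struct {
	Name     string
	IsRepair bool // true if this rebuilds a replica lost to a failover
}

func main() {
	pending := []task{
		{"move idx_a", false}, {"repair idx_b (replica 1)", true}, {"move idx_c", false},
	}
	// Stable sort: all replica repairs are scheduled before regular index
	// moves, preserving the original order within each group.
	sort.SliceStable(pending, func(i, j int) bool {
		return pending[i].IsRepair && !pending[j].IsRepair
	})
	fmt.Println(pending)
}
```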

Reducing concurrency 

Reducing concurrency on more heavily loaded Index nodes is the final Rebalance improvement the Smart Batching feature brings to Couchbase Server 7.1. In creating a new batch of work, Smart Batching accounts for the size of indexes already present on each node in deciding whether to prefer building another index on a particular node in the current batch (done for lightly loaded nodes) or, instead, to select an index build for the next node in a round-robin node sequence (done for more heavily loaded nodes).

Building more indexes on the same node increases the chances of stream sharing but also consumes more CPU and memory on that node while the builds are in progress. A node is considered lightly loaded if the average percentage of its index data resident in memory is at least 50% (Standard Global Secondary indexes, aka Plasma) or if the total Index Service memory already used on the node is at most half of its assigned memory quota (Memory-Optimized indexes). Prior to the Smart Batching feature, the Rebalance algorithm did not take node load into account.
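A hedged sketch of that check using the thresholds above (the function and field names are illustrative, not the indexer's actual API):

```go
package main

import "fmt"

// nodeStats is an illustrative snapshot of an Index node's load.
type nodeStats struct {
	StorageMode    string  // "plasma" or "memory_optimized" (illustrative values)
	AvgResidentPct float64 // average % of index data resident in memory (Plasma)
	MemUsed        uint64  // Index Service memory in use (bytes)
	MemQuota       uint64  // assigned Index Service memory quota (bytes)
}

// lightlyLoaded reports whether a node can take extra concurrent builds in
// the current batch, per the thresholds described above.
func lightlyLoaded(s nodeStats) bool {
	switch s.StorageMode {
	case "plasma":
		return s.AvgResidentPct >= 50 // mostly memory-resident => light load
	case "memory_optimized":
		return s.MemUsed <= s.MemQuota/2 // at most half the quota in use
	}
	return false
}

func main() {
	fmt.Println(lightlyLoaded(nodeStats{StorageMode: "plasma", AvgResidentPct: 72}))
}
```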

Smart Batching Support for Mixed-Level Clusters

All Smart Batching benefits also apply to mixed-level clusters (clusters whose nodes run different release levels) as long as at least one Index node is running Couchbase Server 7.1 or higher. The one exception is the pipeline-parallelism optimization that overlaps index and partition drops with builds of other indexes, which is not available in mixed-level clusters.

Note on Batch Size

The Index Rebalance default batch size remains at its historical value of three. While the Couchbase Server UI does not expose this configuration parameter, a user with Couchbase Administrator privileges can change it from the command line on any Index node through the Index Service's REST settings endpoint.
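As a hedged example (assuming the documented indexer.rebalance.transferBatchSize setting and the indexer's default REST port, 9102; substitute your own credentials, host, and desired size):

```sh
# Raise the Index Rebalance batch size to 10. Adjust the credentials,
# host name, and value for your own cluster before running.
curl -u Administrator:password -X POST \
  http://localhost:9102/settings \
  -d '{"indexer.rebalance.transferBatchSize": 10}'
```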

The change will automatically propagate to all Index nodes and will be remembered across node and cluster restarts.

Conclusion

Couchbase Server 7.1 brings many exciting new improvements to the core Rebalance feature that both increase the speed and reduce the impact of Index Service Rebalances. This article should help familiarize you with the new high-level changes as well as help you understand some of the finer details that go into making Index Rebalance truly smarter in Couchbase Server 7.1.

There are many other features and improvements delivered in this release – read this overview of What’s New in Couchbase Server 7.1 to get up to speed.

Author

Posted by Kevin Cherkauer, Senior Software Engineer, Couchbase
