{"id":15552,"date":"2024-04-02T13:09:34","date_gmt":"2024-04-02T20:09:34","guid":{"rendered":"https:\/\/www.couchbase.com\/blog\/?p=15552"},"modified":"2024-04-18T08:53:41","modified_gmt":"2024-04-18T15:53:41","slug":"file-transfer-index-rebalance","status":"publish","type":"post","link":"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/","title":{"rendered":"Rebalance Reimagined: Faster Scaling of Couchbase&#8217;s Index Service With File Transfers"},"content":{"rendered":"<p><span style=\"font-weight: 400\">Faster scaling of database resources is essential for maintaining efficient and performant databases, especially with the increased pressure of data ingestion, growing query demands, and the need to handle failovers seamlessly. As application-driven query traffic is primarily handled by index services, faster scaling of the index service is critical to highly performant applications.<\/span><\/p>\n<p><span style=\"font-weight: 400\">For the index service, the scaling operation (also referred to as <\/span><a href=\"https:\/\/docs.couchbase.com\/server\/current\/learn\/clusters-and-availability\/rebalance.html#rebalance-and-other-services\"><span style=\"font-weight: 400\">rebalance<\/span><\/a><span style=\"font-weight: 400\">) involves moving individual indexes\/replicas\/partitions among the available index service nodes in the cluster. The aim is to minimize load imbalances and optimize resource utilization metrics, like CPU and memory, across all nodes.<\/span><\/p>\n<p><span style=\"font-weight: 400\">This article explores the limitations of the index rebalance process and the improvements made to it in Couchbase Server 7.6.
It introduces a new rebalance flow based on efficient file transfers, offering significant benefits such as substantial reductions in rebalance time and optimized resource utilization, including lower CPU and memory consumption.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h2><span style=\"font-weight: 400\">Overview of index service rebalance<\/span><\/h2>\n<p><span style=\"font-weight: 400\">At a high level, index service rebalance operates in 3 phases:<\/span><\/p>\n<h3><span style=\"font-weight: 400\">Planning<\/span><\/h3>\n<ol>\n<li style=\"list-style-type: none\">\n<ol>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">In the <a href=\"https:\/\/www.couchbase.com\/blog\/index-planner-for-global-secondary-indexes\/\">planning phase<\/a>, the information of all indexes across all nodes in the cluster is gathered, along with their load statistics. Load of an index is derived from various factors like CPU utilization, memory utilization, scan rate, mutation processing rate, disk utilization, etc.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">An optimization algorithm is used to minimize load variance across the cluster. 
For example, if an index service node experiences significantly higher mutation and scan traffic compared to others, the algorithm redistributes indexes to balance the overall load and reduce variance.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The optimization algorithm decides the index movements from their existing nodes to new nodes in a simulated environment to arrive at a distribution that minimizes the load variance in the cluster.<\/span><\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<h3><span style=\"font-weight: 400\">Execution phase<\/span><\/h3>\n<ol>\n<li style=\"list-style-type: none\">\n<ol>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Based on the final plan decided in the planning phase, indexes are moved from their existing nodes to new targets.<\/span><\/li>\n<li style=\"font-weight: 400\">This phase is the most significant contributor to index rebalance time, directly impacted by the movement method and the number of indexes involved.<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<h3><span style=\"font-weight: 400\">Dropping indexes on source nodes<\/span><\/h3>\n<ol>\n<li style=\"list-style-type: none\">\n<ol>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Once the indexes on the new target nodes are fully rebuilt and ready to handle queries, all incoming scan requests for those indexes will be redirected to the new nodes.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The indexes on the existing source nodes will be removed once the corresponding indexes on the new nodes are ready to serve scan traffic.<\/span><\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<p><span style=\"font-weight: 400\">The <em>Execution phase<\/em> currently relies on rebuilding the indexes that are being moved by re-streaming and re-processing all the documents from the data service.
While this scheme works well functionally, it has these drawbacks:<\/span><\/p>\n<ul>\n<li style=\"list-style-type: none\">\n<ul>\n<li style=\"font-weight: 400\">Increased index rebalance times<\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Slower scans\u00a0<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Slower mutation processing<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">These drawbacks are mainly due to:<\/span><\/p>\n<p><strong>Resource overheads during rebalance<\/strong><\/p>\n<ul>\n<li style=\"list-style-type: none\">\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The data service has to restream all the documents relevant to the index for the purpose of rebuilding the indexes. Often, restreaming all the data requires backfilling from disk, which can lead to additional pressure on disk<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The projector process has to reprocess all the documents to extract data relevant to this index and this can take up additional CPU and memory. These overheads can also create resource contention for incremental traffic to existing indexes.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">For the indexer process as well, index builds add considerable resource overheads on CPU, memory, and disk I\/O due to the new incoming mutation traffic. 
In general, random I\/O read operations are required during certain stages of index building, which slows down the entire index build pipeline.<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><strong>Working set disruption\u00a0<\/strong><\/p>\n<ul>\n<li style=\"list-style-type: none\">\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The memory required for index rebuilds during rebalance reduces the memory available for existing indexes<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The working set of existing indexes may be evicted from memory<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">This causes scans and mutation processing to slow down if the relevant data is not available in memory and needs to be fetched from disk<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2><span style=\"font-weight: 400\">File transfer rebalance<\/span><\/h2>\n<p><span style=\"font-weight: 400\">File transfer rebalancing is one approach to reduce the overhead of rebuilding indexes. Instead of rebuilding indexes, the source node directly transfers indexed data files to the target node without interacting with the data service. Once data transfer is complete, the index service will catch up with mutations that occurred during the transfer by streaming them from the data service. Scans and mutations will continue to be processed for existing indexes irrespective of the status of transfer.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The data transfer between source and target nodes occurs via a custom protocol built on top of the HTTP request-response model. The source node contains a client capable of sequentially reading snapshotted indexed data from disk and publishing the data to the target node as multiple small, binary blobs. The data transfer is always encrypted. The target node hosts a server that receives these binary blobs, decrypts them, and reconstructs the indexed data during the transfer process.
All data received by the server on the target node is persistently stored onto disk.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Upon data transfer completion, the target node recovers the index from disk and rolls back to a valid recovery point available within the transferred data. It then requests the data service to stream mutations that occurred since the generation of that recovery point. Once all necessary mutations are processed, the index becomes <em>scan-ready<\/em> on the target node and is subsequently dropped from the source node.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Since most indexed data is directly transferred, bypassing the need for rebuilding, the overhead associated with index rebuilds is eliminated during data transfer. This has significantly improved rebalance speed while reducing associated CPU and memory consumption. As a result, the impact on the working set is also minimal, allowing scans and mutation traffic to be processed at similar rates as before the rebalance.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The only resource overheads incurred with file transfers are those related to data transfer itself and catching up on any mutations that may have occurred during the transfer process. 
The resource utilization during the data transfer and recovery process can be broken down as follows:<\/span><\/p>\n<ul>\n<li style=\"list-style-type: none\">\n<ul>\n<li><b>Disk read bandwidth (source node)<\/b> &#8211; <span style=\"font-weight: 400\">Efficient, due to sequential reads.<\/span><\/li>\n<li><b>CPU and memory (source node)<\/b> &#8211; <span style=\"font-weight: 400\">Minimal, due to publishing small binary blobs.<\/span><\/li>\n<li><b>CPU and memory (target node)<\/b> &#8211; <span style=\"font-weight: 400\">Minimal, due to processing small binary blobs.<\/span><\/li>\n<li><b>Disk write bandwidth (target node)<\/b> &#8211; <span style=\"font-weight: 400\">Efficient, due to primarily sequential writes.<\/span><\/li>\n<li><b>Disk read bandwidth (recovery, target node) <\/b>&#8211; <span style=\"font-weight: 400\">Minimal, as only a portion of data needs recovery.<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">To minimize disruptions to critical scan and mutation operations, both source and target nodes restrict disk bandwidth to a configurable value, defaulting to 200MB\/sec, during rebalance. 
This empirically chosen rate ensures balanced efficiency and minimal performance impact, with observed resource consumption of less than 2 CPU cores and tens of MB of memory per node.\u00a0<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Performance results with file transfer rebalance<\/span><\/h2>\n<h3>Benchmark setup<\/h3>\n<ul>\n<li style=\"list-style-type: none\">\n<ul>\n<li style=\"font-weight: 400\"><b>Cluster setup:<\/b>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Data service nodes: 4<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Index nodes: 3<\/span><\/li>\n<\/ul>\n<\/li>\n<li style=\"font-weight: 400\"><b>Data:<\/b>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Volume: 1 Billion documents<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Avg. doc size: 230 bytes<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Distribution: Shared across 2 collections in a single bucket<\/span><\/li>\n<\/ul>\n<\/li>\n<li style=\"font-weight: 400\"><b>Indexes:<\/b>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Type: Partitioned<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Replication: 1 replica per index<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Number of partitions: 3 per index<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Total instances: 18 (3 indexes * 3 partitions * 2 replicas)<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Avg. 
secondary index field size: 140 bytes<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Total disk usage: ~710GB (across all instances)<\/span><\/li>\n<\/ul>\n<\/li>\n<li style=\"font-weight: 400\"><b>Index Service Resources:<\/b>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Memory quota: 128GB per node<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">CPU cores: 80 per node<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>Comparison<\/h3>\n<p><span style=\"font-weight: 400\">This performance benchmark compares two rebalance methods for rebalance times, CPU, and memory utilization for the swap rebalance case:<\/span><\/p>\n<ol>\n<li style=\"list-style-type: none\">\n<ol>\n<li style=\"font-weight: 400\"><b>Index Rebuild Rebalance (DCP Rebalance) <\/b>&#8211; <span style=\"font-weight: 400\">Rebuilds indexes from scratch during rebalancing.<\/span><\/li>\n<li style=\"font-weight: 400\"><b>File Transfer Rebalance<\/b> &#8211; <span style=\"font-weight: 400\">Directly transfers existing index data files between nodes.<\/span><\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<h3><span style=\"font-weight: 400\">Rebalance-swap<\/span><\/h3>\n<ul>\n<li style=\"list-style-type: none\">\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">We begin with 2 index nodes in the cluster, each holding around 355GB of indexed data across 9 index instances on each node.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">We add a new index node and remove an existing one.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">All indexed data from the removed node is transferred to the newly added node.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Scan traffic continues to hit the initial nodes until the data transfer is complete.<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><a 
href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2024\/04\/image3.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-15553\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2024\/04\/image3-1024x633.png\" alt=\"\" width=\"900\" height=\"556\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image3-1024x633.png 1024w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image3-300x186.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image3-768x475.png 768w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image3.png 1200w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/a><\/p>\n<h4>Swap rebalance time comparison<\/h4>\n<p><span style=\"font-weight: 400\">File transfer rebalance completed in just <\/span><b>37 minutes<\/b><span style=\"font-weight: 400\">, compared to a staggering <\/span><b>272 minutes<\/b><span style=\"font-weight: 400\"> with the traditional DCP rebalance method. This represents a <\/span><b>7x improvement<\/b><span style=\"font-weight: 400\"> in speed! This efficiency is largely due to the direct transfer of data files, instead of rebuilding them entirely. 
The data transfer itself would theoretically take around <\/span><b>29.6 minutes<\/b><span style=\"font-weight: 400\"> (355GB at a sustained 200MB\/sec), which aligns well with the actual end-to-end rebalance time once the planning and catchup phases are included.<\/span><\/p>\n<p><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2024\/04\/image2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-15554\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2024\/04\/image2-1024x595.png\" alt=\"\" width=\"900\" height=\"523\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image2-1024x595.png 1024w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image2-300x174.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image2-768x446.png 768w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image2-1536x892.png 1536w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image2-1320x767.png 1320w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image2.png 1942w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/a><\/p>\n<h4>Rebalance CPU comparison<\/h4>\n<p><b>DCP rebalance consumes significantly more CPU resources compared to file transfer rebalance.<\/b><span style=\"font-weight: 400\"> This is because the indexer process needs to rebuild the indexes from all the mutations streamed from the data service, which is a computationally intensive process.
In contrast, file transfer rebalance consumes very little CPU, with occasional spikes during the <em>catchup phase<\/em>\u00a0after the data transfer.<\/span><\/p>\n<p><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2024\/04\/image1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-15555\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2024\/04\/image1-1024x581.png\" alt=\"\" width=\"900\" height=\"511\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image1-1024x581.png 1024w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image1-300x170.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image1-768x436.png 768w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image1-1536x872.png 1536w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image1-1320x749.png 1320w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image1.png 1999w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/a><\/p>\n<h4>Rebalance memory comparison<\/h4>\n<p><b>DCP rebalance also demands much more memory compared to file transfer rebalance.<\/b><span style=\"font-weight: 400\"> During the rebuild process, the indexer process needs to constantly allocate and manage memory for all the incoming mutations, leading to significant strain on available resources. However, file transfer rebalance operates differently.
Since it directly transfers data files to disk instead of rebuilding everything in memory, it only requires minimal memory for processing, significantly reducing overall memory demands.<\/span><\/p>\n<h4>Rebalance time disk I\/O utilization<\/h4>\n<p><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2024\/04\/image4.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-15556\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2024\/04\/image4-1024x586.png\" alt=\"\" width=\"900\" height=\"515\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image4-1024x586.png 1024w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image4-300x172.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image4-768x439.png 768w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image4-1536x879.png 1536w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image4-1320x755.png 1320w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image4.png 1986w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/a><\/p>\n<p><b>File transfer rebalance maintains a steady disk I\/O rate throughout the process, thanks to the controlled transfer speed.<\/b><span style=\"font-weight: 400\"> This throttling ensures balanced efficiency and minimizes performance impact. Occasional spikes might occur during the <em>catchup phase<\/em>\u00a0where mutations are processed, but overall, disk I\/O remains stable. 
In contrast, DCP rebalance suffers from near-constant disk I\/O saturation due to its rebuild-heavy approach, potentially leading to performance bottlenecks.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h4>Rebalance transfer bandwidth on source node<\/h4>\n<p><a href=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2024\/04\/image5.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-15557\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2024\/04\/image5-1024x593.png\" alt=\"\" width=\"900\" height=\"521\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image5-1024x593.png 1024w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image5-300x174.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image5-768x444.png 768w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image5-1536x889.png 1536w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image5-1320x764.png 1320w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image5.png 1984w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400\">While DCP leverages a <em>pull<\/em>\u00a0approach where target nodes retrieve data directly, file transfer relies on the <\/span><b>source node actively pushing data outwards<\/b><span style=\"font-weight: 400\">. This results in a <\/span><b>markedly higher <em>out_bytes_per_second<\/em>\u00a0metric<\/b><span style=\"font-weight: 400\"> (data pushed out) for the source node during file transfer rebalance. 
In DCP rebalance, this metric dips close to zero if no scans are actively running, as data pulls only occur on demand.<\/span><\/p>\n<h4>Other rebalance time comparisons<\/h4>\n<p><span style=\"font-weight: 400\">We previously discussed the impressive time savings achieved during rebalance swap operations (moving data directly between nodes being added and removed). We&#8217;re happy to report similar gains for two other scenarios: <\/span><b>rebalance-in<\/b><span style=\"font-weight: 400\"> (adding new index nodes and redistributing indexes to them) and <\/span><b>rebalance-out<\/b><span style=\"font-weight: 400\"> (removing index nodes and redistributing their indexes to remaining nodes). The table below summarizes the overall rebalance time improvements observed in a setup similar to the one described earlier.<\/span><\/p>\n<table>\n<tbody>\n<tr>\n<td><b>Rebalance type<\/b><\/td>\n<td><b>DCP rebalance time (min)<\/b><\/td>\n<td><b>File transfer rebalance time (min)<\/b><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400\">Rebalance-in (2 nodes \u2192 3 nodes)<\/span><\/td>\n<td><span style=\"font-weight: 400\">123.6 min<\/span><\/td>\n<td><span style=\"font-weight: 400\">12.3 min<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400\">Rebalance-out (3 nodes \u2192 2 nodes)<\/span><\/td>\n<td><span style=\"font-weight: 400\">144.7 min<\/span><\/td>\n<td><span style=\"font-weight: 400\">36.3 min<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2><span style=\"font-weight: 400\">Enabling file transfer rebalance<\/span><\/h2>\n<p><span style=\"font-weight: 400\">File transfer rebalance is enabled by default on Capella deployments. For self-hosted deployments, it has to be manually enabled by the end user (either from the UI or using command line requests). 
More details are available in the <a href=\"https:\/\/docs.couchbase.com\/server\/current\/learn\/clusters-and-availability\/rebalance.html#index-rebalance-methods\">rebalancing documentation<\/a>.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Summary<\/span><\/h2>\n<p><span style=\"font-weight: 400\">The traditional Couchbase Index Service rebalance method suffers from high resource usage and long rebalance times due to index rebuilding. The new file transfer rebalance tackles this problem by directly transferring data files between nodes, significantly reducing resource overhead (CPU, memory, disk I\/O) and rebalance times. The index rebalance times have improved by up to 7 times in some cases, like rebalance swap. This translates to faster scaling, improved application performance, and more efficient cluster resource utilization.<\/span><\/p>\n<ul>\n<li style=\"list-style-type: none\">\n<ul>\n<li>Read about more <a href=\"https:\/\/www.couchbase.com\/blog\/couchbase-server-7-6-top-developer-features\/\">Couchbase Server 7.6 new features<\/a>.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Faster scaling of database resources is essential for maintaining efficient and performant databases, especially with the increased pressure of data ingestion, growing query demands, and the need to handle failovers seamlessly. 
As application-driven query traffic is primarily handled by index [&hellip;]<\/p>\n","protected":false},"author":85151,"featured_media":15556,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":""},"categories":[1821,1816,9417,9381],"tags":[9945,2126,1696,9662],"ppma_author":[9948],"class_list":["post-15552","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-couchbase-architecture","category-couchbase-server","category-performance","category-indexing","tag-couchbase-7-6","tag-high-availability","tag-indexing","tag-rebalance"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.8 (Yoast SEO v25.8) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Index Rebalance Reimagined: Faster Scaling with File Transfers<\/title>\n<meta name=\"description\" content=\"Faster scaling of database resources is essential for maintaining efficient and performant databases. Learn how Couchbase&#039;s index service hits the mark.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Rebalance Reimagined: Faster Scaling of Couchbase&#039;s Index Service With File Transfers\" \/>\n<meta property=\"og:description\" content=\"Faster scaling of database resources is essential for maintaining efficient and performant databases. 
Learn how Couchbase&#039;s index service hits the mark.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/\" \/>\n<meta property=\"og:site_name\" content=\"The Couchbase Blog\" \/>\n<meta property=\"article:published_time\" content=\"2024-04-02T20:09:34+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-04-18T15:53:41+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2024\/04\/image4.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1986\" \/>\n\t<meta property=\"og:image:height\" content=\"1136\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Varun Velamuri, Principal Engineer\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Varun Velamuri, Principal Engineer\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/\"},\"author\":{\"name\":\"Varun Velamuri, Principal Engineer\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/b0f1e943ab5487ba18ca3d2ba134266d\"},\"headline\":\"Rebalance Reimagined: Faster Scaling of Couchbase&#8217;s Index Service With File Transfers\",\"datePublished\":\"2024-04-02T20:09:34+00:00\",\"dateModified\":\"2024-04-18T15:53:41+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/\"},\"wordCount\":1858,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image4.png\",\"keywords\":[\"Couchbase 7.6\",\"High Availability\",\"Indexing\",\"rebalance\"],\"articleSection\":[\"Couchbase Architecture\",\"Couchbase Server\",\"High Performance\",\"Indexing\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/\",\"url\":\"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/\",\"name\":\"Index Rebalance Reimagined: Faster Scaling with File 
Transfers\",\"isPartOf\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image4.png\",\"datePublished\":\"2024-04-02T20:09:34+00:00\",\"dateModified\":\"2024-04-18T15:53:41+00:00\",\"description\":\"Faster scaling of database resources is essential for maintaining efficient and performant databases. Learn how Couchbase's index service hits the mark.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/#primaryimage\",\"url\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image4.png\",\"contentUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image4.png\",\"width\":1986,\"height\":1136},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.couchbase.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Rebalance Reimagined: Faster Scaling of Couchbase&#8217;s Index Service With File Transfers\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#website\",\"url\":\"https:\/\/www.couchbase.com\/blog\/\",\"name\":\"The Couchbase Blog\",\"description\":\"Couchbase, the NoSQL 
Database\",\"publisher\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.couchbase.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#organization\",\"name\":\"The Couchbase Blog\",\"url\":\"https:\/\/www.couchbase.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png\",\"contentUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png\",\"width\":218,\"height\":34,\"caption\":\"The Couchbase Blog\"},\"image\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/b0f1e943ab5487ba18ca3d2ba134266d\",\"name\":\"Varun Velamuri, Principal Engineer\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/image\/33d7a58673f0e651f3dd64e3fe93b49b\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/fd340cfb3ff99be3535d24bae5066350444b6c53c1b95a451d984449ef6ab464?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/fd340cfb3ff99be3535d24bae5066350444b6c53c1b95a451d984449ef6ab464?s=96&d=mm&r=g\",\"caption\":\"Varun Velamuri, Principal Engineer\"},\"description\":\"Varun Velamuri is a Principal Engineer in the Global Secondary Indexing team at Couchbase. He has experience in working on technologies related to concurrent programming, parallel and distributed systems, distributed databases, performance optimisations etc. 
Prior to Couchbase, he has worked as a Lead Research Engineer in the Parallel Systems Laboratories at Siemens Research, Bangalore focussing on technologies related to concurrent and parallel programming, correctness tools for multi-threaded programming, distributed event processing etc.\",\"url\":\"https:\/\/www.couchbase.com\/blog\/author\/varun\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Index Rebalance Reimagined: Faster Scaling with File Transfers","description":"Faster scaling of database resources is essential for maintaining efficient and performant databases. Learn how Couchbase's index service hits the mark.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/","og_locale":"en_US","og_type":"article","og_title":"Rebalance Reimagined: Faster Scaling of Couchbase's Index Service With File Transfers","og_description":"Faster scaling of database resources is essential for maintaining efficient and performant databases. Learn how Couchbase's index service hits the mark.","og_url":"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/","og_site_name":"The Couchbase Blog","article_published_time":"2024-04-02T20:09:34+00:00","article_modified_time":"2024-04-18T15:53:41+00:00","og_image":[{"width":1986,"height":1136,"url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2024\/04\/image4.png","type":"image\/png"}],"author":"Varun Velamuri, Principal Engineer","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Varun Velamuri, Principal Engineer","Est. 
reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/#article","isPartOf":{"@id":"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/"},"author":{"name":"Varun Velamuri, Principal Engineer","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/b0f1e943ab5487ba18ca3d2ba134266d"},"headline":"Rebalance Reimagined: Faster Scaling of Couchbase&#8217;s Index Service With File Transfers","datePublished":"2024-04-02T20:09:34+00:00","dateModified":"2024-04-18T15:53:41+00:00","mainEntityOfPage":{"@id":"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/"},"wordCount":1858,"commentCount":0,"publisher":{"@id":"https:\/\/www.couchbase.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/#primaryimage"},"thumbnailUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image4.png","keywords":["Couchbase 7.6","High Availability","Indexing","rebalance"],"articleSection":["Couchbase Architecture","Couchbase Server","High Performance","Indexing"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/","url":"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/","name":"Index Rebalance Reimagined: Faster Scaling with File 
Transfers","isPartOf":{"@id":"https:\/\/www.couchbase.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/#primaryimage"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/#primaryimage"},"thumbnailUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image4.png","datePublished":"2024-04-02T20:09:34+00:00","dateModified":"2024-04-18T15:53:41+00:00","description":"Faster scaling of database resources is essential for maintaining efficient and performant databases. Learn how Couchbase's index service hits the mark.","breadcrumb":{"@id":"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/#primaryimage","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image4.png","contentUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2024\/04\/image4.png","width":1986,"height":1136},{"@type":"BreadcrumbList","@id":"https:\/\/www.couchbase.com\/blog\/file-transfer-index-rebalance\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.couchbase.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Rebalance Reimagined: Faster Scaling of Couchbase&#8217;s Index Service With File Transfers"}]},{"@type":"WebSite","@id":"https:\/\/www.couchbase.com\/blog\/#website","url":"https:\/\/www.couchbase.com\/blog\/","name":"The Couchbase Blog","description":"Couchbase, the NoSQL 
Database","publisher":{"@id":"https:\/\/www.couchbase.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.couchbase.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.couchbase.com\/blog\/#organization","name":"The Couchbase Blog","url":"https:\/\/www.couchbase.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png","contentUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png","width":218,"height":34,"caption":"The Couchbase Blog"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/b0f1e943ab5487ba18ca3d2ba134266d","name":"Varun Velamuri, Principal Engineer","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/image\/33d7a58673f0e651f3dd64e3fe93b49b","url":"https:\/\/secure.gravatar.com\/avatar\/fd340cfb3ff99be3535d24bae5066350444b6c53c1b95a451d984449ef6ab464?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/fd340cfb3ff99be3535d24bae5066350444b6c53c1b95a451d984449ef6ab464?s=96&d=mm&r=g","caption":"Varun Velamuri, Principal Engineer"},"description":"Varun Velamuri is a Principal Engineer in the Global Secondary Indexing team at Couchbase. He has experience in working on technologies related to concurrent programming, parallel and distributed systems, distributed databases, performance optimisations etc. 
Prior to Couchbase, he has worked as a Lead Research Engineer in the Parallel Systems Laboratories at Siemens Research, Bangalore focussing on technologies related to concurrent and parallel programming, correctness tools for multi-threaded programming, distributed event processing etc.","url":"https:\/\/www.couchbase.com\/blog\/author\/varun\/"}]}},"authors":[{"term_id":9948,"user_id":85151,"is_guest":0,"slug":"varun","display_name":"Varun Velamuri, Principal Engineer","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/fd340cfb3ff99be3535d24bae5066350444b6c53c1b95a451d984449ef6ab464?s=96&d=mm&r=g","author_category":"","last_name":"Velamuri, Principal Engineer","first_name":"Varun","job_title":"","user_url":"","description":"Varun Velamuri is a Principal Engineer in the Global Secondary Indexing team at Couchbase. He has experience in working on technologies related to concurrent programming, parallel and distributed systems, distributed databases, performance optimisations etc. Prior to Couchbase, he has worked as a Lead Research Engineer in the Parallel Systems Laboratories at Siemens Research, Bangalore focussing on technologies related to concurrent and parallel programming, correctness tools for multi-threaded programming, distributed event processing 
etc."}],"_links":{"self":[{"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/posts\/15552","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/users\/85151"}],"replies":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/comments?post=15552"}],"version-history":[{"count":0,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/posts\/15552\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/media\/15556"}],"wp:attachment":[{"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/media?parent=15552"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/categories?post=15552"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/tags?post=15552"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=15552"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}