Faster Indexing and Query: Part III

In Part I of the series, we covered the architecture behind global vs. local indexes and when to use a global secondary index (GSI) vs. a local index (MapReduce view) in Couchbase Server. In Part II, we talked about the new memory optimized global secondary indexes (MOI) and how MOI improves index maintenance performance with an in-memory structure designed purely for high mutation rates and high scan rates. In Part III, I’d like to tell you how standard global secondary indexes improved in 4.5. There are a number of improvements in this area, but the most important advancement is a new write mode called “circular writes”.

Memory Optimized vs. Standard Global Secondary Indexes

Memory optimized indexes were added in 4.5 as an additional storage option for GSIs. Standard global secondary indexes have been available since version 4.0. Administrators can configure GSI with either the standard storage option, which uses ForestDB underneath and suits indexes that cannot fit in memory, or the memory optimized option for faster in-memory indexing and queries. Even though memory optimized indexes, with in-memory index management, can provide the best index maintenance and scan performance, not everyone can afford to keep all indexes in memory. Standard GSI can spill to disk when memory runs out, so efficient disk IO is critical to efficient indexing and scans.

Write Modes in Standard Global Secondary Indexes

Previously, standard GSI only offered an append-only write mode. Append-only writes go to the end of the file with every mutation to the index, so the space held by older versions of the data is left behind as orphaned space until a compaction reclaims it. As a result, append-only writes require frequent compactions. With 4.5, standard GSI comes with an additional write mode called “circular writes”.

When you enable “circular writes”, write operations no longer simply append new pages to the end of the file as mutations arrive. Instead, they first try to reuse the orphaned space in the file. If the file does not have enough orphaned space to accommodate the write, the operation may still fall back to an append.
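The difference between the two write modes can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how ForestDB is implemented: the file is a list of fixed-size blocks, every update writes the new version before orphaning the old one, and the names `ToyIndexFile` and `run` are hypothetical.

```python
class ToyIndexFile:
    """Toy model of an index file as a list of fixed-size blocks."""

    def __init__(self, circular):
        self.circular = circular
        self.blocks = []    # blocks[i] holds a key, or None if the block is orphaned
        self.location = {}  # key -> position of its live block
        self.free = []      # positions of orphaned blocks

    def write(self, key):
        # Choose the destination: reuse orphaned space when allowed and available,
        # otherwise fall back to appending at the end of the file.
        if self.circular and self.free:
            slot = self.free.pop()
            self.blocks[slot] = key
        else:
            slot = len(self.blocks)
            self.blocks.append(key)
        # The previous version of this key, if any, becomes orphaned space.
        old = self.location.get(key)
        if old is not None:
            self.blocks[old] = None
            self.free.append(old)
        self.location[key] = slot

    def fragmentation(self):
        """Fraction of the file occupied by orphaned blocks."""
        if not self.blocks:
            return 0.0
        return self.blocks.count(None) / len(self.blocks)


def run(circular, keys=100, rounds=10):
    """Insert `keys` entries, then update every entry `rounds` times."""
    index_file = ToyIndexFile(circular)
    for _ in range(rounds + 1):  # first pass inserts, later passes update
        for key in range(keys):
            index_file.write(key)
    return index_file


append_only = run(circular=False)
circular = run(circular=True)
print("append-only:", len(append_only.blocks), "blocks,",
      round(append_only.fragmentation(), 2), "fragmented")
print("circular:   ", len(circular.blocks), "blocks,",
      round(circular.fragmentation(), 2), "fragmented")
```

In this model the append-only file keeps growing with every update, while the circular file stays close to the size of the live data, because each update can land in the block orphaned by an earlier one. The very first update still appends, since no orphaned space exists yet.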

With circular writes, full compaction still operates the same way. The compaction process reads the existing file and writes a new contiguous file that no longer contains the orphaned items. However, the number of compactions needed is drastically reduced. Instead of compacting every few hours, you may only need to compact once a week, which is a substantial saving in IO capacity (IOPS and MB/sec).
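To make the compaction cost concrete, here is a minimal sketch in the same toy terms: a file is a list of blocks where `None` marks orphaned space. The function names, the 100 live entries, and the 30% fragmentation threshold are illustrative assumptions, not Couchbase defaults or measurements.

```python
def compact(blocks):
    """Read the old file and write a new contiguous one with live blocks only."""
    new_blocks = [key for key in blocks if key is not None]
    new_location = {key: position for position, key in enumerate(new_blocks)}
    return new_blocks, new_location


def compactions_needed(live_keys, updates, threshold):
    """Count fragmentation-triggered compactions for an append-only toy file."""
    orphaned = 0
    compactions = 0
    for _ in range(updates):
        orphaned += 1  # each update appends a new block and orphans the old one
        if orphaned / (live_keys + orphaned) >= threshold:
            compactions += 1
            orphaned = 0  # the rewritten file holds only live blocks
    return compactions


old_file = ["a", None, "b", None, None, "c"]
new_file, location = compact(old_file)
print(new_file, location)

# Hypothetical workload: 100 live entries, 1000 updates, compaction at 30% fragmentation.
print(compactions_needed(live_keys=100, updates=1000, threshold=0.3))
```

Every compaction in this model rereads the whole file and rewrites all live blocks, so the IO cost scales with the number of compactions. Reusing orphaned space keeps fragmentation low between compactions, which is why a time-based schedule can replace a fragmentation-based trigger.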

Configuring Write Mode and Compaction Trigger for Standard GSI

Standard GSI comes with two write modes. The configuration for write mode and index fragmentation is under Settings > Auto-Compaction in the web console. (Note: The fragmentation setting for indexes only applies when the “Standard Global Secondary Index” storage option is selected for indexes. Write mode and compaction strategy do not apply to memory-optimized global secondary indexes.)

Circular writes with a time interval to trigger compaction: For new clusters created with version 4.5, this option is selected by default. With circular writes, frequent compactions are not necessary. You must specify the days of the week and the start time when compaction is allowed to run. Optionally, you can set an end time after which compaction is aborted. The end time is only in effect if the abort compaction option is checked.
Append-only writes with an index fragmentation level to trigger compaction: When you upgrade a cluster (with the indexing service enabled) from version 4.0 or 4.1, this option is selected by default. The option is kept mainly for backward compatibility.
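The time-interval trigger for circular writes can be sketched as a small scheduling check. This is a model of the behavior described above, not Couchbase code: the `CompactionWindow` class and its field names are hypothetical, and it assumes the window does not cross midnight.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional, Set

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


@dataclass
class CompactionWindow:
    days: Set[str]               # days of the week when compaction may run
    start: time                  # required start time
    end: Optional[time] = None   # optional end time
    abort_outside: bool = False  # the end time only matters when this is set

    def _end_in_effect(self) -> bool:
        return self.abort_outside and self.end is not None

    def may_start(self, now: datetime) -> bool:
        """True if compaction is allowed to begin at `now`."""
        if DAY_NAMES[now.weekday()] not in self.days:
            return False
        if now.time() < self.start:
            return False
        if self._end_in_effect() and now.time() >= self.end:
            return False
        return True

    def should_abort(self, now: datetime) -> bool:
        """True if a running compaction must stop at `now`."""
        return self._end_in_effect() and now.time() >= self.end


# Example: compact on Sundays from 01:00, abort anything still running at 05:00.
window = CompactionWindow(days={"Sunday"}, start=time(1, 0),
                          end=time(5, 0), abort_outside=True)
print(window.may_start(datetime(2016, 6, 5, 1, 30)))     # a Sunday, inside the window
print(window.should_abort(datetime(2016, 6, 5, 5, 30)))  # past the end time
```

With `abort_outside` left unset, the end time is ignored and a compaction that has started simply runs to completion.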

You can change between the write modes at any time.

The alerts and stats operate the same way for standard and memory optimized indexes. You can refer to Part II of the series for more information on stats and alerts.

-cihan

Cihan Biyikoglu, Director of Product Management, Couchbase

Cihan Biyikoglu is a director of product management at Couchbase, responsible for the Couchbase Server product. Cihan is a big data enthusiast who brings over twenty years of experience to Redis Labs’ product team. Cihan started his career as a C/C++ developer.
