The Couchbase database platform supports two storage engines: Couchstore, the default, and Magma, the more recently released engine. Each suits different workloads. This blog post gives a brief overview of the new Magma storage engine, compares the two engines, and summarizes the results of the performance benchmarks.
Couchstore is a mature storage engine that is optimized for high performance with large datasets, particularly ones that can fit in memory. The minimum bucket memory quota for Couchstore is 100MB. It’s ideal for caching use cases and situations where data compression is not a primary deciding factor.
Magma is a new storage engine that is designed to be highly performant even with very large datasets that do not fit in memory. It is well suited to use cases where disk access is paramount, and it is optimized to run on very low amounts of memory relative to the data stored. Magma really shines when the dataset will not fit into available memory and maximum data compression is required.
Below is a comparison table that summarizes each storage mechanism.
Couchstore and Magma Comparison
|Feature||Couchstore||Magma|
|Minimum bucket memory quota||100MB||1GB|
|Minimum memory to data ratio||10%||1%|
|Maximum data per node||3TB||10TB|
|Data size optimization||Best when the working dataset can fit into memory||Best when the working set is much larger than the available memory and disk access speed is the priority|
|Store and access||Access data up to ~1TB||Store and access several terabytes of data|
|Hardware||Can run on low-end hardware||Quality hardware is preferred|
|Supported services||All services including full-text search, eventing, and analytics are available||All services including full-text search, eventing, and analytics are available with the 7.1.2 GA release|
|Data persistence||Most data is accessed from the memory cache||Applications need large amounts of persistent, durable data|
|Use cases||Use case primarily requires memory access||Use case primarily requires disk access|
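To make the ratios in the table concrete, here is a rough sizing sketch in Python. It only restates the table's minimums as arithmetic. The function name `min_memory_gb`, the use of binary units (1TB = 1024GB), and the simplification of treating a node as hosting a single bucket are assumptions made for illustration, not part of any Couchbase tool.

```python
# Minimums copied from the comparison table above.
ENGINE_LIMITS = {
    "couchstore": {"min_quota_mb": 100, "min_mem_ratio": 0.10, "max_data_tb": 3},
    "magma": {"min_quota_mb": 1024, "min_mem_ratio": 0.01, "max_data_tb": 10},
}


def min_memory_gb(engine: str, data_per_node_tb: float) -> float:
    """Smallest memory quota (in GB) implied by the table for one node.

    Hypothetical helper: assumes 1TB = 1024GB and one bucket per node.
    """
    limits = ENGINE_LIMITS[engine]
    if data_per_node_tb > limits["max_data_tb"]:
        raise ValueError(
            f"{engine} is listed at up to {limits['max_data_tb']}TB per node"
        )
    # Memory needed to satisfy the minimum memory-to-data ratio.
    ratio_gb = data_per_node_tb * 1024 * limits["min_mem_ratio"]
    # The bucket quota can never go below the engine's fixed minimum.
    floor_gb = limits["min_quota_mb"] / 1024
    return max(ratio_gb, floor_gb)


if __name__ == "__main__":
    for engine, data_tb in [("couchstore", 3), ("magma", 3), ("magma", 10)]:
        print(engine, data_tb, "TB ->", round(min_memory_gb(engine, data_tb), 1), "GB")
```

Under these assumptions, the same 3TB of data per node needs roughly ten times less memory with Magma than with Couchstore, which is the practical meaning of the 10% versus 1% row.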
Magma is the next-generation document storage engine of Couchbase Server. It was designed with the goals of improving both data density and write performance on each cluster node. It achieves these goals by separating index and document data to minimize write amplification (WA). Write amplification is the ratio of the data physically written to storage to the data the application asked to write; it rises above 1 when, for example, immutable files must be rewritten in order to apply updates. Magma also includes an incremental compaction method to reclaim space, which allows for high data density and lower memory requirements. Lowering write amplification increases write throughput and also extends the life expectancy of SSDs by reducing the number of write-erase cycles.
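As a quick illustration of the write amplification ratio, the sketch below divides the bytes that reach storage by the bytes the application wrote. The function name and the sample figures are hypothetical and are not measurements from Couchbase.

```python
GB = 1024 ** 3


def write_amplification(bytes_written_to_storage: int, bytes_written_by_app: int) -> float:
    """Ratio of physical writes to logical writes (1.0 means no amplification)."""
    if bytes_written_by_app <= 0:
        raise ValueError("bytes_written_by_app must be positive")
    return bytes_written_to_storage / bytes_written_by_app


if __name__ == "__main__":
    # Made-up example: the application writes 1GB of documents, and the
    # engine ends up writing 5GB to disk after rewrites and compaction.
    print(write_amplification(5 * GB, 1 * GB))
```

A lower ratio means fewer bytes hit the SSD for the same application workload, which is why reducing it helps both throughput and drive lifetime.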
Other design goals for the Magma project included:
- Scalable concurrent compactions – Full database compaction is an expensive operation. Managing a high-density database requires small, concurrent, incremental compactions to reclaim space.
- Solid state drive (SSD) optimization – Random I/O is minimized so that it only occurs during point lookup operations, while sequential read and write I/O access patterns are leveraged to take advantage of the full bandwidth of the SSDs.
- Low memory footprint – High-density data leaves less room for read and write caching, so Magma is optimized to use a small memory footprint.
- Garbage collection – Magma estimates fragmentation in the log-structured object store to calculate the garbage size per log segment, and it triggers compaction when fragmentation reaches a threshold of 50%.
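The garbage collection goal above can be sketched as a simple threshold check. This is a minimal illustration only: the `LogSegment` class, its fields, and the choice to apply the 50% threshold to each segment individually are assumptions made here, and Magma's actual estimation logic is not described in this post.

```python
from dataclasses import dataclass
from typing import List

FRAGMENTATION_THRESHOLD = 0.5  # the 50% threshold mentioned in the text


@dataclass
class LogSegment:
    """Hypothetical bookkeeping record for one log segment file."""
    segment_id: int
    total_bytes: int
    stale_bytes: int  # estimated garbage: bytes held by superseded versions

    @property
    def fragmentation(self) -> float:
        # Fraction of the segment occupied by garbage; empty segments count as 0.
        return self.stale_bytes / self.total_bytes if self.total_bytes else 0.0


def segments_to_compact(
    segments: List[LogSegment], threshold: float = FRAGMENTATION_THRESHOLD
) -> List[int]:
    """Return the ids of segments whose fragmentation has reached the threshold."""
    return [s.segment_id for s in segments if s.fragmentation >= threshold]


if __name__ == "__main__":
    segments = [
        LogSegment(1, total_bytes=100, stale_bytes=10),
        LogSegment(2, total_bytes=100, stale_bytes=50),
        LogSegment(3, total_bytes=100, stale_bytes=80),
    ]
    print(segments_to_compact(segments))
```

Compacting only the segments that cross the threshold is what keeps each compaction small and incremental, in contrast to rewriting the whole database.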
One key aspect of the Magma architecture is the log-structured object store, which stores documents on an append-only segmented log. The log-structured store maintains an index that allows the querying of a document by seqno (sequence number). The object store contains log segment files that are arranged sequentially using a growing log with a tail for accepting incoming writes (see the Magma object store architecture diagram below). A background thread appends the document mutations to the tail of the log, with each document receiving a unique seqno.
Although the log can contain multiple immutable document versions with the same key, the older document versions become stale when new versions are appended. The read operations always read the latest version, so when a key lookup is performed the latest version of the document is returned. Eventually, the stale records are removed from storage and a separate garbage collection process is used to reclaim space.
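The append-only behavior described above can be modeled with a small toy in Python. This is an illustrative sketch and not Magma's implementation: the class `ToyLogStore`, its method names, and the in-memory dictionaries that stand in for Magma's on-disk indexes are all assumptions made for the example.

```python
from typing import Any, Dict, List, Optional, Tuple


class ToyLogStore:
    """In-memory toy of an append-only document log with a seqno index."""

    def __init__(self) -> None:
        self._log: List[Tuple[int, str, Any]] = []  # (seqno, key, value), append-only
        self._pos_by_seqno: Dict[int, int] = {}     # seqno -> position in the log
        self._latest_seqno: Dict[str, int] = {}     # key -> seqno of newest version
        self._next_seqno = 1

    def append(self, key: str, value: Any) -> int:
        """Append a new immutable version to the tail and return its unique seqno."""
        seqno = self._next_seqno
        self._next_seqno += 1
        self._pos_by_seqno[seqno] = len(self._log)
        self._log.append((seqno, key, value))
        self._latest_seqno[key] = seqno  # older versions of this key are now stale
        return seqno

    def get(self, key: str) -> Optional[Any]:
        """Key lookup: always returns the latest version, or None if absent."""
        seqno = self._latest_seqno.get(key)
        if seqno is None:
            return None
        return self._log[self._pos_by_seqno[seqno]][2]

    def get_by_seqno(self, seqno: int) -> Optional[Tuple[int, str, Any]]:
        """Lookup through the seqno index."""
        pos = self._pos_by_seqno.get(seqno)
        return None if pos is None else self._log[pos]

    def stale_seqnos(self) -> List[int]:
        """Seqnos of superseded versions that garbage collection could reclaim."""
        live = set(self._latest_seqno.values())
        return [seqno for seqno, _, _ in self._log if seqno not in live]


if __name__ == "__main__":
    store = ToyLogStore()
    store.append("doc1", {"v": 1})
    store.append("doc2", {"v": 1})
    store.append("doc1", {"v": 2})  # supersedes seqno 1
    print(store.get("doc1"), store.stale_seqnos())
```

Writes never modify earlier entries, so an update is just another append; the cost is that stale versions accumulate until a separate process reclaims them.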
Magma was performance tested against RocksDB and Couchstore. The evaluation focused on throughput, write amplification, and space amplification for various Yahoo! Cloud Serving Benchmark (YCSB) workloads with data that was too large to fit in memory. Across two rounds of testing, the conclusions were:
- Magma is 1.77x faster and has 3.38x less write amplification than RocksDB
- Magma is 36x faster and has 5x less write amplification than Couchstore
- Magma is 1.25x faster and has 2.36x less write amplification than RocksDB
- Magma is 21x faster and has 3.37x less write amplification than Couchstore
Through the efficiency improvements in Magma, the single-machine data density supported by Couchbase Server was increased by 3.3x and the memory requirement was reduced by 10x, which lowered the total cost of ownership (TCO) by up to 10x. The performance evaluation results showed that Magma outperformed both the Couchstore and RocksDB engines in write-heavy YCSB workloads with datasets that were too large for memory.
To learn more about the next-gen document storage engine, check out the following paper: Magma: A High Data Density Storage Engine Used in Couchbase. Thanks for taking the time to learn why Magma IS the next-generation document storage engine!