An RDBMS with a well-established schema serves well when application functions remain static. While this rigid structure can guarantee stability, it doesn’t lend itself to fast-changing business requirements. Organizations today must consider modernizing their application infrastructure and moving from RDBMS to NoSQL.

This article looks into some benefits of the NoSQL document model, and how data and indexes can be organized to optimize query performance and index configuration. I will also discuss the index strategy that you can consider when migrating your RDBMS data model to the Couchbase NoSQL database, and how the Couchbase 7.1 FLATTEN_KEYS feature can help improve query performance and reduce the number of indexes.

An effective index strategy is one of the most important factors in operating a database. It helps us get the right balance between query performance and resource management. The database should not only efficiently capture data, it should also provide the most optimal access to that data. NoSQL databases are no different from RDBMS when it comes to an effective index strategy.

An RDBMS use case

Let’s consider the hotel object in the Couchbase travel-sample dataset and assume that we use the RDBMS model to capture this information. The relational model would look like this:

Relational model relationships

The hotel object has all the detailed information about the hotel. There is also a hotel_like object that records the names of customers who have clicked on like on the hotel’s social media page. There is also a hotel_review object that records all hotel reviews, including comments and detailed ratings for the different services and amenities.

Business requirement

Create a query to allow users to get a summary view of all the hotels in a particular city that they plan to visit. The summary should include the review ratings of each hotel, and how many people have liked the hotel. To narrow down the list, the query should focus on the recent reviews (2015 for this data set) and the review ratings should be 4 or above (5 is the highest rating).

The relational model query

The index for a relational model query

hotel:  Since there is a filter on city and name, an index should exist. Note that the hotel id is appended to the index because it can help with the JOIN.

hotel_review: There are two filters on the hotel reviews: the review date and the review ratings. There is also a need to eliminate double counting, because there is no explicit relationship between hotel_like and hotel_review, so the review author is added here for that purpose. The hotel_id field is also added to the index because it is the foreign key and can help with the JOIN.

hotel_review: While there are no filters on hotel_like, there is a need to eliminate double counting of likes, because there is no relationship between review and like. The hotel_id field is added to the index because it is the foreign key and can help with the JOIN.

The execution plan for the relational model query

Execution plan for relation model JSON query

Notice the need to perform:

    1. Two JOINs
    2. Three index scans (one for each object)

The Document Model view

Document model view

 

For the document model, both the hotel_like and the hotel_review objects are stored as arrays in the hotel object.  There is no strict rule that you should always include child objects as an array in the parent object, but it does make sense to do so in this case because these objects are always accessed together.

 

The denormalized document model query

The index for the denormalized document model

 

A few points to note:

    1. The SQL++ query references a single hotel object, so there is no need to perform any explicit JOIN between the parent hotel object and the child like or review
    2. The query uses a single index that covers all the query predicates, i.e. the hotel.city, the reviews.ratings, and the reviews.date
    3. When an array is indexed, the index can only be on a single key. FLATTEN_KEYS() allows composite fields from the array, thus allowing the predicates to be on multiple array fields.

The execution plan for the document model query

execution plan for the document model query

Notice the need to perform

    1. Two UNNEST (no JOINs are needed as it is in single document)
    2. Only ONE index scan 

 

Summary

Relational model converted to document model

When a relational model is denormalized into a single object in the document model:

 

    1. The SQL++ query is simpler because there is no need to perform any JOINs
    2. Multiple relational model indexes could be combined into a single document model
    3. The Couchbase 7.1 FLATTEN_KEYS feature combines multiple array element predicates into a single index.

This is just one of the new features we have delivered recently – read more about What’s New in Couchbase Server 7.1.

Author

Posted by Binh Le

Binh Le is a Principal Product Manager for Couchbase Query service. Prior to Couchbase, he worked at Oracle and led the product management team for Sales Clould Analytics and CRM OnDemand. Binh holds a Bachelor's Degree in Computer Science from the University of Brighton, UK.

Leave a reply