In this article, we will walk through some practical examples, starting with N1QL queries that use GSI indexes, and simulate a dataset that grows over time not just in size but in document structure as well. Couchbase Server makes a flexible schema easy, but as new fields and data are added, we need to add more GSI indexes and potentially maintain them.

If you have not already, I’d recommend reading through our Flex Indexes docs after this article; this blog post covers some practical examples to get you familiar with using Flex Indexes.

Flex Indexes first appeared as an early preview in Couchbase 6.5 and have more support in the Couchbase 6.6 release. They are a great tool when query conditions are not predetermined and your search predicates involve a large number of fields, and those are the situations we will target. We use the travel-sample bucket so that you can get up and running and follow along easily.

Preparing Our Server & Data

To get started, I installed Couchbase Server 6.6 using Docker, but feel free to install it locally, on Couchbase Cloud, or on your favorite cloud platform.

Once our Couchbase Server is up and running, install the travel-sample sample bucket. You should now see 31,631 documents available.

[Figure: travel-sample data overview]

The default primary and GSI indexes are installed as well, so you can query this dataset right out of the box.

[Figure: travel-sample indexes]

For the queries that we will be running against the hotel data, we will only be using three of the indexes we see installed on the Indexes tab.

    • def_city
    • def_primary
    • def_type

If our only intention were to query hotel records, we could assume these three indexes would suffice for queries against the existing travel-sample data, so for our “hotel” type documents these are the indexes we use in our examples.

Basic GSI Indexes

We will start by looking at an N1QL query WITHOUT Flex Indexing.

When we run this N1QL query using the prebuilt GSI indexes that come with the travel-sample dataset, the query engine uses an intersect of the def_type and def_city GSI indexes.

In the Couchbase Server Web UI, execute the query in the Query Workbench editor and review the plan to see which indexes are being used.

These indexes were picked because of rule-based optimization: if there is an index that matches the predicate, it will be used. Since “California” is the state for every document with a city of “Fremont” or “Oakland”, an intersect of the def_type and def_city indexes gives us the best performance.
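To make the intersect idea concrete, here is a minimal in-memory sketch in Python. It treats def_type and def_city as simple inverted indexes (value to set of document keys), intersects the key sets the way an intersect scan combines two index scans, and only then fetches documents to apply the remaining state predicate. The tiny dataset, the document keys, and the exact predicate are assumptions for illustration; this is a simplified model, not how the Index or Query services are implemented.

```python
from collections import defaultdict

# Hypothetical sample documents keyed by document ID (not real travel-sample data).
DOCS = {
    "hotel_1": {"type": "hotel", "city": "Oakland", "state": "California"},
    "hotel_2": {"type": "hotel", "city": "Fremont", "state": "California"},
    "hotel_3": {"type": "hotel", "city": "Paris", "state": None},
    "landmark_1": {"type": "landmark", "city": "Oakland", "state": "California"},
}


def build_index(docs, field):
    """Build a simple inverted index: field value -> set of document keys."""
    index = defaultdict(set)
    for key, doc in docs.items():
        if field in doc:
            index[doc[field]].add(key)
    return index


def_type = build_index(DOCS, "type")
def_city = build_index(DOCS, "city")


def intersect_scan(type_value, cities):
    """Scan both indexes and keep only the keys that appear in both scans."""
    type_keys = def_type.get(type_value, set())
    city_keys = set().union(*(def_city.get(c, set()) for c in cities))
    return type_keys & city_keys


def run_query(type_value, cities, state):
    # Index phase: intersect the key sets from def_type and def_city.
    candidate_keys = intersect_scan(type_value, cities)
    # Fetch + filter phase: apply the predicate neither index covers (state).
    return sorted(k for k in candidate_keys if DOCS[k]["state"] == state)


print(run_query("hotel", ["Fremont", "Oakland"], "California"))  # ['hotel_1', 'hotel_2']
```

Note that landmark_1 matches the city scan but is dropped by the intersection, so it is never fetched.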

Add More Data & Fields

Now, let’s add some more data. We will be adding 30,000 documents with the following new fields:

  • stars: Number
  • facing: String
  • hotelType: String
  • majorAmmenities: Array of Strings

If you would like to follow along, clone the travel-sample-fake-hotels repo and, before running the server.js file, make sure the Couchbase username and password in the connection code are correct.

Once the script finishes, our overall document count will be 61,631: the original 31,631 plus 30,000 newly added documents carrying the new fields we would like to start using in our query predicates.
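The repo's server.js is not reproduced here. As a rough picture of the document shape involved, the hypothetical Python sketch below generates 30,000 hotel documents carrying the four new fields and checks the resulting total. The value pools (facings, hotel types, amenities), the key format, and the name field are made-up assumptions, not the repo's data.

```python
import random

# Hypothetical value pools; the real travel-sample-fake-hotels repo may use different values.
FACINGS = ["north", "south", "east", "west"]
HOTEL_TYPES = ["boutique", "resort", "motel", "inn"]
AMENITIES = ["Restaurant", "Pool", "Gym", "Spa", "Free Wi-Fi", "Parking"]

ORIGINAL_COUNT = 31_631  # documents in travel-sample after install
NEW_COUNT = 30_000       # fake hotel documents we add


def fake_hotel(rng, n):
    """Return (key, document) for one fake hotel carrying the four new fields."""
    doc = {
        "type": "hotel",
        "name": f"Fake Hotel {n}",
        "stars": rng.randint(1, 5),            # Number
        "facing": rng.choice(FACINGS),         # String
        "hotelType": rng.choice(HOTEL_TYPES),  # String
        # Array of Strings; field name spelled exactly as in the article.
        "majorAmmenities": rng.sample(AMENITIES, k=rng.randint(1, 3)),
    }
    return f"hotel_fake_{n}", doc


def generate(count, seed=42):
    rng = random.Random(seed)  # seeded so runs are reproducible
    return dict(fake_hotel(rng, n) for n in range(count))


fake_docs = generate(NEW_COUNT)
print(len(fake_docs), ORIGINAL_COUNT + len(fake_docs))  # 30000 61631
```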

If we query the new data with a predicate on our new fields, the query is not going to perform optimally, because none of our existing indexes include those fields. We would need to add a new index. Something like:

In a production environment, even adding one index can be tedious and may require a DBA, and every index we add must also be maintained over time.

Add Index for Hotel Type

If we wanted a better index for queries with predicates involving hotelType, we could add the following:

At this point, running a query like the following:

will use this index instead of the primary or def_type index, and with good reason. As we add predicates on the other new fields, like stars, facing, and hotelType, and possibly even predicates that check our majorAmmenities array, we will need to carefully add new indexes based on our query strategy. Couchbase handles this well, and there are great resources to help, like our blog post Create the Right Index, Get the Right Performance.
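Why is the new index the better choice? A scan on def_type returns every hotel key, so each of those documents must be fetched and filtered, while an index on hotelType returns only the matching keys. The Python sketch below counts fetches in both cases. The exact definition of def_hotelType is not shown in this text; the sketch assumes a single index key on hotelType, and the dataset is randomly generated and hypothetical.

```python
import random

# Hypothetical dataset: 30,000 fake hotels with a random hotelType.
rng = random.Random(7)
HOTEL_TYPES = ["boutique", "resort", "motel", "inn"]
DOCS = {
    f"hotel_fake_{n}": {"type": "hotel", "hotelType": rng.choice(HOTEL_TYPES)}
    for n in range(30_000)
}


def index_on(field):
    """Inverted index: value -> list of document keys (a stand-in for a GSI index)."""
    idx = {}
    for key, doc in DOCS.items():
        idx.setdefault(doc[field], []).append(key)
    return idx


def_type = index_on("type")
def_hotelType = index_on("hotelType")  # assumed shape: a single key on hotelType


def query(index, index_value, hotel_type):
    """Scan the index, then fetch every candidate and re-check the full predicate."""
    candidates = index.get(index_value, [])
    fetched = [DOCS[k] for k in candidates]  # each candidate costs one document fetch
    hits = [d for d in fetched if d["type"] == "hotel" and d["hotelType"] == hotel_type]
    return len(hits), len(fetched)


hits_a, fetched_a = query(def_type, "hotel", "boutique")
hits_b, fetched_b = query(def_hotelType, "boutique", "boutique")
print(hits_a == hits_b, fetched_a, fetched_b)
```

Both plans return the same hotels, but the def_type plan fetches all 30,000 documents to find them.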

Not creating the right index can cause many issues, the most obvious being the elapsed time to fetch your data.

The following query utilizes the new fields we have added:

In the situation above, if we had not created the def_hotelType index, we could see query times of more than 5 seconds instead of about 500 ms.

An index like this takes about ten seconds to build, and it delivers roughly a tenfold improvement in the time it takes to retrieve your data. Remember, with the wrong index there are 30,000 documents to sift through.
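A quick back-of-the-envelope check of those numbers in Python: the speedup is the ratio of the two query times, and the one-time build cost is paid back once the per-query savings add up to it. The inputs are the approximate figures quoted above; the sketch ignores the ongoing cost of keeping the index up to date as documents change.

```python
import math

# Approximate figures quoted in the text; real timings depend on hardware and data.
slow_query_s = 5.0    # query without the def_hotelType index ("more than 5 seconds")
fast_query_s = 0.5    # query with the index (~500 ms)
index_build_s = 10.0  # one-time cost to build the index (~10 seconds)

speedup = slow_query_s / fast_query_s
saving_per_query_s = slow_query_s - fast_query_s
# Number of queries after which the one-time build cost has been paid back.
break_even_queries = math.ceil(index_build_s / saving_per_query_s)

print(f"speedup: {speedup:.0f}x, break-even after {break_even_queries} queries")
# speedup: 10x, break-even after 3 queries
```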

Add More Indexes to Keep Up

Depending on how much data we have and many other characteristics of that data, we might add another index for predicates on the field “facing”, which indicates which side of the property the hotel entrance faces.

Better yet, we could use the ADVISE feature to recommend a new index; this is recommended over adding an individual index by hand. When we run ADVISE on our query, it recommends adding the following index:

The suggested adv_facing_hotelType_type index could give us better performance, but when you consider other permutations of a similar query with its operations in a different order, we get different results. These are all things to be aware of when writing queries and deciding which indexes to use.
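One reason a different query shape can change which index fits: a GSI composite index is generally only a candidate when the query constrains its leading key, and a partial index's WHERE condition must also be satisfied by the query. The Python sketch below is a deliberately simplified model of those two rules using equality predicates only. The key order it assumes for adv_facing_hotelType_type, (facing, hotelType) with a type = "hotel" condition, is a guess from the index name, not the actual ADVISE output.

```python
from dataclasses import dataclass, field


@dataclass
class CompositeIndex:
    name: str
    keys: tuple                                # index keys, in order
    where: dict = field(default_factory=dict)  # partial-index condition, e.g. {"type": "hotel"}

    def usable_for(self, predicate):
        """Simplified rule: the leading key must be constrained by the query, and every
        equality in the index's WHERE clause must also appear in the query."""
        if self.keys[0] not in predicate:
            return False
        return all(predicate.get(f) == v for f, v in self.where.items())


# Assumed key order, inferred from the index name only.
adv = CompositeIndex("adv_facing_hotelType_type", ("facing", "hotelType"), {"type": "hotel"})

print(adv.usable_for({"type": "hotel", "facing": "north", "hotelType": "inn"}))  # True
print(adv.usable_for({"type": "hotel", "hotelType": "inn"}))  # False: no leading key
print(adv.usable_for({"facing": "north", "hotelType": "inn"}))  # False: no type = "hotel"
```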

What if we could query our hotel data with an index that allows our queries to be more flexible? Such an index could also improve the developer experience, since it could serve things like ad hoc queries and potentially be performant enough to use instead of our GSI indexes. What if there were an option of a single index?

Let’s Flex on This Data

Couchbase Server 6.6 offers Flex Indexing, a feature that may meet these needs, and it is very easy to get started with.

We need to tell the N1QL query engine that we intend to use a Flex Index. The query must also have certain characteristics to use Flex Indexing; when those requirements are met, the N1QL query is transformed into an FTS query and run against the full-text index.
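As a rough illustration of that idea, and not the query engine's actual transformation, here is a Python sketch that turns a few simple N1QL-style equality predicates into an FTS conjunction query. The term and numeric-range clause shapes follow the FTS JSON query syntax; the helper names are hypothetical, and which predicates a real Flex Index can handle (and exactly how) is covered in the Flex Index documentation.

```python
import json


def predicate_to_fts(field_name, value):
    """Map one equality predicate (field = value) to an FTS query clause."""
    if isinstance(value, bool):
        raise TypeError("boolean predicates are out of scope for this sketch")
    if isinstance(value, (int, float)):
        # Numeric equality expressed as an inclusive numeric range [value, value].
        return {"min": value, "max": value,
                "inclusive_min": True, "inclusive_max": True, "field": field_name}
    # With the keyword analyzer a whole string value is a single term,
    # so a term query matches it exactly.
    return {"term": value, "field": field_name}


def n1ql_equalities_to_fts(predicates):
    """AND together equality predicates as an FTS conjunction query."""
    return {"conjuncts": [predicate_to_fts(f, v) for f, v in predicates.items()]}


query = n1ql_equalities_to_fts({"city": "Oakland", "stars": 4})
print(json.dumps(query, indent=2))
```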

Without repeating our documentation here, it is still important to mention that there are semantic differences and restrictions between regular N1QL queries and Flex Indexes. For this reason, we will stay clear of those restrictions and keep the semantic differences in mind.

Setting up Our FTS Flex Index

To have a Flex Index run our query instead of the indexes already set up for our travel-sample dataset, we just need to go to the Search tab in the Couchbase Server Web UI, click “Add Index”, and fill in the following fields:

  1. For the name field, I’ve used “fts-index”; a more descriptive name is recommended
  2. Select the bucket that our documents are contained within
  3. For the JSON type field, we will specify: type
  4. Click +Add Type Mapping and specify our document type: hotel
  5. Select “keyword” in the default analyzer dropdown
  6. Uncheck the default type mapping so that we don’t scan all document types in our bucket
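For reference, the steps above can be sketched as the JSON index definition they should produce, written here as a Python dict. The key names follow the Search service's index-definition format as we understand it, but we have not verified every key against 6.6 (sourceType in particular may differ by version), so compare it with the JSON the UI shows for your index. If you prefer scripting to the UI, a definition like this can be sent to the Search service's REST API with a PUT to /api/index/fts-index on port 8094.

```python
import json

# Sketch of the definition produced by the UI steps; verify against the UI's JSON view.
fts_index = {
    "name": "fts-index",            # step 1 (a more descriptive name is recommended)
    "type": "fulltext-index",
    "sourceType": "couchbase",      # assumption: may differ between server versions
    "sourceName": "travel-sample",  # step 2: the bucket holding our documents
    "params": {
        "doc_config": {
            "mode": "type_field",   # step 3: the document type comes from a JSON field...
            "type_field": "type",   # ...named "type"
        },
        "mapping": {
            "default_analyzer": "keyword",  # step 5: index whole values as single terms
            "default_mapping": {
                "enabled": False,   # step 6: don't index every document type
                "dynamic": True,
            },
            "types": {
                "hotel": {          # step 4: type mapping for hotel documents
                    "enabled": True,
                    "dynamic": True,  # index all fields of hotel documents
                },
            },
        },
    },
}

print(json.dumps(fts_index, indent=2))
```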

Use FTS Hints

We still need to indicate that we would like to try to use Flex Indexing in our N1QL queries. Taking the very basic N1QL query above, which retrieves documents of type “hotel”, we can alter it to take advantage of Flex Indexing by providing the USE INDEX (USING FTS) hint:

This query now uses the Flex Index we set up; because of the generic nature of our index, the query is delegated to Flex Indexing.
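The hint sits between the keyspace and the WHERE clause. The hypothetical Python helper below only composes such a statement as a string so the placement is easy to see; the select list is an assumption for illustration, and in application code you would pass predicate values as query parameters rather than formatting them into the statement.

```python
def with_fts_hint(select_list, keyspace, where):
    """Compose a N1QL SELECT that asks the planner to consider a Flex (FTS) index."""
    return f"SELECT {select_list} FROM `{keyspace}` USE INDEX (USING FTS) WHERE {where}"


stmt = with_fts_hint("META().id, name", "travel-sample", 'type = "hotel"')
print(stmt)
# SELECT META().id, name FROM `travel-sample` USE INDEX (USING FTS) WHERE type = "hotel"
```

Paste the printed statement into the Query Workbench and check the plan to confirm the Flex Index was chosen.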

Our Query Options Are Many

Let’s try a few more queries. I have run them all, and each seems to have similar benefits over the GSI indexes in overall time to run the query and retrieve our data.

Here we have added a city predicate.

Next, we combine the city and stars fields in our predicate.

What if we leave off type = “hotel”? The query falls back to a primary scan that takes about 16 seconds.

When using the specific Flex Index we set up, make sure to add type = “hotel” to the WHERE clause. In a sense, it is the most important part of our query predicate, and our Flex Index will not get picked up otherwise: an index that is type mapped will not be selected by N1QL if the condition expression (e.g. type = “hotel”) is missing from the query.
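The rule above can be modeled in a few lines of Python. This is a simplified sketch with a hypothetical helper name: it represents the query's predicate as a dict of equality conditions and treats a type-mapped index as a candidate only when the predicate pins the type field to one of the mapped types.

```python
def type_mapped_index_eligible(type_field, mapped_types, predicate):
    """Simplified check: a type-mapped index is only a candidate when the query
    pins the type field to one of the mapped types with an equality."""
    return predicate.get(type_field) in mapped_types


# Mirrors the index set up above: type field "type", one type mapping for "hotel".
print(type_mapped_index_eligible("type", {"hotel"}, {"type": "hotel", "city": "Oakland"}))  # True
print(type_mapped_index_eligible("type", {"hotel"}, {"city": "Oakland"}))  # False
```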

Let’s use the majorAmmenities field and check whether the array contains the string “Restaurant”.
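One way to express that check in N1QL is an ANY ... SATISFIES ... END expression over the array. The Python sketch below only models the semantics of that predicate on a few hypothetical documents, including an original travel-sample hotel that lacks the field; it is not the query the article ran.

```python
def has_amenity(doc, amenity):
    """Model of: ANY a IN majorAmmenities SATISFIES a = amenity END.
    Documents without the field (or with an empty array) don't match."""
    return any(a == amenity for a in doc.get("majorAmmenities") or [])


hotels = [
    {"name": "A", "majorAmmenities": ["Pool", "Restaurant"]},
    {"name": "B", "majorAmmenities": ["Gym"]},
    {"name": "C"},  # an original travel-sample hotel without the new field
]
print([h["name"] for h in hotels if has_amenity(h, "Restaurant")])  # ['A']
```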

Wrap Up

We have walked through the basics of GSI indexes and Flex Indexing and shown you how to get started with indexing in Couchbase. None of this is a substitute for reading the documentation, but it should make getting started easier. With this knowledge, you should now understand how to approach Flex Indexing.

Thanks for reading, and please check out the Flex Indexing documentation for more info!

Author

Posted by Eric Bishard

International speaker, blogging and advocating for the JavaScript, React, GraphQL and NoSQL community working as a Senior Developer Advocate for Couchbase.
