As application developers, your main focus is to design the most efficient queries  for your application. You want to ensure the integrity of the data that your application pushes to the database, and construct the most efficient queries to retrieve the data needed for the application. In many cases, the application performance is deferred to a later stage of the development cycle. Often it is left to the database performance specialists who can provide the recommendations for the index design, which may also include query rewrite, in order to meet the performance requirement. But understanding the query’s performance can greatly improve productivity and help developers to seek a more performant solution early in the development cycle.

Database vendors understand this challenge, and enterprise class RDBMSs include Performance Tuning capability as part of their database toolsets.

Couchbase N1QL Index Advisor, a new feature in Couchbase 6.5 (DP), now provides recommendations for GSI indexes for N1QL query statements. Its primary objective is to recommend index to optimize the query response time. It allows developers to become more productive as they can create the recommended index(es) and see how the query performs early in the development cycle. It also allows DBAs to improve system performance by periodically reviewing slow running queries.

You can learn how to use Couchbase Index Advisor by reading the Understanding Index Advisor and Start Using Index Advisor sections below. The remaining sections provide more details on how Index Advisor arrives at the recommendations, including tips on using Index Advisor on completed requests and how to identify infrequently used indexes.

Understanding Index Advisor

Database performance, while it is a difficult task to manage well, is a well understood problem for database vendors. The advent of SaaS and DBasS also means that the database performance challenge is just part of the service, and needs to be automated with regard to its ability to diagnose and fix the problem. Our development team has been hard at work to create the building blocks towards that vision.

Couchbase Index Advisor is part of the N1QL ADVISE  construct that allows developers and DBAs to manage the performance tuning aspect of  their applications. To understand how Index Advisor works, you need to understand the main data retrieval techniques in the Couchbase data platform:

  • Key-Value – A  fast and direct data access for the Data service when the keys are available. This access method does not involve the Couchbase Indexing service.
  • Indexing Service (GSI)  – A scalable service that enables applications to perform fast N1QL queries. GSI leverages a distributed architecture and stores  the indexes on an independent set of nodes across the Couchbase Cluster.
  • Search (FTS) – A Full Text Search service that provides keyword and fuzzy search. FTS, as with all Couchbase services, is a distributed service that can be setup across the Couchbase cluster.

N1QL Query leverages all of the above services to deliver the query result to the applications. However, Index Advisor in v6.5 (DP) only recommends indexes that are managed by the Index service (GSI). Index Advisor does not assess the Data service retrieval technique via USE KEYS, nor the Search(FTS) service via the N1QL SEARCH_QUERY() and SEARCH() function.

Start Using Index Advisor

As I had mentioned earlier, Index Advisor is the first in a series of feature that we introduce for database performance tuning. Index Advisor helps to systematically identify indexes that can optimize the application performance, and ensure that indexes are created using the recommended best practice for key order. In effect, Index Advisor can offer recommendations even when you already have working indexes for your queries.

You can start using N1QL Index Advisor using:

N1QL – ADVISE  [INDEX] <query>

Current Indexes – The section lists all the existing indexes that the query planner will use for this query. It also provides additional information on any index(es) that the query planner uses as covering index.

Recommended Indexes – This section lists all the indexes that are recommended by the Index Advisor. The covering index recommendations are those that would allow the query to work without additional fetching from the data service. Whereas the indexes section shows only those that help with the predicate clause. Please note that there could be duplication of information in these two sections, but where the index name would be identical where the index keys are the same.

It also provides the reason why it had recommended the indexes.

Use  Advise from the Query Workbench

Click on Advise button to obtain the Index Advisor output for the query.

N1QL ADVISE supports query for SELECT, DELETE, UPDATE and MERGE. It provides index recommendations for:

  • Predicate WHERE clause.
  • Join condition ON clause for Index join, ANSI join, Index NEST, ANSI NEST and ANSI MERGE.
  • Array index for UNNEST, and array predicates in WHERE/ON clause.
  • Sub-Query when used in FROM clause.

How does Index Advisor work

I will use the following approach to explain how Index Advisor work.

  1. I will select an example that has the typical query constructs (predicate, join, array, group by), that can be difficult to guess what the best indexes are for the query.
  2. Examine the query plan to understand what the optimizer would do for the current query.
  3. Manually determine what indexes to create to improve the query performance. This would be based on a typical application developers understanding of predicate and join field indexes.
  4. Then run Couchbase Index Advisor  and compare its recommendations to those we had in mind.
  5. Create the recommended indexes.
  6. Review the new query plan and assess the changes.
The example query

This query retrieves the number of routes by airlines from any airport in the city of San Jose, USA that have schedule flights on Sunday.

The query plan 

  1. The query used five indexes
    1. Three indexes were used for the predicates on document type ‘airport’ – ‘city’, ‘type’, and ‘faa’.
    2. One index was used for the document type ‘route’ on the field ‘sourceairport’ for the purpose of the join between ‘route’ and ‘airport’. In this case the existing index ‘def_route_src_dst_day’ was used because it has the leading key ‘sourceairport’.
    3. One index for document type ‘airline’, which is the default partial index for ‘type’.
  2. Two document NestedLoop joins –  ‘airport’ — ‘route’ and ‘route’ — ‘airline’.  But, other than the index ‘def_route_src_dst_day’ on ‘route’, there are no other indexes to support the rest of the joins. So these joins are not efficient.
  3. There was also a final aggregation to count the number of route by airline.
  4. The query completed in 2.1s
How can we improve the query’s performance

  1. The ‘airport’ document has two predicates (‘city’ and ‘country’) and a flavor predicate on ‘type’. In addition, the airport ‘faa’ column is also used for the join to ‘route’. So instead of using three separate indexes, there should be a partial index on the  ‘airport’ document for (‘city’,’country’) [action_1]. The ‘airport.faa’ field could also benefit with a separate index, or part of the main airport index.
  2. The ‘route’ document has no predicates, but it has a join to ‘airport’ with the field ‘sourceairport’. An index on this column should help with the join, and we already have the index ‘def_route_src_dst_day’ [action_2] so no action is needed for this join.
  3. The ‘route’ document also has a join to ‘airline’ on the field ‘airline’. So we should have an index for this – ‘route’ document (‘airline’) [action_3].
  4. The ‘airline’ document also doesn’t have any predicate, but it has a join to ‘route’ with the field ‘iata’. So another index is needed on the ‘airline’ document (‘iata’) [action_4].
  5. We have a predicate on the route.schedule array. So we need an index on this too. ‘route’ document (DISTINCT ARRAY x.day FOR x in schedule END) [action_5].

So we have come up with five action points, which result in four additional indexes for this query.  Let’s see what Index Advisor recommends.

Run Index Advisor to check the result

Current Indexes section

This is the same as what the plan had given us. The query uses five indexes.

Recommended Indexes section – “Indexes”

This section lists out all the indexes from the predicate perspective. Note that JOIN has ON fields, which are treated as predicate too. So let’s review the recommended indexes.

  1. adv_country_faa_city_type ON travel-sample(country,faa,city) WHERE type = ‘airport’ – Other than the key order, this is the same index as we have identified in [action_1]. The key order is not significant here as these predicates all use equality.
  2. adv_airline_sourceairport_DISTINCT_schedule_day_type ON travel-sample(airline,sourceairport,DISTINCT ARRAY x.day FOR x in schedule END) WHERE type = ‘route’ – What Index Advisor has done here is that it has combined [action_2, action_3, action_5] into a single index.
  3. adv_iata_type ON travel-sample(iata) WHERE type = ‘airline’ – This is the same index that we had identified in [action_4]

Recommended Indexes section –  “Covering Indexes”

This section lists out all the covering indexes recommendations. Note that covering indexes are those that are used to satisfy a query on a document where the execution would not need to perform additional fetch from the data service.

  1. adv_faa_city_country_type_airportname_airportname ON travel-sample(country,faa,city,airportname) WHERE type = ‘airport’  –  Index Advisor has added the route.airportname field to the recommended index [adv_country_faa_city_type ON travel-sample(country,faa,city)]. By doing this, the query service would not need to fetch the route document from the data service to project the route.airportname
  2. adv_iata_type_name_name ON travel-sample(iata,name) WHERE type = ‘airline’ – Index Advisor has added the airline.name to the recommended index [adv_iata_type ON travel-sample(iata) WHERE type = ‘airline’]. By doing this, the query service would not need to fetch the airline document from the data service to project the airline.name

Note:  Covering index reduces the need for the query service to perform additional fetching from the data service, thus improves query performance. But user needs to make a decision on whether to create covering indexes or not based on the specific constraints, such as memory or storage of the user’s environment.

Create all recommended indexes and review the new query plan

For this exercise, I have decided to create all the recommended indexes using covering indexes where applicable.

The query plan after adding the recommended indexes
  1. The query now uses the three new indexes recommended by Index Advisor, instead of five indexes.
  2. Since we created covering indexes, the query service no longer needs to perform any additional fetching from the data service.
  3. The query now completed in 120ms. A 94% improvement.

Use Index Advisor to analyze the workload

N1QL – ADVISOR( <query>)

Index Advisor can also be invoked with the ADVISOR() function. It allows users to:
  1. Pass in multiple queries. SELECT ADVISOR(“SELECT ..”,”SELECT ..”,”UPDATE..”);
  2. Pass a sub-query that returns the actual N1QL statements. Because the query returns a json document structure, you will need to use the RAW keyword to ensure that only the actual N1QL statements are returned.
Use ADVISOR() to  identify the slow queries by querying the completed requests.

Identify existing inefficient indexes

Even when you already have indexes for your query, running Index Advisor  can help identifying cases where existing indexes may not be the best indexes, based in the Couchbase Index guidance.

Consider this example:

And that your query may run without any issue with an index below:

But the recommended index for the query would be:

This is because the index should perform better if the keys follow the order of predicate type rule:

  • Rule 4. Less than/between/greater than.
  • Rule 6. Not null/not missing/valued.
  • Rule 8. Flavor for partial index.

Recommended Index Naming Convention

Index Advisor recommends index with a specific naming convention,

  • adv_field1_field2_field3…

For array indexes, an extra underscore is added at the end of the array index to distinguish:

  • adv_[DISTINCT/ALL]_field1_field1.2_field1.2.3__field2_field3…

It is recommended that you retain the index naming as provided, as this would allow Index Advisor to recognize the indexes that it had previously recommended versus those that have been created by other means. In future updates, Index Advisor may opt to recommend replacing an index, but it would only do so if the index it seeks to replace was one that it had previously recommended.

Use Index Stats Last Known Scan Time

Your database may have many indexes for different query requirements over a period of time. But as indexes are accumulated in your database, the potential for duplicates or similarly created indexes do happen. This includes indexes that are no longer in use. You can query the value “last_known_scan_time” from the Index stats rest endpoint, to determine the statuses of your indexes, then decide if you should drop the index.

Results:

Summary

From an application deployment perspective, the need to manage indexes does not stop at the end of the development process, but an ongoing exercise.  In fact, index management should have a life cycle of its own. This is because the volume and shape of the data can change over time, which subsequently may require changes to the existing indexes. In some cases,  the query construct may also need revised. Index advisor, therefore,  is a critical part for the product lifecycle management.

A few key points to note about Index Advisor as of Couchbase 6.5 (DP)

  • Type of Index – Index Advisor only recommends GSI indexes. It does not recommend primary index, nor FTS index.
  • Rule based – In the current implementation, Index Advisor recommends indexes based predicates, projection list and the key column order. In effect, it is based on the same rule that is used by the query planner. As with rule based database optimization(RBO), Index Advisor does not take into account the statistics of the data distribution of the key columns. For this reason, user should perform a performance evaluation with the recommended indexes before deploying them in a production environment.
  • Sub-Query – Index Advisor works with sub-query when the query is in the FROM clause only.
  • Multi key array index –  Index Advisor will only recommend a single key for an array index, even when the predicate references multiple array fields.
  • Partition Index – Index Advisor does not include the partitioning clause in the recommended index creation statements.
  • Existing indexes – Index Advisor does not recommend the redesign of existing indexes, nor recommend dropping of existing indexes. User should use the index last scan time to make the decision.

Resources

We would love to hear from you on how you liked the 6.5 features and how it’ll benefit your business going forward. Please share your feedback via the comments or in the forum.

 

Author

Posted by Binh Le

Binh Le is a Principal Product Manager for Couchbase Query service. Prior to Couchbase, he worked at Oracle and led the product management team for Sales Clould Analytics and CRM OnDemand. Binh holds a Bachelor's Degree in Computer Science from the University of Brighton, UK.

One Comment

Leave a reply