Starting with version 2.0, Couchbase Server offers a powerful way to index JSON documents through the concept of views.

 

Views can be used to define primary indexes, composite indexes and aggregations, which make it possible to:

. query documents on different JSON properties

. create statistics and aggregates

 

Views generate materialized indexes, so they provide a fast and efficient way to execute pre-defined queries.

However, in Couchbase 2.x indexes are stored on disk and read from disk for each query, which has some performance implications.

In the future, Couchbase will allow indexes to be cached in the managed cache, similar to what is done for JSON documents, to speed up queries.

 

In the meantime, this blog post provides a simple example of how query results can be cached in Couchbase and retrieved from the cache instead of being served from the index on disk.

This is useful for scenarios where the result of a query does not need to be immediately up to date (a delay of minutes or more is OK) but is read often (multiple times a second). In this case, the query results are calculated only every so often, based on application needs, and read from the managed cache the rest of the time.

A good example of this is a game leaderboard. A view can be used to create an index of top scores for a particular game, and that view can be queried every few minutes (say, every 5 minutes) with the result cached in Couchbase Server. All requests for the leaderboard then go against the cached value, so they take only milliseconds and do not require any index querying on the server.
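The leaderboard pattern can be sketched as a small cache-aside helper with a time to live (TTL). This is only an illustration under stated assumptions: the class name `TtlQueryCache` is hypothetical, a local map stands in for the Couchbase managed cache, and a `Supplier<String>` stands in for the view query, so the sketch runs without a server.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Cache-aside helper: re-runs the expensive query only when the cached copy has expired. */
final class TtlQueryCache {

    /** A cached value together with the time at which it stops being valid. */
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    // Assumption: stand-in for the Couchbase managed cache, so the sketch needs no server.
    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable clock, e.g. System::currentTimeMillis

    TtlQueryCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached result for key, running the query only on a miss or after expiry. */
    String get(String key, Supplier<String> query) {
        long now = clock.getAsLong();
        Entry entry = cache.get(key);
        if (entry == null || now >= entry.expiresAtMillis) {
            // Miss or stale: run the (hypothetical) view query and cache its result.
            entry = new Entry(query.get(), now + ttlMillis);
            cache.put(key, entry);
        }
        return entry.value;
    }
}
```

With a TTL of 5 minutes (`300_000` ms), a call such as `cache.get("topScores", () -> runLeaderboardQuery())` would hit the query at most once per 5 minutes, where `runLeaderboardQuery` is a placeholder for the real view query. With the Couchbase client, the expiry argument of `set` could play the role of the TTL.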

 

Note that the method above is independent of the automatic updating of indexes. By default, every index in Couchbase is updated every 5 seconds or 5000 updates, both of which are tunable through the REST API. Learn more at: http://www.couchbase.com/docs/couchbase-manual-2.1.0/couchbase-views-operation-autoupdate.html

 

This means that while the index itself can be kept up to date, specific queries that do not need to be up to date can be cached for higher throughput and lower latency. The only caveat is that the maximum length for values in Couchbase is 20MB, so cached queries should not be used for very large result sets, although it is always possible to split the results into multiple cached values for larger sets.
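Splitting a large result into multiple cached values could look like the following minimal sketch. The class name `ChunkedCache` and the `key::n` / `key::count` naming scheme are assumptions made for illustration, and a local map stands in for the bucket. The sketch splits by character count for simplicity, whereas the real limit applies to the stored bytes.

```java
import java.util.HashMap;
import java.util.Map;

/** Stores one logical value as several smaller chunks plus a chunk-count entry. */
final class ChunkedCache {

    // Assumption: stand-in for the Couchbase bucket, so the sketch needs no server.
    private final Map<String, String> store = new HashMap<>();
    private final int maxChunkChars;

    ChunkedCache(int maxChunkChars) {
        if (maxChunkChars <= 0) {
            throw new IllegalArgumentException("maxChunkChars must be positive");
        }
        this.maxChunkChars = maxChunkChars;
    }

    /** Writes value as chunks under key::0, key::1, ... and records the count under key::count. */
    void put(String key, String value) {
        int count = 0;
        for (int start = 0; start < value.length(); start += maxChunkChars) {
            int end = Math.min(value.length(), start + maxChunkChars);
            store.put(key + "::" + count, value.substring(start, end));
            count++;
        }
        // The count is written last so that it only ever refers to chunks already stored.
        store.put(key + "::count", Integer.toString(count));
    }

    /** Reassembles the value, or returns null if the key or any of its chunks is missing. */
    String get(String key) {
        String countText = store.get(key + "::count");
        if (countText == null) {
            return null;
        }
        int count = Integer.parseInt(countText);
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < count; i++) {
            String chunk = store.get(key + "::" + i);
            if (chunk == null) {
                return null;
            }
            result.append(chunk);
        }
        return result.toString();
    }

    /** Number of entries held by the stand-in store (chunks plus count entries). */
    int storedEntries() {
        return store.size();
    }
}
```

Against a real bucket the writes are separate operations, so a reader could observe a mix of old and new chunks during an update. Writing each refresh under a fresh key prefix would be one way around that.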

 

This is fairly simple to implement. Let's take a look at how we can do this in Java.

 

I will use the beer-sample database, which comes with Couchbase Server. If you have not installed it already, go into Settings, select beer-sample and click on Create.

The sample comes with a brewery_beers view, which I will use to build the caching example.

 

Now let’s take a look at a simple Java application that can be used to execute and cache a query and compare against executing the query every time.

 

The Java code below first connects to the beer-sample database and then either:

. executes the query 1 time and reads it from the cache n times or

. executes the query n times

 

In both cases, the time is recorded before and after the test to measure the execution time.

 

The code is very straightforward. It uses no parameters for the query but sets includeDocs to retrieve all the JSON documents associated with the query results rather than just the document IDs.

 

To learn more about views and queries in Couchbase, read: http://www.couchbase.com/docs/couchbase-devguide-2.1.0/indexing-querying-data.html

The full source code is:

 

// @author Alexis Roos
package com.couchbase.dev.examples;

import com.couchbase.client.CouchbaseClient;
import com.couchbase.client.protocol.views.*;

import java.net.URI;
import java.util.LinkedList;
import java.util.List;

public class CachedQuery {

    public static void main(String[] args) {

        // Bootstrap URI of the local Couchbase node
        List<URI> uris = new LinkedList<URI>();
        uris.add(URI.create("http://127.0.0.1:8091/pools"));

        CouchbaseClient client = null;
        try {
            // Connect to the beer-sample bucket (empty password)
            client = new CouchbaseClient(uris, "beer-sample", "");

            int requestCount = 100;

            long t1 = System.currentTimeMillis();
            View view = client.getView("beer", "brewery_beers");
            Query query = new Query();
            query.setIncludeDocs(true).setLimit(10000);
            query.setStale(Stale.FALSE);

            // Doing query a single time and caching it
            ViewResponse result = client.query(view, query);
            client.set("cachedBrewery_beersQuery", 0, result.toString());

            // Using cache for subsequent requests
            for (int i = 0; i < requestCount - 1; i++) {
                String cachedIndex = (String) client.get("cachedBrewery_beersQuery");
            }
            long t2 = System.currentTimeMillis();
            System.out.println("Test with cache finished in " + (t2 - t1) / 1000.0 + " seconds");

            t1 = System.currentTimeMillis();
            // Querying every single time
            for (int i = 0; i < requestCount; i++) {
                result = client.query(view, query);
            }
            t2 = System.currentTimeMillis();
            System.out.println("Test without cache finished in " + (t2 - t1) / 1000.0 + " seconds");

            client.shutdown();

        } catch (Exception e) {
            System.err.println("Error connecting to Couchbase: " + e.getMessage());
            // Exit with a non-zero status to signal the failure
            System.exit(1);
        }
    }
}

 

Running the code prints both test results. For 100 serial requests, it yields:

 

Test with cache finished in 3.755 seconds

Test without cache finished in 19.835 seconds

 

Not only is the test with the cache a lot faster, it also requires fewer resources on the Couchbase server.

The following graph shows the ops per second metric for the beer-sample bucket. The first small bump corresponds to the test with the cache (essentially mapping to the number of brewery and beer documents, as the query is run only once), whereas the larger curve that follows shows the query being executed many times, which results in many more operations per second.

 

 

Caching view query results is easy, and it is simple to set up a program that periodically queries the view and stores the result in Couchbase Server, where it will be cached. In turn, applications can use this cached value for efficiency.

This should be used as appropriate, based on application use cases.
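A periodic refresher of this kind might be sketched as follows. It is an illustration rather than a definitive implementation: `PeriodicViewRefresher` is a hypothetical name, a local map stands in for the Couchbase bucket, and a `Supplier<String>` stands in for the view query.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Re-runs a query on a fixed schedule and keeps only the latest result for readers. */
final class PeriodicViewRefresher {

    // Assumption: stand-in for the Couchbase bucket, so the sketch needs no server.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /** Runs the query once immediately, then again after every period. */
    void start(String key, Supplier<String> query, long period, TimeUnit unit) {
        Runnable refresh = () -> {
            try {
                cache.put(key, query.get());
            } catch (RuntimeException e) {
                // Keep the previous cached value. An uncaught exception would
                // also cancel all further scheduled runs.
                System.err.println("Refresh failed: " + e.getMessage());
            }
        };
        refresh.run(); // populate the cache before the first reader arrives
        scheduler.scheduleWithFixedDelay(refresh, period, period, unit);
    }

    /** Readers only ever touch the cached value, never the query itself. */
    String read(String key) {
        return cache.get(key);
    }

    /** Stops the background refresh thread. */
    void stop() {
        scheduler.shutdownNow();
    }
}
```

For the leaderboard case this could be started with `start("leaderboard", query, 5, TimeUnit.MINUTES)`, where `query` wraps the real view query and the `set` call that stores its result.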

Author

Posted by Alexis Roos

Alexis Roos is senior engineering manager at Salesforce. Alexis has over 20 years of software engineering experience with the last five years focused on large-scale data science and engineering, working for SIs in Europe, Sun Microsystems/Oracle, and several startups, including Radius Intelligence, Concurrent, and Couchbase.

4 Comments

  1. Good post and looks like a good workaround.

    When is this planned to be implicit in the product?

    "In the future Couchbase will allow caching indexes into the managed cache similar to what is done for JSON documents to speed up queries."

    Thanks

    1. Hey Alex, sorry for the delay in answering you here. We're working on a number of things to speed up views (and other indexes). I would expect to see some pretty significant improvements towards the latter half of next year.

  2. Would going through the results through pagination have any effect on the cached results?

    1. The cached results are really separate from the actual view queries. If you're paginating through the view, you'll end up caching each of those "pages" as separate documents and need to know which one to go get when you want that particular page. It will depend a bit on just how large your result set is... it may be more efficient to store a larger number of results in one document and then use the application to parse/page through it.

      Does that help?
