Multiple different indexes per tenant - How?

synesty · November 28, 2016, 9:26pm

Hi,

In our Cloud/SaaS/Multi-Tenant solution each customer can create multiple datastores.

A datastore is something like a table where you can put records with rows and columns (like a CSV or a Google Spreadsheet).
each record has a unique identifier within the datastore
each row is stored as a JSON-payload (thus we think Couchbase is a good fit)
the customer can query each datastore to build custom reports etc. (thus we think N1QL is pretty awesome)

What we want to do

the customer should be able to choose the fields which should be indexed
the reason is that each datastore has a different purpose so different “questions” will be asked, hence it makes sense to index different fields

Question

is it possible (or planned for the future) to add indexes on the fly which are bound only to a sub-set of the data? e.g. based on a datastore-id / tenant-id?
our goal is that each index is only populated / touched for the specific datastore/tenant it belongs to
as we understand it, if we use a global GS1 index for this, then this index applies for the whole bucket. But we need it indexes which apply only for a certain portion of documents (in other words: an insert/update of a doc of datastore-A should not affect the indexes of datastore-B)

Background

We are currently evaluating Couchbase because we are searching exactly a functionality like this. We also thought about maybe having an ElasticSearch-index per datastore/tenant but it would be cool if this would be possible in Couchbase directly, as there are other cool features.

keshav_m · November 28, 2016, 10:01pm

Hi @synesty, Thanks for the kind words on Couchbase & N1QL.

on the questions:

Yes. You can create index on subset of data based on expression.
CREATE INDEX i1 ON info(x,y,z) WHERE customer = “joe” and tenant = “2983” ;
Your assumption is correct.
They’re called “global” in the sense that they can index data from ANY of the data nodes. But the index themselves will contain the subset of the data specified by the WHERE condition of the CREATE INDEX.

Couchbase indexing is quite flexible. Please take a look at this article:
https://dzone.com/articles/index-first-and-query-faster
and the recent talks at Couchbase Connect at http://www.couchbase.com

synesty · November 28, 2016, 10:20pm

Hi @keshav_m,
thanks, this is great news. “Partial index” is the magic term I was looking for.

Now we will investigate how this will scale, as the number of indexes can become quite large - basically unbound.

prasad · November 28, 2016, 10:30pm

sure… you can use partial indexes, which is basically an index created on subset of documents in a bucket. See CREATE INDEX syntax… You can provide filters with where-clause.

geraldss · November 28, 2016, 10:48pm

You will scale the number of indexes by scaling the number of index nodes. You can designate nodes as index nodes. We call it multidimensional scaling. You can scale data, index, and query nodes independently.

synesty · October 13, 2017, 10:38pm

Do you see any problem in the approach of CREATING INDEX on the fly when an index will be needed (for a analytics-like query “Top 10 best-selling products”) and DROP the index afterwards? The idea is to keep the number of indexes to a minimum (to not waste precious index memory). The indexes are only needed short-term for some queries which e.g. run hourly or nightly.

The use case:
Assume a Dashboard with several graphs (e.g. a pie chart with TOP-N products, TOP-N cities etc.). Each graph could be the result of a N1QL query. The number of different graphs/queries can be very high, thus the amount of needed indexes is high too. But the indexes are not needed all the time. We can fetch and persist the query-results and then the indexes are not needed anymore e.g. until the night, when the Dashboards are created.
The amount documents is always between 100k and a few 1mio items. So N1QL queries are only fast if they hit a covering index. We thought about MapReduce-Views too, but like in this topic Top N over aggregated (MapReduced) we cannot sort by the value (_count or _sum).

It seems that this “Getting the TOP-N documents grouped by some attribute” is a pretty hard use case for CB if you are dealing with a large number of documents and a large number of different JSON-structures (which requires lots of different indexes).

Any ideas would be appreciated.

CB 5.0.0 beta2

keshav_m · October 15, 2017, 5:39am

Creating the index dynamically is fine. You’d need to size the nodes to handle the additional indexes in terms of memory, storage and cpu. During the index creation, projector will also have additional work. Presumably, you’re creating partial index on documents for a given time/other range.

/cc @deepkaran.salooja @john

synesty · October 16, 2017, 8:04pm

Ok thanks. I think we have to try it out and gather some metrics.

yucel.okcu · January 25, 2018, 2:20pm

Hey @synesty,

We have a case where the requirements are similar with your use case. Have you had any chance to measure the performance impact?

synesty · January 25, 2018, 10:59pm

No unfortunatelly not. We are in the still in the process of introducing CB (5.0) and we currently go with a fixed set of GSI indexes, which is the same for all customers. Not sure yet what our strategy for customer-specific indexes will be. Maybe we solve that with a different technology (e.g. ElasticSearch or Couchbase’s Fulltext Search). But that is in the future, once we know that we will continue building on CB. In our scenario in step 1 we are replacing MySQL with CB for one really specific part (not CB for the whole application). If that works well we will check out how to do more things.

Topic		Replies	Views
Best practice on n1ql index? SQL++	3	2169	January 6, 2016
Get specific used storage of customer on couchbase disk (multi tenancy) SQL++	2	979	October 11, 2018
Priority in N1QL indexes SQL++	7	3746	February 17, 2015
Couchbase no of index limitation Couchbase Server	6	136	February 7, 2025
Differences between primary and secondary Indexes? Couchbase Server	16	15430	July 23, 2015

Multiple different indexes per tenant - How?

Related topics