JSON Data Modeling

Joseph · June 15, 2021, 5:01pm

Hi,

I was watching the video "JSON Data Modeling for Success and Performance - Connect.ONLINE 2020: The NoSQL Developer Conference" (couchbase.com), at around the 42:30 mark.

I was wondering why the speaker said that we never want to have a primary index in production.

Can we avoid creating a primary index for the bucket/scope/collection?

Thanks

jon.strabala · June 16, 2021, 4:25pm

Hi @Joseph,

Basically, a primary index only lets you find document KEYS. Any N1QL DML statement that relies on it must then fetch each document to do the filtering, which is much less efficient than a purpose-built index that satisfies the query without needing to dip into KV.
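To make the cost difference concrete, here is a minimal toy sketch in Python. It is not the Couchbase SDK or the N1QL engine: `ToyKV`, `query_with_primary_index`, `build_secondary_index` and `query_with_secondary_index` are hypothetical names, and the model assumes a filter like `country = "SG"` over a bucket of 1,000 documents. It only counts simulated Data Service (KV) fetches: a keys-only primary index forces one fetch per document to evaluate the filter, while an index on the filtered field finds the matching keys with no fetch at all.

```python
from dataclasses import dataclass


@dataclass
class ToyKV:
    """Hypothetical stand-in for the Data Service: document key -> JSON-like dict."""
    docs: dict
    fetches: int = 0  # counts simulated Query Service -> Data Service round trips

    def get(self, key):
        self.fetches += 1
        return self.docs[key]


def query_with_primary_index(kv, field_name, value):
    """Keys-only primary index: every document is fetched so the filter can be evaluated."""
    primary_index = sorted(kv.docs)  # one entry (the key) per document in the bucket
    return [k for k in primary_index if kv.get(k).get(field_name) == value]


def build_secondary_index(docs, field_name):
    """Purpose-built index on one field: field value -> keys of matching documents."""
    index = {}
    for key, doc in docs.items():
        index.setdefault(doc.get(field_name), []).append(key)
    return index


def query_with_secondary_index(index, value):
    """Matching keys come straight from the index; no KV fetch is needed to find them."""
    return sorted(index.get(value, []))


docs = {
    f"user::{i}": {"type": "user", "country": "SG" if i % 10 == 0 else "US"}
    for i in range(1000)
}
kv = ToyKV(docs)
hits = query_with_primary_index(kv, "country", "SG")
print(len(hits), kv.fetches)  # 100 1000: one fetch per document, not per match

index = build_secondary_index(docs, "country")
print(query_with_secondary_index(index, "SG") == hits)  # True
print(kv.fetches)  # still 1000: the secondary-index lookup made no fetches
```

If the query returned whole documents rather than just keys, the secondary-index path would still need one fetch per match (100 here) instead of one per document in the bucket (1,000).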

For a much deeper dive into the reasons, please read the excellent blog post by @keshav_m, "What is the Couchbase Primary Index? Learn Primary Uses".

Best

Jon Strabala
Principal Product Manager - Server

ianmccloy · June 16, 2021, 6:48pm

Hello Joseph

Primary indexes essentially have to scan across all documents in a bucket. You would almost never want to do this for a production OLTP database workload, as it uses an excessive amount of resources and won’t give ideal latency. The size of a primary index grows with each document in the bucket, which could run into the billions in a production use case. Indexes defined on specific fields are normally much smaller than an index over every document in the bucket, and are therefore less resource-intensive. When you write an N1QL query on specific fields and the cluster has to use the primary index, the Query Service must make additional requests to the Data Service nodes and transfer the documents from the bucket to satisfy the request. These additional hops and data transfers can be slow, depending on your infrastructure and use case.

Primary indexes are optional. When a query can be answered using only the fields stored in an index, that index is called a covering index. Covering indexes serve queries very quickly because the Query Service does not need to go back to the Data Service to collect additional information before returning results. With memory-optimised indexes, this is even done straight from memory.
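A second toy sketch, again hypothetical Python rather than Couchbase code, illustrates the covering idea. `ToyCompositeIndex` and the `country`, `name` and `email` fields are assumptions for illustration: an index on `country` and `name` can answer a query that only projects `name` with zero document fetches, but a query that also asks for `email` must fetch each matching document. The linear scan over entries is a simplification; a real index would use an ordered range scan.

```python
class ToyKV:
    """Hypothetical stand-in for the Data Service; counts simulated document fetches."""

    def __init__(self, docs):
        self.docs = docs
        self.fetches = 0

    def get(self, key):
        self.fetches += 1
        return self.docs[key]


class ToyCompositeIndex:
    """Hypothetical index on several fields: each entry stores the indexed values plus the key."""

    def __init__(self, docs, fields):
        self.fields = list(fields)
        # Flat list for simplicity; a real index keeps entries ordered for range scans.
        self.entries = [([doc.get(f) for f in self.fields], key) for key, doc in docs.items()]

    def query(self, kv, where_field, value, select):
        """Return the `select` fields of documents whose `where_field` equals `value`."""
        pos = self.fields.index(where_field)
        covered = all(f in self.fields for f in select)  # can the index alone answer it?
        rows = []
        for values, key in self.entries:
            if values[pos] != value:
                continue
            if covered:
                # Covering: projected values are read from the index entry, no fetch.
                rows.append({f: values[self.fields.index(f)] for f in select})
            else:
                # Not covering: one Data Service fetch per matching document.
                doc = kv.get(key)
                rows.append({f: doc.get(f) for f in select})
        return rows


docs = {
    "u1": {"country": "SG", "name": "Ann", "email": "ann@example.com"},
    "u2": {"country": "US", "name": "Bob", "email": "bob@example.com"},
    "u3": {"country": "SG", "name": "Cy", "email": "cy@example.com"},
}
kv = ToyKV(docs)
idx = ToyCompositeIndex(docs, ["country", "name"])

print(idx.query(kv, "country", "SG", ["name"]), kv.fetches)
# [{'name': 'Ann'}, {'name': 'Cy'}] 0
print(idx.query(kv, "country", "SG", ["name", "email"]), kv.fetches)
# [{'name': 'Ann', 'email': 'ann@example.com'}, {'name': 'Cy', 'email': 'cy@example.com'}] 2
```

The design choice being illustrated is simply which fields live in the index: adding `email` to the indexed fields would make the second query covered too, at the cost of a larger index.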

We have a great blog post that describes when you’d want to use a primary index.

Take a look and let us know if you have any additional questions.

Thanks,
Ian McCloy (Couchbase Principal Product Manager)

Joseph · June 17, 2021, 4:40am

Hi @jon.strabala and @ianmccloy,

Thank you for responding. I’ll dive into the articles you shared and come back if I have further questions.
