Clarifications on Couchbase index

Hi! I am very new to Couchbase. I am going through the CB docs, and I come from a Mongo and GraphDB background. My questions are about primary and secondary indexes.
Let’s say I have a bucket and 10 different collections. 5 of them are somewhat related to one another, and the other 5 are somewhat related to one another. I see that a primary index in CB is optional if we set up proper secondary indexes based on query patterns. Is that true? Even if I want to create a primary index, given that my collections are independent, will it help in any way to return results quickly?

  1. Let’s say I have collection1-10, and I create a primary index on collection1. Does it store only the keys from collection1, or all the keys from all 10 collections? If it stores only the keys from one collection, how will the primary index help us with faster queries?
  2. Let’s say I don’t have any primary index. When I add documents to each collection as key-value pairs, does CB automatically create an index on the keys (like _id in Mongo), or should I create it myself? Can I have a secondary index on values as well?
  3. META().id vs. USE KEYS. It looks like one uses the primary index to query and the other uses the keys directly. So my question is: when we use USE KEYS, does CB look up the keys from disk, or should I create a secondary index on the keys here?
  4. Sorry for a silly question: is CB in-memory or disk-based?

I guess what I am trying to understand is what the best practice is. Should we use a primary index, or create a secondary index on id? Or do neither and use USE KEYS instead?

To use N1QL in Couchbase, you need one of the following:
You need the document keys (this is where USE KEYS comes into play).
Or you must have an index on the collection. An index scan produces the document keys, which are then used to fetch the documents (the same way as USE KEYS, but done internally).

  1. Primary index: indexes all the documents in that collection.
  2. Secondary index: indexes documents based on fields of the document. NOTE: If the leading index key is MISSING in a document, that document is not indexed. For a query to use the index, it must have a predicate on the leading index key.
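
As a rough illustration of the two index types, here is a minimal sketch against the travel-sample airline collection. The index names (idx_airline_primary, idx_airline_country) are made up for this example, and the country field is assumed to be present as it is in the sample data.

```sql
/* Primary index: covers every document key in this one collection only. */
CREATE PRIMARY INDEX idx_airline_primary
  ON `travel-sample`.inventory.airline;

/* Secondary index: "country" is the leading index key, so documents
   where country is MISSING are not indexed. */
CREATE INDEX idx_airline_country
  ON `travel-sample`.inventory.airline(country);

/* This query has a predicate on the leading index key (country),
   so it is eligible to use idx_airline_country. */
SELECT a.name
FROM `travel-sample`.inventory.airline AS a
WHERE a.country = "France";
```

Running EXPLAIN on the SELECT is a simple way to check which index the planner actually picks.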

Your questions:

  1. It only stores the keys from collection1. You must have a separate index for each collection.
  2. Couchbase never creates any index automatically. You must create one.
  3. If you already know the keys, use USE KEYS. Once it knows the keys (from either USE KEYS or an index scan), it gets the documents from cache or disk.
  4. Both. Documents are cached in memory and persisted to disk.
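
To make the difference in point 3 concrete, here is a small sketch using the travel-sample route collection. The key "route_10000" is taken from the sample data; everything else is plain N1QL.

```sql
/* Direct fetch by document key: no index scan is involved. */
SELECT r.*
FROM `travel-sample`.inventory.route AS r
USE KEYS "route_10000";

/* Several known keys can be passed as an array. */
SELECT r.*
FROM `travel-sample`.inventory.route AS r
USE KEYS ["route_10000", "route_10001"];

/* Same document, but found through a predicate on META().id.
   This goes through an index scan first, so it relies on an index
   (for example a primary index) being available on the collection. */
SELECT r.*
FROM `travel-sample`.inventory.route AS r
WHERE META(r).id = "route_10000";
```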

It is a little old, but check out (the first two chapters in the Indexing and Optimization sections)

Thank you!
I read the blog and am going through the PDF doc right now.
After reading the blog, I am a bit confused regarding the primary index.
The blog says:
The primary index is simply the index on the document key on the whole bucket. The Couchbase data layer enforces the uniqueness constraint on the document key. The primary index, like every other index in Couchbase, is maintained asynchronously. You set the recency of the data by setting the consistency level for your query.
From your response, it sounds like a primary index is created at the collection level, but here they say the whole bucket, so which one is correct? They also say not to use a primary index in production. Does that mean we should use USE KEYS in prod?

  1. When you say document key, is it the id of each document?
  2. Is the id stored in memory? How come USE KEYS yields results so quickly? Let’s say I have a million documents in a collection, each with a unique document key, and I know the key before querying the data. So I don’t think I need a primary index for this collection, correct? I can just use USE KEYS to query the data?
  3. Can I store data in a document without an id (or primary key)? If so, how do I make sure my queries run quickly on any of the document fields? Does a secondary index come into play here?

To understand this better, I am trying to come up with a query on travel-sample. Does this look right?
“Take any route id and get the airline info”
SELECT * FROM `travel-sample`.inventory.route USE KEYS "route_10000" JOIN `travel-sample`.inventory.airline ON route.airlineid = META(airline).id

As I mentioned, the blog/document is old and was written before collections were introduced. That is why it talks about the bucket. Now replace that with the collection. In production, use secondary indexes (Create the Right Index, Get the Right Performance.).

  1. Document key, id of the document, and META().id all refer to the same thing.
  2. If you know the keys, you don't need an index and can use USE KEYS. You can also use an SDK KV get instead of N1QL.
  3. In Couchbase, the id (document key) is external to the document. If you need a predicate on a document field, you must use a secondary index for the query to run quickly.
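
For the travel-sample query from the question, one possible variant is a lookup join. This is only a sketch, assuming (as in the sample data) that route.airlineid holds the document key of the matching airline document. Both sides are then fetched by key.

```sql
/* Left side: the route document is fetched directly by its key.
   Right side: ON KEYS fetches the airline document whose key
   is the value stored in r.airlineid. */
SELECT r.sourceairport, r.destinationairport, a.name AS airline_name
FROM `travel-sample`.inventory.route AS r
USE KEYS "route_10000"
JOIN `travel-sample`.inventory.airline AS a
  ON KEYS r.airlineid;
```

The field names in the SELECT list (sourceairport, destinationairport, name) are the ones found in the sample data; swap in SELECT * to see the full documents.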

Looks like this piece was not answered yet. Yes, you can have both primary and secondary indexes on the same collection, including multiple secondary indexes and even identical (except for the name) secondary indexes, called "equivalent indexes." Equivalent indexes are the way to achieve HA of secondary indexes on Community Edition (CE). On Enterprise Edition (EE) you can instead create indexes with replicas managed by Couchbase (e.g. WITH {"num_replica":2}). See
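
A minimal sketch of both approaches on the travel-sample route collection. The index names here are hypothetical, and the replica example assumes an EE cluster with more index nodes than replicas requested.

```sql
/* CE-style HA: two equivalent indexes, identical except for the name.
   They only help with availability if they end up on different index nodes. */
CREATE INDEX idx_route_airlineid_1
  ON `travel-sample`.inventory.route(airlineid);

CREATE INDEX idx_route_airlineid_2
  ON `travel-sample`.inventory.route(airlineid);

/* EE-style HA: one index definition with replicas managed by Couchbase.
   num_replica 2 means three copies in total. */
CREATE INDEX idx_route_airlineid_ha
  ON `travel-sample`.inventory.route(airlineid)
  WITH {"num_replica": 2};
```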