Index(Distinct field) Not Behaving as Expected

I have a document type like this

{
widgetId:…,
a:…,
b:…,

z:…,
type: ‘123’
}

We have millions of them.

I am looking to query like this:

Select distinct widgetId from default where type = ‘123’

I create a partial index like so:

CREATE INDEX myIndex ON app((distinct (widgetId))) WHERE (type = “123”)

The query does not hit the above index even with an explicit hint.

Also the index stats (items, and data size) indicate that all the millions of documents are represented in the index, where there should only be about 10k (distinct widgetIds).

  1. What am I missing about how indexes work?
  2. How can I write an index that will allow me to get the data efficiently?

Please help.

Index can’t create DISTINCT values across the documents for scalar values. DISTINCT keyword for ARRAY indexing. i.e widgetID must be ARRAY and it will flatten ARRAY and index DISTINCT values per document as separate entry in index so that you can query with ANY clause.

Right index will be:

CREATE INDEX `myIndex` ON `app` (`widgetId`) 
WHERE ( `type` = “123”);

If you have EE version: The following query will use Index aggregation https://blog.couchbase.com/understanding-index-grouping-aggregation-couchbase-n1ql-query/

SELECT   widgetId 
FROM  default
WHERE  type = "123" AND widgetId IS NOT MISSING
GROUP BY widgetId;

Ok thank you, indeed, I got the idea from usage with Arrays. It would be a great feature to have that generally. Would enable efficient distinct queries with tiny indexes.

Indexer needs to store one entry per document. One you are looking more falls in pre-computed group/aggregation. If CE may be you can explore this use case as map/reduce views.

1 Like

We have EE and of course your solution works well. Thank you.

Thank you for the insight about how the indexer works. I was laboring under a misconception.

1 Like