Count of distinct is slow in couchbase

Hi Team,

We have one count of distinct query which is taking more time. Below is the query and index. Can anyone please let us know the best way to reduce the response time.

select count(distinct id) as totalCount from testA where data_type=‘XXXX’ and number= 0000 and status in [‘XXXX’, ‘XXXX’] and url is not null and type=‘XXXX’

Existing index:
CREATE INDEX idx1 ON testA(number,status,type,id) WHERE ((data_type = ‘XXXX’) and (url is not null))

Have you tried to ADVISE the select?

(Or run it through the online index advisor: ?)

It may suggest an index that’ll work better. Does the query plan show use of this existing index?

Yes tried the advise and it says existing index is sufficient

OK, when I plugged it in to the online advisor it suggested:

CREATE INDEX adv_number_type_data_type_id_url_status ON `testA`(`number`,`type`,`data_type`,`id`,`url`,`status`)

as a covering index. Could you try this one and see if it performs any better? – I’m not sure why your existing index wouldn’t be the best possible performing index so this is just confirmation really.

I assume you’ve checked your query plan and it is simply the index scan?

I have created the above covering index as well. But there is no much improvement. Query is still taking 2secs for its execution.
Yes in the query plan, i can see only index scan

DISTINCT aggregate takes time it needs avoid duplicates.

If using EE and status in [‘XXXX’, ‘XXXX’] is same across queries try this.

CREATE INDEX idx1 ON testA ( number , type , id ) WHERE (( data_type = ‘XXXX’) and ( url is not null) AND status IN [‘XXXX’, ‘XXXX’] );

Make sure it uses covering and index aggregation described Index Grouping and Aggregation Overview - The Couchbase Blog

We have tried this index as well. But no luck. Can we implement it using view or group by or any other way?

Can anyone provide map and reduce view function for above n1ql query

Hi All,
Can anyone look into this?

The query service doesn’t pre-compute results to return in a select (map/reduce or pre-computed views). The result set is computed each time you run a statement.
How many documents do you have indexed and how many distinct values are you expecting?
Presumably the time taken is all in the index scan according to the profile output? – Perhaps you could share the complete profile for your query?