Best index for GROUP BY query?

landonbar · December 9, 2015, 1:09am

I have documents in the following format:

{
  "accountNumber": "123",
  "lastName": "Smith",
  …
}

I want to list the account numbers that appear with more than one distinct lastName value, using this query:

SELECT accountNumber FROM bucket GROUP BY accountNumber HAVING COUNT(DISTINCT lastName) > 1;

Without indexes I get the results back in ~30 seconds, but I would like to index this query so that it runs much faster.

Is there a good indexing strategy for GROUP BY clauses or are they primarily useful for WHERE clauses?
Is there an indexing strategy that would work best for this specific case or should I try a different approach?

I’m using the 4.1 dev preview for testing currently. Thanks.

geraldss · December 9, 2015, 6:49pm

Indexing would help you if you had a WHERE clause.

One thing to try is:

CREATE INDEX idx ON mybucket(accountNumber, lastName);

EXPLAIN SELECT accountNumber
FROM mybucket
WHERE accountNumber IS NOT NULL
GROUP BY accountNumber
HAVING …;

Make sure you see cover() in the EXPLAIN output.
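
For reference, here is a sketch of how the suggestion above might look with the HAVING clause from the original question filled back in. It assumes the bucket is named mybucket, as in the reply (the question calls it bucket), and that the HAVING condition is the one from the question. Because both accountNumber and lastName are index keys, the query should be answerable from the index alone, but that is worth confirming with EXPLAIN on your own data rather than taking on faith.

```sql
-- Index from the reply above: both fields the query touches are index keys.
CREATE INDEX idx ON mybucket(accountNumber, lastName);

-- Assumed full query: the reply's WHERE clause plus the question's HAVING clause.
-- The IS NOT NULL predicate on the leading index key lets the planner pick idx.
EXPLAIN SELECT accountNumber
FROM mybucket
WHERE accountNumber IS NOT NULL
GROUP BY accountNumber
HAVING COUNT(DISTINCT lastName) > 1;
```

If the plan shows cover() expressions, drop the EXPLAIN keyword and run the query itself to compare timing against the ~30 second baseline.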
