I have created a partitioned index as below. the actual index is different, I am giving a similar example with travel-sample
CREATE INDEX `idx_grp_add` ON `travel-sample`((`geo`.`alt`),(`geo`.`lat`),(`geo`.`lon`),`id`)
PARTITION BY HASH((`geo`.`alt`),(`geo`.`lat`),(`geo`.`lon`),`id`)
Then, I ran the below query
SELECT geo.alt, geo.lat, SUM(geo.lon), AVG(id), COUNT(DISTINCT geo.lon)
WHERE geo.alt BETWEEN 10 AND 30
GROUP BY geo.alt, geo.lat
HAVING SUM(geo.lon) > 1000;
I was expecting that the GROUP BY operation will be pushed to the indexer. But I see it has not. if I do not have a partitioned index, the GROUP BY operation does get pushed to indexer. Is it because the index is partitioned, the data comes from different partitions and the query node has to do the group by because Indexer does not have all the portions of the index data spread across the partitions.
You have PARTITION BY 4 keys and GROUP BY 2 keys. i.e Complete Aggregations can’t be done at PARTITION level.
But One can do 2 level aggregation. one level in indexer, second level on query, But 2 level aggregation only works when there is no DISTINCT aggregates (at partition level it doesn’t know anything about other partition, so it needs give all the information to query and query needs aggregation). Also Aggregates pushdown is all or nothing.
In your case you have DISTINCT aggregates that disqualifies. Also DISTINCT aggregates has strict rules. It can only pushdown when indexer able to apply without sort.
Example 1: For partition index the all the partition keys needs to present in the group keys to generate Full aggregations.
Example 3: In case of partial aggregation if any aggregate has DISTINCT modifier index grouping and aggregation is not possible.
Thanks for the detailed explanation. I also noticed one thing
If my partition key is like above, even if I give those three columns in my GROUP BY, it is not pushed down to indexer
GROUP BY col1,DATE_PART_STR(MILLIS_TO_STR(col2),"year"),DATE_PART_STR(MILLIS_TO_STR(col3),"month")
So looks like if I have any functions applied to a column in the partition key, GROUP BY does not get pushed down. If I just use col1 and col2, the push down happens
The group keys, hash keys, index keys must be all exactly same.
Please provide index definition and complete query with EXPLAIN. You need to look whole query and index.
Any pushdown need to happen query must able to pushdown all predicates and no false positives and indexer able to compute them.