Simple group by query using index is taking long time

I have a couchbase cluster with three nodes.
Couchbase version - Community Edition 7.2.0 build 5325
Bucket name - Trending_Test
Scope - trending
Collections- livetv, cutv & vod

Currently I have data only in collection livetv.
Sample data -
key - 10000199_1_280_3189x20240305x2155
value- {"3":"10000199","5":"2936156","6":"provider","11":1709675701000,"13":1,"14":"280","22":"3189x20240305x2155_1709675700000"}

I have primary index and a secondary index as follows-

CREATE INDEX trending_livetv_idxONTrending_Test.trending.livetv(6,14,22)

Query -

select `14` as channel, `22` as program, count(1) views
 from `Trending_Test`.`trending`.`livetv`  
where `6`="provider" 
group by `14`, `22` 
order by views desc;

With 800K documents in livetv collection, the query takes approximately 1 min. Sometimes when there are simultaneous insertion/deletion of documents, query takes more than 2 mins as well. I have that node where this secondary index resides CPU utilization goes upto 99.9%.

I can see the query plan is using secondary index.
Queries-

  1. Is the query time expected or can we do something about it ?
  2. What can we do to reduce the CPU utilization ?
{
    "plan": {
        "#operator": "Sequence",
        "~children": [
            {
                "#operator": "Sequence",
                "~children": [
                    {
                        "#operator": "IndexScan3",
                        "bucket": "Trending_Test",
                        "covers": [
                            "cover ((`livetv`.`6`))",
                            "cover ((`livetv`.`14`))",
                            "cover ((`livetv`.`22`))",
                            "cover ((meta(`livetv`).`id`))"
                        ],
                        "index": "trending_livetv_idx",
                        "index_id": "489ee29cd2f397e8",
                        "index_projection": {
                            "entry_keys": [
                                0,
                                1,
                                2
                            ]
                        },
                        "keyspace": "livetv",
                        "namespace": "default",
                        "scope": "trending",
                        "spans": [
                            {
                                "exact": true,
                                "range": [
                                    {
                                        "high": "\"provider\"",
                                        "inclusion": 3,
                                        "index_key": "`6`",
                                        "low": "\"provider\""
                                    }
                                ]
                            }
                        ],
                        "using": "gsi"
                    },
                    {
                        "#operator": "Parallel",
                        "~child": {
                            "#operator": "Sequence",
                            "~children": [
                                {
                                    "#operator": "Filter",
                                    "condition": "(cover ((`livetv`.`6`)) = \"provider\")"
                                },
                                {
                                    "#operator": "InitialGroup",
                                    "aggregates": [
                                        "count(1)"
                                    ],
                                    "group_keys": [
                                        "cover ((`livetv`.`14`))",
                                        "cover ((`livetv`.`22`))"
                                    ]
                                }
                            ]
                        }
                    },
                    {
                        "#operator": "IntermediateGroup",
                        "aggregates": [
                            "count(1)"
                        ],
                        "group_keys": [
                            "cover ((`livetv`.`14`))",
                            "cover ((`livetv`.`22`))"
                        ]
                    },
                    {
                        "#operator": "FinalGroup",
                        "aggregates": [
                            "count(1)"
                        ],
                        "group_keys": [
                            "cover ((`livetv`.`14`))",
                            "cover ((`livetv`.`22`))"
                        ]
                    },
                    {
                        "#operator": "Parallel",
                        "~child": {
                            "#operator": "Sequence",
                            "~children": [
                                {
                                    "#operator": "InitialProject",
                                    "result_terms": [
                                        {
                                            "as": "channel",
                                            "expr": "cover ((`livetv`.`14`))"
                                        },
                                        {
                                            "as": "program",
                                            "expr": "cover ((`livetv`.`22`))"
                                        },
                                        {
                                            "as": "views",
                                            "expr": "count(1)"
                                        }
                                    ]
                                }
                            ]
                        }
                    }
                ]
            },
            {
                "#operator": "Order",
                "sort_terms": [
                    {
                        "desc": "\"desc\"",
                        "expr": "`views`"
                    }
                ]
            }
        ]
    }
}

When I plug your query into the WebUI and click on “Index Advisor”, it shows to also create this index.

CREATE INDEX adv_6 ON default:Trending_Test.trending.livetv(6)

Can you try that and rerun your query after it is done indexing?

Since the predicate ( 6=“provider” ) is on only 6, but the index is on (6, 14, 11) the whole index must be scanned to find entries where 6=“provider”. With an index on only 6, an index-lookup on provider would return those indexes.

In the Community Edition, the pushdown of aggregation and grouping to the Index is not supported. ( Grouping and Aggregate Pushdown | Couchbase Docs )

This means that irrespective of the index used to service the query, the actual grouping of the documents will be performed in the Query engine.

And in your query the Order operation must be performed as well.

This can increase CPU utilisation and query execution time if there are a large number of documents to be grouped and then sorted.

CE limits query service 4 cores

That shouldn’t make much of a difference unless you are calling with a QueryScanConsistency or ScanConsistency to wait for the insertions/updates/deletions to be indexed. You might consider just querying on what’s currently indexed.

Edit: I suppose if the indexer is resource-starved due to indexing those changes, the query would take longer regardless of any consistency option.

1 Like