Count(*) is super slow, and index across multiple document types is not fast enough

@vsr1

oh my god, I am going to throw my computer out of the window

I cannot make this work no matter what I do.

I have multiple documents (200 million) of the format: (filtered by app_primary_genre_id = 1 - will result in 200k documents )

{
“country_code”: “us”,
“sub_type”: “app”,
“redacted”: {
“app_name”: “dark war”,
“app_primary_genre_id”: 1,
"app_provider_uuid":  "eecefa65-12bd-45d5-93b8-eef72c83ed24",
"release_date": 1234,
"user_rating": 4.23
}
“key”: “meta:bc9f7bc7-ad0a-4f91-a213-357872d34d8c:is”,
“created_timestamp”: 1602820362,
“type”: “tracker-app”,
“version”: 2,
"app_uuid": "bc9f7bc7-ad0a-4f91-a213-357872d34d8c",
“updated_timestamp”: 1611807733
}

and another (100k) of format:

{
“sub_type”: “app-extra”,
“key”: “meta:bc9f7bc7-ad0a-4f91-a213-357872d34d8c:e:1”,
“created_timestamp”: 1602820362,
“type”: “tracker-app”,
"i_genre_id": "1001",
“version”: 2,
“updated_timestamp”: 1611807733
}

I am trying to make a query to join and filter these two type of documents:

SELECT META(a).id,
   FROM `bucket` a LEFT
        JOIN `bucket` e ON a.app_uuid = e.app_uuid
        AND e.sub_type="app-extra"
    WHERE a.sub_type="app"
        AND a.country_code = "us"
        AND a.redacted.app_primary_genre_id = 1
        AND a.redacted.app_provider_uuid= "eecefa65-12bd-45d5-93b8-eef72c83ed24" /* (this is optional) */
        AND a.redacted.release_date <=  1234  /* (this is optional) */
        AND a.redacted.user_rating <=  3  /* (this is optional) */
        AND a.redacted.app_name LIKE "%dark%"  /* (this is optional) */
        AND e.i_genre_id LIKE "2%"  /* (this is optional) */

I have created all the recommended queries (including the covering ones), the results (8 in total) of this query takes 2 seconds. Which is unacceptably high, but If I do a count(*)

 SELECT count(*)
    FROM `bucket` a LEFT
        JOIN `bucket` e ON a.app_uuid = e.app_uuid
        AND e.sub_type="app-extra"
    WHERE a.sub_type="app"
        AND a.country_code = "us"
        AND a.redacted.app_primary_genre_id = 1
        AND a.redacted.app_provider_uuid= "eecefa65-12bd-45d5-93b8-eef72c83ed24" /* (this is optional) */
        AND a.redacted.release_date <=  1234  /* (this is optional) */
        AND a.redacted.user_rating <=  3  /* (this is optional) */
        AND a.redacted.app_name LIKE "%dark%"  /* (this is optional) */
        AND e.i_genre_id LIKE "2%"  /* (this is optional) */

I get the results in tens of minutes, so, clearly, I am doing something wrong.

I have created so many indexes (lost count), based on your https://index-advisor.couchbase.com/indexadvisor/ but also on the index recommendations in the query result page (“Advice” tab). I don’t know what else to do, this is not working!!

I am trying to use this type of queries on a paginated search results page for a web application. So, I will also need to implement some kind of pagination

I don’t understand what is wrong and how can this be improved, but I feel I am going crazy.

Is it possible to get anybody from this forum to pay to help me with all these queries? (probably will be cheaper than madness medication)

CREATE INDEX ix1 ON `bucket`(country_code, redacted.app_primary_genre_id , redacted.app_provider_uuid,
      redacted.release_date,  redacted.user_rating, redacted.app_name , app_uuid)
WHERE sub_type = "app";
CREATE INDEX ix2 ON `bucket` ( app_uuid, i_genre_id ) WHERE sub_type = "app-extra";

Thank you @vsr1

I have created the indexes per your suggestion (ix1111, and ix222):

but when I do a count (*) without filters ( the number of results is around 200k) it stays 10 minutes to do the count

SELECT count(*)
FROM `bucket` a USE INDEX (ix1111 )LEFT
    JOIN `bucket` e USE INDEX (ix2222 ) ON a.app_uuid = e.app_uuid
    AND e.sub_type="app-extra"
WHERE a.sub_type="app"
    AND a.country_code = "us"
    AND a.redacted.app_primary_genre_id = 1

also as I continue to navigateincreasign the OFFSE and keeping same LIMIT the response time is starting to go up with 300 ms for every additional 100 result on the OFFSET

if I try to order based on meta(a).id so that I could do the pagination using where meta(a).id > xxxxxyyyy the ordering takes minutes, so, again this is not a good solution.

What is the recommended solution for this?

I was thinking to put a query timeout on the count(*) query and if the query takes longer than x milliseconds to run just to consider that the number of the result is bigger than let’s say 10k, and as I navigate (through multiple pages) I can just update this total number until the result are smaller than the limit, and that’s how I could know when I am at the end of the list. But in order for this to work, I need to make the pagination to be fast. Do you have any recommendations on how to make the pagination fast?

There is no cursors it needs to restart from beginning. No easy solution Using OFFSET and Keyset in N1QL | The Couchbase Blog

This is a very old problem, and all the websites have soeme kind o pagination. You guys really need to nail this down at some point, as this is a BIG - BIG isue for people trying to use Couchbase on web projects…