Strange differences in query response times

Just another little test. If I add another locationtype (“4”) like here:

SELECT year,
       SUM(ARRAY_LENGTH(catchkeys)) count
FROM data USE KEYS (
    SELECT RAW META().id
    FROM data
    WHERE type = "FishingTrip"
        AND ((locationlevel2 = "91"
                AND locationtype = "1")
            OR (locationlevel4 = "91"
                AND locationtype IN ["2","4"])))
WHERE catchkeys IS VALUED
    AND ARRAY_LENGTH(catchkeys) > 0
GROUP BY year
ORDER BY year DESC

then it doubles the response time to 530 ms.

So what is the conclusion? Do we have an explanation of the differences in the queries?

Should I rewrite the queries to the above (nested RAW select) - or can I do something with index creation? Or perhaps I need to divide the queries into separate queries in code and combine the result…

Your environment isn’t starved of resources when both memcached & GSI are operating at the same time to service your query?

I’ve tried to emulate this with fabricated data on a small VM (3 GiB RAM, 1 x 1.8 GHz CPU, NVMe (host) disk though).

I have created 1,000,000 documents; 100 meet the index filtering (or the ID list) and 6 match your full query.

cbq> select avg(encoded_size(d)) from data d;
...
    "results": [
    {
        "$1": 1095.972096
    }
    ],
...

The bucket is permitted 512 MiB - it is about 13% resident when I start your query. (The index is 0% resident.)

I am getting the same execution plan as you using “def_type_trip_location_2”.

I don’t see an issue:

cbq> SELECT year, SUM(ARRAY_LENGTH(catchkeys)) count FROM data WHERE type="FishingTrip" AND ((locationlevel2='91' AND locationtype='1') OR (locationlevel4='91' AND locationtype='2')) AND catchkeys IS VALUED AND ARRAY_LENGTH(catchkeys) > 0 GROUP BY year ORDER BY year DESC ;
{
    "requestID": "0e9abf2b-01a1-460c-a141-24f7d6afde99",
    "signature": {
        "count": "number",
        "year": "json"
    },
    "results": [
    {
        "count": 6,
        "year": "2024-02-27"
    }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "60.391688ms",
        "executionTime": "60.313536ms",
        "resultCount": 1,
        "resultSize": 56,
        "serviceLoad": 25,
        "sortCount": 1
    }
}

But I am using:

cbq> select ds_version();
...
    "results": [
    {
        "$1": "7.2.4-7070-community"
    }
    ],
...

Hmmm… I wouldn’t think so. The VMs have 16GB RAM and I have 11.532 for Data and 3072 for Index (and 256 for Search).

I tried to use the “External Query Advisor” with the original troubling query and it suggested an index:

CREATE INDEX adv_type_locationtype_array_length_catchkeys_catchkeys_locationlevel4_locationlevel2_year ON `data`(`type`,`locationtype`,ARRAY_LENGTH((`catchkeys`)),`catchkeys`,`locationlevel4`,`locationlevel2`,`year`)

… and I could probably remove the catchkeys column as I am only interested in the length and I guess it will just return zero if it hasn’t got any values.

But the interesting thing is that the query that used to take 2s now takes 260ms - so the same as the one with the nested RAM query…

I suppose I should have tried the advisor before… :smile:

I think I should try to add the length of the catchkeys (and year) to the existing index and see if that also works - instead of always creating new indexes

At least there is a solution! (Still weird though.)

Agree… I’l try out a couple of things to avoid creating too many indexes. One thing I noticed in the “Advisor index” is that there wasn’t a “WHERE” clause to limit the docs to the type with this data. I have always added that when I could to narrow down the selection - but is this not a good idea?

Thanks for helping!

Out of interest, what happens if you drop and recreate the original/existing index? (i.e. is it an issue in the index - would have to be resources needed to process, since we know it is fast)

Just a quick feedback. Adding the length of the catchkeys and the year to the existing index made it run just as fast.

I actually had to recreate the index on one of the servers only a couple of weeks ago as they were stuck in “warming up” after a restart.

And I should also consider upgrading to a later version of CB. Apparently the repo we set up to let us know if there were new versions doesn’t seem to work any more. But it seems to be a good idea to get the latest updates.

… I thought I had seen the name of the index in the EXPLAIN - but cannot see it now???

{
    "plan": {
        "#operator": "Sequence",
        "~children": [
            {
                "#operator": "Sequence",
                "~children": [
                    {
                        "#operator": "KeyScan",
                        "keys": "(select raw (meta(`data`).`id`) from `default`:`data` where (((`data`.`type`) = \"FishingTrip\") and ((((`data`.`locationlevel2`) = \"91\") and ((`data`.`locationtype`) = \"1\")) or (((`data`.`locationlevel4`) = \"91\") and ((`data`.`locationtype`) in [\"2\"])))))"
                    },
                    {
                        "#operator": "Fetch",
                        "keyspace": "data",
                        "namespace": "default"
                    },
                    {
                        "#operator": "Parallel",
                        "maxParallelism": 1,
                        "~child": {
                            "#operator": "Sequence",
                            "~children": [
                                {
                                    "#operator": "Filter",
                                    "condition": "(0 < array_length((`data`.`catchkeys`)))"
                                },
                                {
                                    "#operator": "InitialGroup",
                                    "aggregates": [
                                        "sum(array_length((`data`.`catchkeys`)))"
                                    ],
                                    "group_keys": [
                                        "(`data`.`year`)"
                                    ]
                                }
                            ]
                        }
                    },
                    {
                        "#operator": "IntermediateGroup",
                        "aggregates": [
                            "sum(array_length((`data`.`catchkeys`)))"
                        ],
                        "group_keys": [
                            "(`data`.`year`)"
                        ]
                    },
                    {
                        "#operator": "FinalGroup",
                        "aggregates": [
                            "sum(array_length((`data`.`catchkeys`)))"
                        ],
                        "group_keys": [
                            "(`data`.`year`)"
                        ]
                    },
                    {
                        "#operator": "Parallel",
                        "maxParallelism": 1,
                        "~child": {
                            "#operator": "Sequence",
                            "~children": [
                                {
                                    "#operator": "InitialProject",
                                    "result_terms": [
                                        {
                                            "expr": "(`data`.`year`)"
                                        },
                                        {
                                            "as": "count",
                                            "expr": "sum(array_length((`data`.`catchkeys`)))"
                                        }
                                    ]
                                }
                            ]
                        }
                    }
                ]
            },
            {
                "#operator": "Order",
                "sort_terms": [
                    {
                        "desc": "\"desc\"",
                        "expr": "(`data`.`year`)"
                    }
                ]
            }
        ]
    }
}

Ok, this was a little interesting. I deleted the …_2 index (there is a …_1 on the other server). And response was around 300ms - so virtually the same…

I recreated both indexes (one on each server) and now response times are 270-300ms :slight_smile: So I guess something must have been bad with the existing index(es).

No, sorry - I must be getting tired here. I incorrectly called the query with the nested RAW query. Running the original query went back to 2s :frowning:

Recreating the index with these fields:

CREATE INDEX `def_type_trip_location_2` ON `data`(`type`,`locationtype`,ARRAY_LENGTH(catchkeys),`statslocation`,`locationlevel1`,`locationlevel2`,`locationlevel4`,`fisherytype`,`year`) WHERE (`type` = "FishingTrip")

Put the original query (without catchkeys IS VALUED) back to 260ms:

SELECT year,
       SUM(ARRAY_LENGTH(catchkeys)) count
FROM data
WHERE type="FishingTrip"
    AND ((locationlevel2='91' AND locationtype='1')
        OR (locationlevel4='91' AND locationtype='2'))
    AND ARRAY_LENGTH(catchkeys) > 0
GROUP BY year
ORDER BY year DESC
1 Like

Not to worry; minor mistakes late in the day are inevitable. :slight_smile:

Your new index is a partial - so a chunk of the filtering is potentially taken out of the equation. Do you happen to know how many documents have a type = “FishingTrip” out of your 1 million? Still weird since we know the same scan to gather keys is quick.

Yes, the IS VALUED clause was redundant; ARRAY_LENGTH() returns MISSING if catchkeys is MISSING, and MISSING > 0 is (obviously) false.

I’ll try with 7.1.1 and see if I can reproduce some sort of performance issue - but it seems unlikely.

Yes, no change for me - still quick with the original index & query on “7.1.1-3175-community” (75.171209ms).

There are 150.000 documents with type = “FishingTrip”

PS. Was out training for a couple of hours - therefore the late reply :slight_smile:

All 1 million of mine are type = “FishingTrip”… :slight_smile: I have tried to include a variety of values for locationtype, locationlevel2 & locationlevel4 in the dataset - but then we know the index access is fast.

Ok, so today I discovered another very similar query that is extremely slow…

SELECT * FROM data 
WHERE type='FishingTrip' AND (
    ((locationtype='1' AND locationlevel2='91') or 
    (locationtype='2' AND locationlevel4='91'))
    )

It takes 1.7-2 secs to return the 105 docs.

It does seem to use the modified index - EXPLAIN:

{
    "plan": {
        "#operator": "Sequence",
        "~children": [
            {
                "#operator": "IndexScan3",
                "index": "def_type_trip_location_2",
                "index_id": "745f951ec2ae6db1",
                "index_projection": {
                    "primary_key": true
                },
                "keyspace": "data",
                "namespace": "default",
                "spans": [
                    {
                        "range": [
                            {
                                "high": "\"FishingTrip\"",
                                "inclusion": 3,
                                "index_key": "`type`",
                                "low": "\"FishingTrip\""
                            },
                            {
                                "high": "\"1\"",
                                "inclusion": 3,
                                "index_key": "`locationtype`",
                                "low": "\"1\""
                            },
                            {
                                "inclusion": 0,
                                "index_key": "array_length(`catchkeys`)"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`statslocation`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel1`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel2`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel4`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`fisherytype`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`year`",
                                "low": "null"
                            }
                        ]
                    },
                    {
                        "range": [
                            {
                                "high": "\"FishingTrip\"",
                                "inclusion": 3,
                                "index_key": "`type`",
                                "low": "\"FishingTrip\""
                            },
                            {
                                "high": "\"2\"",
                                "inclusion": 3,
                                "index_key": "`locationtype`",
                                "low": "\"2\""
                            },
                            {
                                "inclusion": 0,
                                "index_key": "array_length(`catchkeys`)"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`statslocation`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel1`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel2`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel4`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`fisherytype`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`year`",
                                "low": "null"
                            }
                        ]
                    }
                ],
                "using": "gsi"
            },
            {
                "#operator": "Fetch",
                "keyspace": "data",
                "namespace": "default"
            },
            {
                "#operator": "Parallel",
                "~child": {
                    "#operator": "Sequence",
                    "~children": [
                        {
                            "#operator": "Filter",
                            "condition": "((((`data`.`type`) = \"FishingTrip\") and ((((`data`.`locationtype`) = \"1\") and ((`data`.`locationlevel2`) = \"91\")) or (((`data`.`locationtype`) = \"2\") and ((`data`.`locationlevel4`) = \"91\")))) and ((`data`.`year`) is valued))"
                        },
                        {
                            "#operator": "InitialProject",
                            "result_terms": [
                                {
                                    "expr": "self",
                                    "star": true
                                }
                            ]
                        }
                    ]
                }
            }
        ]
    }
}

So I tried to add the extra criteria (but not the length of catchkeys as I need all) that we used in the attempts with the other query:

SELECT * FROM data 
WHERE type='FishingTrip' AND (
    ((locationtype='1' AND locationlevel2='91') or 
    (locationtype='2' AND locationlevel4='91'))
    )
    and year is valued

And the first 2 times it ran in 2 secs and then suddenly in 250ms (??).

How could that possibly be?? When I do an EXPLAIN it shows one or the other (but I’m not sure that the EXPLAIN is related to the last execution though). I have tried to remove and recreate indexes - but it does not seem to make any difference… I’m puzzled…

Perhaps I should just get the repo sorted out and update to latest CE server :persevere:

I tried to use the advisor again to suggest an index (without the year attribute). It came up with:

CREATE INDEX adv_locationtype_type_locationlevel2_locationlevel4 ON `data`(`locationtype`,`type`,`locationlevel2`,`locationlevel4`)

The query:

SELECT * FROM data 
WHERE type='FishingTrip' AND (
    ((locationtype='1' AND locationlevel2='91') or 
    (locationtype='2' AND locationlevel4='91'))
    )

still takes way too long (same time).

It does seem to use the new index - EXPLAIN:

{
    "plan": {
        "#operator": "Sequence",
        "~children": [
            {
                "#operator": "IndexScan3",
                "index": "adv_locationtype_type_locationlevel2_locationlevel4",
                "index_id": "454f8ba0b6e95cce",
                "index_projection": {
                    "primary_key": true
                },
                "keyspace": "data",
                "namespace": "default",
                "spans": [
                    {
                        "range": [
                            {
                                "high": "\"1\"",
                                "inclusion": 3,
                                "index_key": "`locationtype`",
                                "low": "\"1\""
                            },
                            {
                                "high": "\"FishingTrip\"",
                                "inclusion": 3,
                                "index_key": "`type`",
                                "low": "\"FishingTrip\""
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel2`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel4`"
                            }
                        ]
                    },
                    {
                        "range": [
                            {
                                "high": "\"2\"",
                                "inclusion": 3,
                                "index_key": "`locationtype`",
                                "low": "\"2\""
                            },
                            {
                                "high": "\"FishingTrip\"",
                                "inclusion": 3,
                                "index_key": "`type`",
                                "low": "\"FishingTrip\""
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel2`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel4`"
                            }
                        ]
                    }
                ],
                "using": "gsi"
            },
            {
                "#operator": "Fetch",
                "keyspace": "data",
                "namespace": "default"
            },
            {
                "#operator": "Parallel",
                "~child": {
                    "#operator": "Sequence",
                    "~children": [
                        {
                            "#operator": "Filter",
                            "condition": "(((`data`.`type`) = \"FishingTrip\") and ((((`data`.`locationtype`) = \"1\") and ((`data`.`locationlevel2`) = \"91\")) or (((`data`.`locationtype`) = \"2\") and ((`data`.`locationlevel4`) = \"91\"))))"
                        },
                        {
                            "#operator": "InitialProject",
                            "result_terms": [
                                {
                                    "expr": "self",
                                    "star": true
                                }
                            ]
                        }
                    ]
                }
            }
        ]
    }
}

If, however, I run two queries (one for each location type - avoiding the OR) then they are blistering fast (25+200ms).

This will give equivalent results. There won’t be duplicates when there are different values for locationtype.

SELECT * FROM my_bucket 
WHERE type='FishingTrip' AND 
    (locationtype='1' AND locationlevel2='91')
    UNION
SELECT * FROM my_bucket 
WHERE type='FishingTrip' AND  
    (locationtype='2' AND locationlevel4='91')

When duplicates are possible, they can be eliminated with:

select distinct *
from
(SELECT * FROM my_bucket 
WHERE type='FishingTrip' AND 
    (locationtype='1' AND locationlevel2='91')
    UNION
SELECT * FROM my_bucket 
WHERE type='FishingTrip' AND  
    (locationtype='2' AND locationlevel4='91')
    ) t
1 Like

@jda, as @mreiche noted, a UNION could be used to workaround this for you - assuming that it performs as expected?

I still don’t have an explanation for why your query with the OR is so much slower; without a reproduction I’m unable to take it further. (I’d like to be able to explain it fully and raise any issues found, if there are any.)

How does

SELECT count(1) FROM data 
WHERE type='FishingTrip' AND (
    ((locationtype='1' AND locationlevel2='91') or 
    (locationtype='2' AND locationlevel4='91'))
    )

perform? (It should use the same index scan with no fetch or filter following in the plan.)

Well that performs well: 270-300ms. As soon as I change it to return * instead of COUNT(1) it goes up to 1.8-2s

The EXPLAIN for the count:

{
    "plan": {
        "#operator": "Sequence",
        "~children": [
            {
                "#operator": "IndexScan3",
                "covers": [
                    "cover ((`data`.`type`))",
                    "cover ((`data`.`locationtype`))",
                    "cover ((`data`.`statslocation`))",
                    "cover ((`data`.`locationlevel1`))",
                    "cover ((`data`.`locationlevel2`))",
                    "cover ((`data`.`locationlevel4`))",
                    "cover ((`data`.`fisherytype`))",
                    "cover ((meta(`data`).`id`))"
                ],
                "filter_covers": {
                    "cover ((`data`.`type`))": "FishingTrip"
                },
                "index": "def_type_trip_location_2",
                "index_id": "ae360167e11b075",
                "index_projection": {
                    "entry_keys": [
                        0,
                        1,
                        4,
                        5
                    ]
                },
                "keyspace": "data",
                "namespace": "default",
                "spans": [
                    {
                        "range": [
                            {
                                "high": "\"FishingTrip\"",
                                "inclusion": 3,
                                "index_key": "`type`",
                                "low": "\"FishingTrip\""
                            },
                            {
                                "high": "\"1\"",
                                "inclusion": 3,
                                "index_key": "`locationtype`",
                                "low": "\"1\""
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`statslocation`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel1`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel2`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel4`"
                            }
                        ]
                    },
                    {
                        "range": [
                            {
                                "high": "\"FishingTrip\"",
                                "inclusion": 3,
                                "index_key": "`type`",
                                "low": "\"FishingTrip\""
                            },
                            {
                                "high": "\"2\"",
                                "inclusion": 3,
                                "index_key": "`locationtype`",
                                "low": "\"2\""
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`statslocation`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel1`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel2`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel4`"
                            }
                        ]
                    }
                ],
                "using": "gsi"
            },
            {
                "#operator": "Parallel",
                "~child": {
                    "#operator": "Sequence",
                    "~children": [
                        {
                            "#operator": "Filter",
                            "condition": "((cover ((`data`.`type`)) = \"FishingTrip\") and (((cover ((`data`.`locationtype`)) = \"1\") and (cover ((`data`.`locationlevel2`)) = \"91\")) or ((cover ((`data`.`locationtype`)) = \"2\") and (cover ((`data`.`locationlevel4`)) = \"91\"))))"
                        },
                        {
                            "#operator": "InitialGroup",
                            "aggregates": [
                                "count(1)"
                            ],
                            "group_keys": []
                        }
                    ]
                }
            },
            {
                "#operator": "IntermediateGroup",
                "aggregates": [
                    "count(1)"
                ],
                "group_keys": []
            },
            {
                "#operator": "FinalGroup",
                "aggregates": [
                    "count(1)"
                ],
                "group_keys": []
            },
            {
                "#operator": "Parallel",
                "~child": {
                    "#operator": "Sequence",
                    "~children": [
                        {
                            "#operator": "InitialProject",
                            "result_terms": [
                                {
                                    "expr": "count(1)"
                                }
                            ]
                        }
                    ]
                }
            }
        ]
    }
}

and for the * query:

{
    "plan": {
        "#operator": "Sequence",
        "~children": [
            {
                "#operator": "IndexScan3",
                "index": "def_type_trip_location_2",
                "index_id": "ae360167e11b075",
                "index_projection": {
                    "primary_key": true
                },
                "keyspace": "data",
                "namespace": "default",
                "spans": [
                    {
                        "range": [
                            {
                                "high": "\"FishingTrip\"",
                                "inclusion": 3,
                                "index_key": "`type`",
                                "low": "\"FishingTrip\""
                            },
                            {
                                "high": "\"1\"",
                                "inclusion": 3,
                                "index_key": "`locationtype`",
                                "low": "\"1\""
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`statslocation`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel1`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel2`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel4`"
                            }
                        ]
                    },
                    {
                        "range": [
                            {
                                "high": "\"FishingTrip\"",
                                "inclusion": 3,
                                "index_key": "`type`",
                                "low": "\"FishingTrip\""
                            },
                            {
                                "high": "\"2\"",
                                "inclusion": 3,
                                "index_key": "`locationtype`",
                                "low": "\"2\""
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`statslocation`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel1`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel2`"
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`locationlevel4`"
                            }
                        ]
                    }
                ],
                "using": "gsi"
            },
            {
                "#operator": "Fetch",
                "keyspace": "data",
                "namespace": "default"
            },
            {
                "#operator": "Parallel",
                "~child": {
                    "#operator": "Sequence",
                    "~children": [
                        {
                            "#operator": "Filter",
                            "condition": "(((`data`.`type`) = \"FishingTrip\") and ((((`data`.`locationtype`) = \"1\") and ((`data`.`locationlevel2`) = \"91\")) or (((`data`.`locationtype`) = \"2\") and ((`data`.`locationlevel4`) = \"91\"))))"
                        },
                        {
                            "#operator": "InitialProject",
                            "result_terms": [
                                {
                                    "expr": "self",
                                    "star": true
                                }
                            ]
                        }
                    ]
                }
            }
        ]
    }
}

Does that give any clue? I can see there are some differences.