Strange differences in query response times

I posted this on Discord last week - but as I have had no reactions I will try here

I have a query that I need to extend a bit:

SELECT year,
       SUM(ARRAY_LENGTH(catchkeys)) count
FROM data
WHERE type="FishingTrip"
    AND locationlevel4='91' AND locationtype='2'
    AND catchkeys IS VALUED
    AND ARRAY_LENGTH(catchkeys) > 0
GROUP BY year
ORDER BY year DESC

Response time: 200ms (returns 6 docs)

I need to add another condition like this:

SELECT year,
       SUM(ARRAY_LENGTH(catchkeys)) count
FROM data
WHERE type="FishingTrip"
    AND locationlevel2='91' AND locationtype='1'
    AND catchkeys IS VALUED
    AND ARRAY_LENGTH(catchkeys) > 0
GROUP BY year
ORDER BY year DESC

Response time: 25ms (returns 1 doc)

Both use the same index.

Then I combine the two queries into one:

SELECT year,
       SUM(ARRAY_LENGTH(catchkeys)) count
FROM data
WHERE type="FishingTrip"
    AND ((locationlevel2='91' AND locationtype='1')
        OR (locationlevel4='91' AND locationtype='2'))
    AND catchkeys IS VALUED
    AND ARRAY_LENGTH(catchkeys) > 0
GROUP BY year
ORDER BY year DESC

Response time: 2secs (returns 7 docs)
This query uses the same index as the first two.

This is the definition of the index:

CREATE INDEX `def_type_trip_location_2` ON `data`(`type`,`locationtype`,`statslocation`,`locationlevel1`,`locationlevel2`,`locationlevel4`,`fisherytype`)

How can that take so much longer time? And what could I do to improve the last query?

Are you using Enterprise Edition or Community Edition? (And which version?)

You state you’re returning 7 documents - out of how many in the bucket?

If I EXPLAIN your statement (on 7.2.4) I see a pretty straightforward plan - an index scan with two spans (the OR’ed conditions) feeding a fetch followed by a filter then grouping and ordering. I would expect it to be down to either the index scan or the fetch - either the scan itself takes a long time or it produces a large number of documents for the fetch (which are subsequently filtered). If you’re using EE, then the simplest would be to set profile=timings and execute the statement - the resulting profile will highlight where the time is spent.

SELECT COUNT(1)
FROM data
WHERE `data`.`type` = "FishingTrip"
AND
(
    (
            `data`.`locationlevel2` = "91" 
        AND `data`.`locationtype` = "1"
    )
    OR
    (
            `data`.`locationlevel4` = "91"
        AND `data`.`locationtype` = "2"
    )
)

should tell you how many keys the index scan is producing (and will likely give you an idea if this is slow or not).

Sorry I forgot the version info: Community Edition 7.1.1 build 3175

The bucket contains 1 million docs.

The query you wrote returns 100 docs in 250ms which is more like what I would expect.

OK, so that leaves the fetch - you could break this down to confirm the performance using this to generate a statement to run:

SELECT *
FROM data USE KEYS[ $keys ]

And set $keys to all 100 keys from:

SELECT RAW meta().id
FROM  data
WHERE `data`.`type` = "FishingTrip"
AND
(
    (
            `data`.`locationlevel2` = "91" 
        AND `data`.`locationtype` = "1"
    )
    OR
    (
            `data`.`locationlevel4` = "91"
        AND `data`.`locationtype` = "2"
    )
)
;

We of course are anticipating approximately 20ms per document to fetch. What’s the average size of these documents?

SELECT AVG(ENCODED_SIZE(data)) avg
FROM  data
WHERE `data`.`type` = "FishingTrip"
AND
(
    (
            `data`.`locationlevel2` = "91" 
        AND `data`.`locationtype` = "1"
    )
    OR
    (
            `data`.`locationlevel4` = "91"
        AND `data`.`locationtype` = "2"
    )
)
;

The average is: 955.2190476190476

The select with “USE KEYS” runs in 10-13 ms (!!)

Can you EXPLAIN the problem statement and post that here? - I want to make sure I’m seeing the same plan you are.

Sure!

… see correct response below (EXPLAIN removed from this post)

I see that this info was also lost when copying from Discord. Obviously, that is needed here!

I see that I sent the EXPLAIN for a slightly different query as I did not include the locationtype="1" (one of the other attempts - but it made no difference).

Here is the correct one for the query with:

((locationlevel2='91' AND locationtype='1')
        OR (locationlevel4='91' AND locationtype='2'))

EXPLAIN:

{
    "plan": {
        "#operator": "Sequence",
        "~children": [
            {
                "#operator": "Sequence",
                "~children": [
                    {
                        "#operator": "IndexScan3",
                        "index": "def_type_trip_location_2",
                        "index_id": "80045c43b64064b5",
                        "index_projection": {
                            "primary_key": true
                        },
                        "keyspace": "data",
                        "namespace": "default",
                        "spans": [
                            {
                                "range": [
                                    {
                                        "high": "\"FishingTrip\"",
                                        "inclusion": 3,
                                        "index_key": "`type`",
                                        "low": "\"FishingTrip\""
                                    },
                                    {
                                        "high": "\"1\"",
                                        "inclusion": 3,
                                        "index_key": "`locationtype`",
                                        "low": "\"1\""
                                    },
                                    {
                                        "inclusion": 0,
                                        "index_key": "`locationlevel1`"
                                    },
                                    {
                                        "inclusion": 0,
                                        "index_key": "`locationlevel2`"
                                    },
                                    {
                                        "inclusion": 0,
                                        "index_key": "`locationlevel4`"
                                    }
                                ]
                            },
                            {
                                "range": [
                                    {
                                        "high": "\"FishingTrip\"",
                                        "inclusion": 3,
                                        "index_key": "`type`",
                                        "low": "\"FishingTrip\""
                                    },
                                    {
                                        "high": "\"2\"",
                                        "inclusion": 3,
                                        "index_key": "`locationtype`",
                                        "low": "\"2\""
                                    },
                                    {
                                        "inclusion": 0,
                                        "index_key": "`locationlevel1`"
                                    },
                                    {
                                        "inclusion": 0,
                                        "index_key": "`locationlevel2`"
                                    },
                                    {
                                        "inclusion": 0,
                                        "index_key": "`locationlevel4`"
                                    }
                                ]
                            }
                        ],
                        "using": "gsi"
                    },
                    {
                        "#operator": "Fetch",
                        "keyspace": "data",
                        "namespace": "default"
                    },
                    {
                        "#operator": "Parallel",
                        "~child": {
                            "#operator": "Sequence",
                            "~children": [
                                {
                                    "#operator": "Filter",
                                    "condition": "(((((`data`.`type`) = \"FishingTrip\") and ((((`data`.`locationlevel2`) = \"91\") and ((`data`.`locationtype`) = \"1\")) or (((`data`.`locationlevel4`) = \"91\") and ((`data`.`locationtype`) = \"2\")))) and ((`data`.`catchkeys`) is valued)) and (0 < array_length((`data`.`catchkeys`))))"
                                },
                                {
                                    "#operator": "InitialGroup",
                                    "aggregates": [
                                        "sum(array_length((`data`.`catchkeys`)))"
                                    ],
                                    "group_keys": [
                                        "(`data`.`year`)"
                                    ]
                                }
                            ]
                        }
                    },
                    {
                        "#operator": "IntermediateGroup",
                        "aggregates": [
                            "sum(array_length((`data`.`catchkeys`)))"
                        ],
                        "group_keys": [
                            "(`data`.`year`)"
                        ]
                    },
                    {
                        "#operator": "FinalGroup",
                        "aggregates": [
                            "sum(array_length((`data`.`catchkeys`)))"
                        ],
                        "group_keys": [
                            "(`data`.`year`)"
                        ]
                    },
                    {
                        "#operator": "Parallel",
                        "~child": {
                            "#operator": "Sequence",
                            "~children": [
                                {
                                    "#operator": "InitialProject",
                                    "result_terms": [
                                        {
                                            "expr": "(`data`.`year`)"
                                        },
                                        {
                                            "as": "count",
                                            "expr": "sum(array_length((`data`.`catchkeys`)))"
                                        }
                                    ]
                                }
                            ]
                        }
                    }
                ]
            },
            {
                "#operator": "Order",
                "sort_terms": [
                    {
                        "desc": "\"desc\"",
                        "expr": "(`data`.`year`)"
                    }
                ]
            }
        ]
    }
}

And here is one for the query without the OR (with just the one locationlevel/locationtype):

locationlevel2='91' AND locationtype='1'

EXPLAIN:

{
    "plan": {
        "#operator": "Sequence",
        "~children": [
            {
                "#operator": "Sequence",
                "~children": [
                    {
                        "#operator": "IndexScan3",
                        "index": "def_type_trip_location_2",
                        "index_id": "80045c43b64064b5",
                        "index_projection": {
                            "primary_key": true
                        },
                        "keyspace": "data",
                        "namespace": "default",
                        "spans": [
                            {
                                "exact": true,
                                "range": [
                                    {
                                        "high": "\"FishingTrip\"",
                                        "inclusion": 3,
                                        "index_key": "`type`",
                                        "low": "\"FishingTrip\""
                                    },
                                    {
                                        "high": "\"1\"",
                                        "inclusion": 3,
                                        "index_key": "`locationtype`",
                                        "low": "\"1\""
                                    },
                                    {
                                        "inclusion": 0,
                                        "index_key": "`locationlevel1`"
                                    },
                                    {
                                        "high": "\"91\"",
                                        "inclusion": 3,
                                        "index_key": "`locationlevel2`",
                                        "low": "\"91\""
                                    }
                                ]
                            }
                        ],
                        "using": "gsi"
                    },
                    {
                        "#operator": "Fetch",
                        "keyspace": "data",
                        "namespace": "default"
                    },
                    {
                        "#operator": "Parallel",
                        "~child": {
                            "#operator": "Sequence",
                            "~children": [
                                {
                                    "#operator": "Filter",
                                    "condition": "(((((`data`.`type`) = \"FishingTrip\") and (((`data`.`locationlevel2`) = \"91\") and ((`data`.`locationtype`) = \"1\"))) and ((`data`.`catchkeys`) is valued)) and (0 < array_length((`data`.`catchkeys`))))"
                                },
                                {
                                    "#operator": "InitialGroup",
                                    "aggregates": [
                                        "sum(array_length((`data`.`catchkeys`)))"
                                    ],
                                    "group_keys": [
                                        "(`data`.`year`)"
                                    ]
                                }
                            ]
                        }
                    },
                    {
                        "#operator": "IntermediateGroup",
                        "aggregates": [
                            "sum(array_length((`data`.`catchkeys`)))"
                        ],
                        "group_keys": [
                            "(`data`.`year`)"
                        ]
                    },
                    {
                        "#operator": "FinalGroup",
                        "aggregates": [
                            "sum(array_length((`data`.`catchkeys`)))"
                        ],
                        "group_keys": [
                            "(`data`.`year`)"
                        ]
                    },
                    {
                        "#operator": "Parallel",
                        "~child": {
                            "#operator": "Sequence",
                            "~children": [
                                {
                                    "#operator": "InitialProject",
                                    "result_terms": [
                                        {
                                            "expr": "(`data`.`year`)"
                                        },
                                        {
                                            "as": "count",
                                            "expr": "sum(array_length((`data`.`catchkeys`)))"
                                        }
                                    ]
                                }
                            ]
                        }
                    }
                ]
            },
            {
                "#operator": "Order",
                "sort_terms": [
                    {
                        "desc": "\"desc\"",
                        "expr": "(`data`.`year`)"
                    }
                ]
            }
        ]
    }
}

That’s the same plan as I’m seeing - I can’t see where the additional time would be spent. Logically if selection of keys is 250ms and fetching all 100 documents is 13ms, we have nothing obvious to explain the 2 seconds – the additional filters are trivial so even evaluating them 100 times shouldn’t be expensive.

What’s the performance of:

SELECT *
FROM data USE KEYS
(
  SELECT RAW meta().id
  FROM  data
  WHERE `data`.`type` = "FishingTrip"
  AND
  (
    (
            `data`.`locationlevel2` = "91" 
        AND `data`.`locationtype` = "1"
    )
    OR
    (
            `data`.`locationlevel4` = "91"
        AND `data`.`locationtype` = "2"
    )
  )
)
WHERE ARRAY_LENGTH(catchKeys) > 0

like ?

260ms - which is more like what I would expect :wink:
… but it returns no documents.

Without the WHERE ARRAY_LENGTH... filter, is it returning the 100 documents expected?

One moment… I’m following your idea and now have:

SELECT year,
       SUM(ARRAY_LENGTH(catchkeys)) count
FROM data USE KEYS [ "FishingTrip:031f57e512824663949ddfd27c2a2dd1", ...,"FishingTrip:7984cd887d2d4922b5a17b66fb26b6d3" ]
WHERE catchkeys IS VALUED
    AND ARRAY_LENGTH(catchkeys) > 0
GROUP BY year
ORDER BY year DESC

And that returns the right result - in 8 ms!

Will try to use the inner query to get the keys now.

Please check the case on the WHERE clause - I think I typo’d there.

You are right. Removing the WHERE and it returns 105 docs - in 270ms

And this one delivers the result Ok:

SELECT year,
       SUM(ARRAY_LENGTH(catchkeys)) count
FROM data USE KEYS (
    SELECT RAW META().id
    FROM data
    WHERE `data`.`type` = "FishingTrip"
        AND ( ( `data`.`locationlevel2` = "91"
                AND `data`.`locationtype` = "1" )
            OR ( `data`.`locationlevel4` = "91"
                AND `data`.`locationtype` = "2" ) ) )
WHERE catchkeys IS VALUED
    AND ARRAY_LENGTH(catchkeys) > 0
GROUP BY year
ORDER BY year DESC

in 270ms

If you apply the WHERE with the correct case (:face_with_open_eyes_and_hand_over_mouth:, sorry ) is the performance similar?

As per my reply at the same time - yes.

And the performance of the original persists at ~2 seconds?

Yes, 1.9s as of right now