Performance of ORDER BY field with out an index

I have the following document data model named tvSessions:

“id”: “MEMCW:118:E81863521C47:2020-10-20”

{
    "propCode": "MEMCW",
    "roomNumber": "118",
    "macAddress": "E81863521C47",
    "date": "2020-10-20",
    "source": "iptv",
    "lastReportedTime": 1603224693,
    "channels": [
        {
            "channelName": "TBS",
            "channelId": 22,
            "sessions": [
                {
                    "isOccupied": 0,
                    "created": 1603224693,
                    "startTime": 1603224693,
                    "stopTime": 1603224693,
                    "duration": 665
                },
                {
                    "isOccupied": 0,
                    "created": 1603224693,
                    "startTime": 1603224693,
                    "stopTime": 1603224693,
                    "duration": 611
                }
            ]
        },
        {
            "channelName": "NBC",
            "channelId": 28,
            "sessions": [
                {
                    "isOccupied": 0,
                    "created": 1603224693,
                    "startTime": 1603224693,
                    "stopTime": 1603224693,
                    "duration": 665
                },
                {
                    "isOccupied": 0,
                    "created": 1603224693,
                    "startTime": 1603224693,
                    "stopTime": 1603224693,
                    "duration": 611
                }
            ]
        }
    ]
}

I have the following indexes and I can not change or add any.

CREATE INDEX index_tvSessions_date ON connectedroom.ecmp.tvSessions(date);
CREATE INDEX index_tvSessions_source ON connectedroom.ecmp.tvSessions(source);
CREATE INDEX index_tvSessions_composite ON connectedroom.ecmp.tvSessions(propCode, roomNumber, macAddress);

My query:

SELECT document.*,
       (
           SELECT c.*,
                  (
                      SELECT s.*
                      FROM c.sessions AS s
                      WHERE s.duration >= $minDuration
                          AND s.isOccupied = 1 ) AS sessions
           FROM document.channels AS c
           WHERE ($channelName IS NULL
                   OR c.channelName = $channelName)
               AND ($channelId IS NULL
                   OR c.channelId = $channelId) ) AS channels
       FROM tvSessions AS document
       WHERE document.date BETWEEN $startDate AND $endDate
           AND document.source = "iptv"
           AND ($propCode IS NULL
               OR document.propCode = $propCode)
           AND ($roomNumber IS NULL
               OR document.roomNumber = $roomNumber)
           AND ($macAddress IS NULL
               OR document.macAddress = $macAddress)
           AND ($channelName IS NULL
               OR ANY c IN document.channels SATISFIES c.channelName = $channelName
               AND ($channelId IS NULL
                   OR c.channelId = $channelId)
               AND (ANY s IN c.sessions SATISFIES s.duration >= $minDuration
                   AND s.isOccupied = 1 END) END)
       ORDER BY document.date DESC,
                document.lastReportedTime DESC;

My query is not covered but this is strictly for analytics. I’m using this with adhoc(true) to let the query planner build it and not cache the query based on this discussion:

  1. I believe the default ordering of the date index is ASC. When using DESC, is the comparator flipped resulting in the same performance?

  2. The date field has an index, which is already sorted, but the format does not have enough entropy and results in multiple documents with the same date. Since lastReportedTime is a Long in seconds, I break the same values with lastReportedTime DESC. Since lastReportedTime does not have an index, from my understanding, a Quicksort is being applied to the results?

Attached is the EXPLAIN if it adds any value.

  1. Yes the default order is ASC. Are you asking if the index is used for sorting? If so, no, index ordering has to match the ORDER BY for it to be used. (MB-19917) But then your query with the given indices - using the RBO - would never use an index for ordering as it has to combine the indices to filter results. (Index order doesn’t affect its ability to be used for filtering.)

If you were to have an index such as:

CREATE INDEX index_tvSessions_date_lrt_desc 
ON connectedroom.ecmp.tvSessions
(
    date DESC
   ,lastReportedTime DESC
)
WHERE source = "iptv"
;

it would provide necessary order & filtering to be used, removing need for the ORDER operator from the resulting plan. (Other filters would be applied to the document since the FETCH is required.)

  1. Both fields are being sorted in memory in the ORDER operator you can see in your plan, after the projection. Yes, a Quicksort is the basic algorithm. (https://github.com/couchbase/query/blob/master/sort/sort.go)
    If the fields overlap completely - the example is showing epoch seconds that does overlap - it is unnecessary to include date in the ORDER BY, of course.

HTH.

1 Like

If I had a simple query like:

SELECT doucment.*
FROM tvSessions AS document
WHERE document.date BETWEEN $startDate AND $endDate;
  1. Since I have an index on date, the index ordering would be used in the output?

If we do:

SELECT doucment.*
FROM tvSessions AS document
WHERE document.date BETWEEN $startDate AND $endDate
ORDER BY document.date DESC;
  1. Referencing MB-19917, the index ordering would not be used here because the DESC is not specified in the index creation? Would this fall back to the worst case O(n^2) for Quicksort?

I see what you mean by overlapping. The logic in the application makes it possible for the lastReportedTime (epoch Long seconds) field to be a different day than the date field because it gets updated when items are updated/inserted into the collections. If it turns out to not be a probable/actual occurrence, I could just use the lastReportedTime field as you mention.

  1. I can’t change the indexes but I see the value of the one you provided. With the current indexes, plan and ORDER BY DESC, Quicksort will always be applied to the results after the projection?

CBO and RBO

  1. No, not without the ORDER BY. When you have not specified an order so keys (and therefore documents) may be in index order but parallel processing (particularly of the FETCH required for the SELECT *) may alter this.

If you revise the statement to provide an ORDER BY which matches the index definition, it typically will be able to push this operation down to the index. (Was this what you were asking? - sorry if I’ve misinterpreted.)

(Check the EXPLAIN plan always to see what is actually going to take place.)

  1. It would sort in memory in the Query using an “Order” operator. It would not fail.

  2. Yes, an “Order” operator will always be present in the plan to sort (we could technically change the sorting algorithm if we chose to) as you see now. (CBO could potentially use costing and other existing indices to come up with a different access plan, but so long as the plan is intersecting multiple indexes, an “Order” operator will take care of the ordering.

HTH.

1 Like
  1. I understand. With asynchronous behavior, the results can only be deterministic if and only if you explicitly specify with ORDER BY.

With an index only on date ASC, when I use the simple query above with ORDER BY date, the EXPLAIN shows project <- filter <- fetch <- indexScan. When I use ORDER BY date DESC, I see order <- project <- filter <- fetch <- indexScan.

  1. Since the index on date is ASC, when we ORDER BY date DESC, will it be conducting a reverse sort in memory? I’m not sure what you mean when you say: “It would not fail”?
  1. Yes, that’s the plan you’ll see typically - you may have variations depending on what you select, what indices exists to satisfy a query etc. If you look at the EXPLAIN detail, in the case where ordering is performed by the index you’ll see the indication in the IndexScan operator.

  2. Yes, when an index cannot provide the order specified in the statement (e.g. it is ASC when the statement wants DESC, or if it doesn’t include all the fields used for sorting in the same order) the ORDER BY specification in the statement is still adhered to (it is in the SQL standard) and sorting is performed in memory. (Up coming releases will include the ability to spill this sorting activity to disk in some circumstances.)

Sorry if I have confused with the “It would not fail” statement; I specifically wanted to highlight that no matter what index is or isn’t available relative to the ORDER BY specification, the ORDER BY specification is adhered to. It is never omitted, or partially implemented or the statement rejected because an index to support it doesn’t exist. (Again, it is a required part of SQL/SQL++.)

HTH.

1 Like

Thanks for your time and quick responses.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.