Slow query with unnest - though using index

Following this nice advice from @vsr1: How do I create an index to cope with UNNEST'ed queries? I ended up having a query that would find the last edited or created document of a specific type:

SELECT
       IFMISSINGORNULL(du.modified, d.revisioninfo.created) AS modified,
       IFMISSINGORNULL(du.modifiedby, d.revisioninfo.createdby) AS modifiedby,
       d.revisioninfo.modifiedcount
FROM data AS d
LEFT UNNEST TO_ARRAY(d.revisioninfo.updates) AS du
WHERE d.type='FishingTrip'
ORDER BY modified DESC limit 1;

It uses this index:

CREATE INDEX `def_type_modified` ON `data`(`type`,(`revisioninfo`.`modifiedcount`),(`revisioninfo`.`created`),(`revisioninfo`.`createdby`),(`revisioninfo`.`updates`))

as one can see from the Explain:

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan2",
            "covers": [
              "cover ((`d`.`type`))",
              "cover (((`d`.`revisioninfo`).`modifiedcount`))",
              "cover (((`d`.`revisioninfo`).`created`))",
              "cover (((`d`.`revisioninfo`).`createdby`))",
              "cover (((`d`.`revisioninfo`).`updates`))",
              "cover ((meta(`d`).`id`))"
            ],
            "index": "def_type_modified",
            "index_id": "587679f61576574c",
            "index_projection": {
              "entry_keys": [
                0,
                1,
                2,
                3,
                4
              ]
            },
            "keyspace": "data",
            "namespace": "default",
            "spans": [
              {
                "exact": true,
                "range": [
                  {
                    "high": "\"FishingTrip\"",
                    "inclusion": 3,
                    "low": "\"FishingTrip\""
                  }
                ]
              }
            ],
            "using": "gsi"
          },
          {
            "#operator": "Parallel",
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "Unnest",
                  "as": "du",
                  "expr": "to_array(cover (((`d`.`revisioninfo`).`updates`)))",
                  "outer": true
                }
              ]
            }
          },
          {
            "#operator": "Parallel",
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "Filter",
                  "condition": "(cover ((`d`.`type`)) = \"FishingTrip\")"
                },
                {
                  "#operator": "InitialProject",
                  "result_terms": [
                    {
                      "as": "modified",
                      "expr": "ifmissingornull((`du`.`modified`), cover (((`d`.`revisioninfo`).`created`)))"
                    },
                    {
                      "as": "modifiedby",
                      "expr": "ifmissingornull((`du`.`modifiedby`), cover (((`d`.`revisioninfo`).`createdby`)))"
                    },
                    {
                      "expr": "cover (((`d`.`revisioninfo`).`modifiedcount`))"
                    }
                  ]
                }
              ]
            }
          }
        ]
      },
      {
        "#operator": "Order",
        "limit": "1",
        "sort_terms": [
          {
            "desc": true,
            "expr": "`modified`"
          }
        ]
      },
      {
        "#operator": "Limit",
        "expr": "1"
      },
      {
        "#operator": "FinalProject"
      }
    ]
  },
  "text": "SELECT\n       IFMISSINGORNULL(du.modified, d.revisioninfo.created) AS modified,\n       IFMISSINGORNULL(du.modifiedby, d.revisioninfo.createdby) AS modifiedby,\n       d.revisioninfo.modifiedcount\nFROM data AS d\nLEFT UNNEST TO_ARRAY(d.revisioninfo.updates) AS du\nWHERE d.type='FishingTrip'\nORDER BY modified DESC limit 1;"
}

However, now I have started playing with “live” data - and the above query takes around 1 second when I have 25-30,000 documents.

I’m suspecting that the ORDER BY modified could be the culprit. But I’m not sure - only guessing as it is not directly part of the index. If I remove the ORDER BY... then the response is in milliseconds!

I have tried to create the index like this (from pure trying to “unfold” the query):

CREATE INDEX `def_type_modified` ON `data`(`type`,(`revisioninfo`.`modifiedcount`),(`revisioninfo`.`created`),(`revisioninfo`.`createdby`),(`revisioninfo`.`updates`),IFMISSINGORNULL(TO_ARRAY(d.revisioninfo.updates).modified, d.revisioninfo.created))

But it does not speed up the query…

How can I improve this?

Pagination queries (ORDER BY … OFFSET … LIMIT) needs to produce all qualifying rows. That is why it takes time because last document that may be right candidate. Once you remove ORDER BY you want any 1 document. Once it produces first document it query stops.

We optimize queries if the query order by can expose index order. In this case it is not possible.

:frowning:

Ok, that is kind of sad…

So if I understand you correctly then the only way I can do this is by “unnesting” (or rather “calculating”) a “last modified date” in my program so that it can be included in the index?

Though, that kind of spoils the idea of using the “right” or “natural” structure of this information…

I am not sure i get it. Your query is not doing modified date for each document. It also going across documents and finding latest.

The following looks complex but you can try if interested.

CREATE INDEX ix21 ON default (type, ARRAY_MAX(ARRAY [IFMISSINGORNULL(u.modified, revisioninfo.created), IFMISSINGORNULL(u.modifiedby, revisioninfo.createdby)] FOR u IN TO_ARRAY(revisioninfo.updates) END) DESC, revisioninfo.modifiedcount);

EXPLAIN SELECT
      m[0] AS modified,
      m[1] AS modifiedby,
      d.revisioninfo.modifiedcount
FROM default AS d
LET m = ARRAY_MAX(ARRAY [IFMISSINGORNULL(u.modified, revisioninfo.created), IFMISSINGORNULL(u.modifiedby, revisioninfo.createdby)] FOR u IN TO_ARRAY(revisioninfo.updates) END)
WHERE d.type='FishingTrip'
ORDER BY ARRAY_MAX(ARRAY [IFMISSINGORNULL(u.modified, revisioninfo.created), IFMISSINGORNULL(u.modifiedby, revisioninfo.createdby)] FOR u IN TO_ARRAY(revisioninfo.updates) END) DESC limit 1;

Interesting… That was kind of the idea I followed in my “helpless” try - I just didn’t know how to express it :wink:

I’ll give your suggestion a go :+1:

To Use Expressions from index keys those needs to be exactly matched with query expressions.

That is AWESOME!!!

It WORKS. Response time is around 5-10ms.

Not sure if I can “invent” that for another case - but I believe I fully understand it.

Thanks! :pray::sunglasses:

Unfortunately, there is an issue with this brilliant suggestion. If I have a group of documents that have never been edited then it does not work. Like this type:

{
  "address": "Nørregade 10",
  "city": "København K",
  "inactive": false,
  "ispublic": true,
  "key": "1",
  "name": "Københavns Universitet",
  "revisioninfo": {
    "created": "2013-11-11T22:50:07+0100",
    "createdby": "John Dalsgaard/D-D",
    "modifiedcount": 0
  },
  "type": "University",
  "zip": "1017"
}

Running the query:

SELECT
      m[0] AS modified,
      m[1] AS modifiedby,
      d.revisioninfo.modifiedcount
FROM data AS d
LET m = ARRAY_MAX(ARRAY [IFMISSINGORNULL(u.modified, revisioninfo.created), IFMISSINGORNULL(u.modifiedby, revisioninfo.createdby)] FOR u IN TO_ARRAY(revisioninfo.updates) END)
WHERE d.type='University'
ORDER BY ARRAY_MAX(ARRAY [IFMISSINGORNULL(u.modified, revisioninfo.created), IFMISSINGORNULL(u.modifiedby, revisioninfo.createdby)] FOR u IN TO_ARRAY(revisioninfo.updates) END) DESC
limit 1;

just returns:

[
  {
    "modifiedcount": 0
  }
]

I think the problem is that there is no revisionsinfo.updates at all - and therefore the IFMISSINGORNULL... does not get executed. But as the ARRAY_MAX() is there I think I just need to always add the created and createdby

I have tried the following without success, so I may need a little more help @vsr1:

SELECT
      m[0] AS modified,
      m[1] AS modifiedby,
      d.revisioninfo.modifiedcount
FROM data AS d
LET m = ARRAY_MAX(ARRAY [IFMISSINGORNULL(u.modified, d.revisioninfo.created), IFMISSINGORNULL(u.modifiedby, d.revisioninfo.createdby)] FOR u IN TO_ARRAY(revisioninfo.updates) END)
WHERE d.type='University'
ORDER BY ARRAY_MAX(ARRAY [IFMISSINGORNULL(u.modified, d.revisioninfo.created), IFMISSINGORNULL(u.modifiedby, d.revisioninfo.createdby)] FOR u IN TO_ARRAY(revisioninfo.updates) END) DESC
limit 1;

… just thinking that it could not find the right reference - but that’s not it. Then I tried to add the created info as an element to the array:

SELECT
      m[0] AS modified,
      m[1] AS modifiedby,
      d.revisioninfo.modifiedcount
FROM data AS d
LET m = ARRAY_MAX(ARRAY_APPEND([revisioninfo.created,revisioninfo.createdby], ARRAY [IFMISSINGORNULL(u.modified, revisioninfo.created), IFMISSINGORNULL(u.modifiedby, revisioninfo.createdby)] FOR u IN TO_ARRAY(revisioninfo.updates) END))
WHERE d.type='University'
ORDER BY ARRAY_MAX(ARRAY_APPEND([revisioninfo.created,revisioninfo.createdby], ARRAY [IFMISSINGORNULL(u.modified, revisioninfo.created), IFMISSINGORNULL(u.modifiedby, revisioninfo.createdby)] FOR u IN TO_ARRAY(revisioninfo.updates) END)) DESC
limit 1;

Try following in post#4 query

TO_ARRAY(revisioninfo.updates)

To

TO_ARRAY(IFMISSINGORNULL(revisioninfo.updates,revisioninfo))

I had to make a slight change where i changed “revisioninfo.created” to “u.created” - and then change the index accordingly

It also returns more correct results from the existing queries as the wrong date was returned if there was one created after last modification :+1:

It gets a little complicated - but it works :wink:

Thanks!