JOINING data from array

HI,
I was trying to select data using an UNNEST query but it was taking far too long to return so I resorted to using ARRAY operations in an attempt to limit how much data was being joined. Now I have a query that returns an array (from a union of selects) in acceptable time, and my array has two kinds of data objects in it, prediction objects and observation objects. I want to combine a prediction with an observation based on a matching avtime field. Predictions and observations that have no peer, based on avtime, should be excluded. It seems that this should be simple but it eludes me. I include my query here, and a sample of data that has a few matches, 1614805200, 1614812400 and 1614816000. In production I’ll be querying many thousands or hundreds of thousands of records.
This is my query…

SELECT observation, CEIL(3600*FLOOR(((mdata.fcstValidEpoch)+3600/2)/3600)) avtime
FROM mdata
LET observation = (ARRAY {"station":station.Ceiling,"name":station.name} FOR station IN mdata.data WHEN station.name IN ["KBYG","KGEY","KSHR"] END),
observationLength = ARRAY_LENGTH(observation)
WHERE
    type='DD'
    AND docType='obs'
    AND subset='METAR'
    AND version='V01'
    AND fcstValidEpoch >= 1614805400 - 900
    AND fcstValidEpoch <= 1615401000 + 900
    AND observationLength = 3
UNION
SELECT prediction, CEIL(3600*FLOOR(((mdata.fcstValidEpoch)+3600/2)/3600)) avtime
FROM mdata
LET prediction = (ARRAY {"station":station.Ceiling,"name":station.name} FOR station IN OBJECT_VALUES(mdata.data) WHEN station.name IN ["KBYG","KGEY","KSHR"] END),
predictionLength = ARRAY_LENGTH(prediction)
WHERE
    type='DD'
    AND docType='model'
    AND model="HRRR"
    AND fcstLen=6
    AND subset='METAR'
    AND version='V01'
    AND fcstValidEpoch >= 1614805400 - 900
    AND fcstValidEpoch <= 1615401000 + 900
    AND predictionLength = 3
LIMIT 10

How can I rewrite this query, or add to it, such that I still retain performance and I would get back data like this…

[
  {
    "avtime": 1614805200,
    "prediction": [
      {
        "name": "KBYG",
        "station": 6084
      },
      {
        "name": "KGEY",
        "station": 6162
      },
      {
        "name": "KSHR",
        "station": 6175
      }
    ],
    "observation": [
      {
        "name": "KBYG",
        "station": 6000
      },
      {
        "name": "KSHR",
        "station": 6000
      },
      {
        "name": "KGEY",
        "station": 6000
      }
    ]
  },
  {
    "avtime": 1614812400,
    "observation": [
      {
        "name": "KBYG",
        "station": 6000
      },
      {
        "name": "KSHR",
        "station": 6000
      },
      {
        "name": "KGEY",
        "station": 6000
      }
    ],
    "prediction": [
      {
        "name": "KBYG",
        "station": 6084
      },
      {
        "name": "KGEY",
        "station": 6162
      },
      {
        "name": "KSHR",
        "station": 6176
      }
    ]
  },
  {
    "avtime": 1614816000,
    "observation": [
      {
        "name": "KBYG",
        "station": 6000
      },
      {
        "name": "KSHR",
        "station": 6000
      },
      {
        "name": "KGEY",
        "station": 6000
      }
    ],
    "prediction": [
      {
        "name": "KBYG",
        "station": 6084
      },
      {
        "name": "KGEY",
        "station": 6162
      },
      {
        "name": "KSHR",
        "station": 6176
      }
    ]
  }
]

From this data which is returned currently…

[
  {
    "avtime": 1614805200,
    "prediction": [
      {
        "name": "KBYG",
        "station": 6084
      },
      {
        "name": "KGEY",
        "station": 6162
      },
      {
        "name": "KSHR",
        "station": 6175
      }
    ]
  },
  {
    "avtime": 1614805200,
    "observation": [
      {
        "name": "KBYG",
        "station": 6000
      },
      {
        "name": "KSHR",
        "station": 6000
      },
      {
        "name": "KGEY",
        "station": 6000
      }
    ]
  },
  {
    "avtime": 1614808800,
    "prediction": [
      {
        "name": "KBYG",
        "station": 6084
      },
      {
        "name": "KGEY",
        "station": 6162
      },
      {
        "name": "KSHR",
        "station": 6175
      }
    ]
  },
  {
    "avtime": 1614812400,
    "observation": [
      {
        "name": "KBYG",
        "station": 6000
      },
      {
        "name": "KSHR",
        "station": 6000
      },
      {
        "name": "KGEY",
        "station": 6000
      }
    ]
  },
  {
    "avtime": 1614812400,
    "prediction": [
      {
        "name": "KBYG",
        "station": 6084
      },
      {
        "name": "KGEY",
        "station": 6162
      },
      {
        "name": "KSHR",
        "station": 6176
      }
    ]
  },
  {
    "avtime": 1614816000,
    "observation": [
      {
        "name": "KBYG",
        "station": 6000
      },
      {
        "name": "KSHR",
        "station": 6000
      },
      {
        "name": "KGEY",
        "station": 6000
      }
    ]
  },
  {
    "avtime": 1614816000,
    "prediction": [
      {
        "name": "KBYG",
        "station": 6084
      },
      {
        "name": "KGEY",
        "station": 6162
      },
      {
        "name": "KSHR",
        "station": 6176
      }
    ]
  },
  {
    "avtime": 1614819600,
    "observation": [
      {
        "name": "KBYG",
        "station": 6000
      },
      {
        "name": "KSHR",
        "station": 6000
      },
      {
        "name": "KGEY",
        "station": 6000
      }
    ]
  },
  {
    "avtime": 1614996000,
    "prediction": [
      {
        "name": "KBYG",
        "station": 6084
      },
      {
        "name": "KGEY",
        "station": 6162
      },
      {
        "name": "KSHR",
        "station": 6175
      }
    ]
  },
  {
    "avtime": 1614823200,
    "observation": [
      {
        "name": "KBYG",
        "station": 6000
      },
      {
        "name": "KSHR",
        "station": 6000
      },
      {
        "name": "KGEY",
        "station": 6000
      }
    ]
  }
]

Any help is greatly appreciated.
thanks

SELECT observation, prediction, CEIL(3600*FLOOR(((o.fcstValidEpoch)+3600/2)/3600)) avtime
FROM mdata AS o
JOIN mdata AS m ON CEIL(3600*FLOOR(((o.fcstValidEpoch)+3600/2)/3600)) = CEIL(3600*FLOOR(((m.fcstValidEpoch)+3600/2)/3600))
LET observation = (ARRAY {"station":station.Ceiling,"name":station.name} FOR station IN o.data WHEN station.name IN ["KBYG","KGEY","KSHR"] END),
prediction = (ARRAY {"station":station.Ceiling,"name":station.name} FOR station IN OBJECT_VALUES(m.data) WHEN station.name IN ["KBYG","KGEY","KSHR"] END)
WHERE o.type='DD'
    AND o.docType='obs'
    AND o.subset='METAR'
    AND o.version='V01'
    AND o.fcstValidEpoch >= 1614805400 - 900
    AND o.fcstValidEpoch <= 1615401000 + 900
    AND ARRAY_LENGTH(observation) = 3
    AND m.type='DD'
    AND m.docType='model'
    AND m.subset='METAR'
    AND m.version='V01'
    AND m.model="HRRR"
    AND m.fcstLen=6
    AND m.fcstValidEpoch >= 1614805400 - 900
    AND m.fcstValidEpoch <= 1615401000 + 900
    AND ARRAY_LENGTH(prediction) = 3;

CREATE INDEX ix1 ON mdata(subset, version, fcstValidEpoch) WHERE type = "DD" AND docType='obs';
CREATE INDEX ix2 ON mdata(subset, version, model, fcstLen, fcstValidEpoch, CEIL(3600*FLOOR(((fcstValidEpoch)+3600/2)/3600))) WHERE type = "DD" AND docType='model';

Additional options to consider .
HASH JOIN (EE),
Switch Join Order (might need to change the index keys)

If using NL use the following index

CREATE INDEX ix2 ON mdata(subset, version, model, fcstLen, CEIL(3600*FLOOR(((fcstValidEpoch)+3600/2)/3600)), fcstValidEpoch ) WHERE type = "DD" AND docType='model';

https://index-advisor.couchbase.com/indexadvisor/#1

Is some thing incorrect here (BJECT_VALUES) prediction = (ARRAY {“station”:station.Ceiling,“name”:station.name} FOR station IN OBJECT_VALUES(mdata.data) WHEN station.name IN [“KBYG”,“KGEY”,“KSHR”] END)

Another thing you can consider ARRAY index (depends on data index might grow big).

CREATE INDEX ix1 ON mdata(subset, version, DISTINCT ARRAY v.name FOR v IN data END, fcstValidEpoch) WHERE type = "DD" AND docType='obs';
CREATE INDEX ix2 ON mdata(subset, version, model, fcstLen, DISTINCT ARRAY v.name FOR v IN data END, fcstValidEpoch, CEIL(3600*FLOOR(((fcstValidEpoch)+3600/2)/3600))) WHERE type = "DD" AND docType='model';

With following additional predicates.
AND ANY v IN o.data SATISIFIES v.name IN ["KBYG","KGEY","KSHR"] END
AND ANY v1 IN m.data SATISIFIES v1.name IN ["KBYG","KGEY","KSHR"] END

Thank you so much! I owe you beer! (or beverage of choice)

I used the adviser (as well as your array index suggestion) and created the array indexes. The “LET” is still the “long pole in the tent” but it now happens amazingly fast. I believe I can afford the index memory although I realize that I need to keep an eye on its size. This is simply amazing. Thank you very much. There is a small possibility that I may get competent at this yet. :wink:

Which ever works for you. Advisor only advise based on query it will not rewrite query.

LET clause should be okay. In your case you are looking in the LET clause all 3 elements must present in data.
ARRAY index might work better with ANY clause. ANY clause use Array indexScan that matches any one value present. After that it post indexScan it matches all 3 values. With out array index it might fetch the document even array doesn’t have the station (this might be too much data fetch in your case). Experiment various things and see which works better.

If using EE consider PARTITION index too.

I’ll do those things! Thanks again!