N1QL Join to array within a document

Thanks for the pointer. We are checking to see if there is a bug or issue with covering indexes and LET. Here is the query without the LET. Can you try this out.

CREATE INDEX idx_account_customer_xyz_transDate
ON acctbucket(SUBSTR(transDate,0,10),code)
WHERE code != "" AND meta().id LIKE "account-customer_xyz%";

SELECT SUBSTR(account.transDate,0,10) As transDate, AVG(FIRST c.weight for c IN codesDoc.codes WHEN c.msDrg = account.code END) As avgWeight
FROM acctbucket account
JOIN acctbucket codesDoc ON KEYS "codes-version-9"
WHERE account.code != "" AND meta(account).id LIKE "account-customer_xyz%"
AND SUBSTR(account.transDate,0,10) >= "2016-07-01" AND SUBSTR(account.transDate,0,10) < "2016-07-03"
GROUP BY SUBSTR(account.transDate,0,10);

It turns out you found a bug. Not with LET, but with FIRST and similar operators.

JIRA ticket at https://issues.couchbase.com/browse/MB-20492

Thanks,
Gerald

Wow, that makes me feel better. We thought we were losing out marbles over here :slight_smile: Is the new query you wrote with the FOR statement in the SELECT block still susceptible the bug?

Also, the new query is a little confusing in that the FIRST / FOR / IN / WHEN clause doesn’t make reference to the codesDoc.weight property, so how will it know what properties in the result set to average?

Oops, I fixed my post to include weight.

The new query is susceptible to the bug, because the bug is not in LET. It is fixed now, will ship in 4.5.1.

Ok, that change makes sense (and also sheds a little more light on the FIRST / FOR / IN / WHEN syntax). Do you have a rough idea of when 4.5.1 will ship?

4.5.1 should GA around end of September, but there should be a Beta build available before that.

Thanks geraldss, one last question on this. After reviewing our document models, we make heavy use of nested documents in arrays that pointed out the bug you found, example:

Document Key: financial-customer_xyz-19800001
{
  "acctFinancials": [
{
  "acctNumber": "19800001",
  "transactionId": "164126601",
  "transactionType": "Credit Adjustment",
  "financialCode": "3010",
  "extended": {
    "service_date": "09/15/2015 00:00:00"
  }
},
{
  "acctNumber": "19800001",
  "transactionId": "162555860",
  "transactionType": "Credit Adjustment",
  "financialCode": "3010",
  "extended": {
    "service_date": "09/15/2015 00:00:00"
  }
}
  ],
  "tenantName": "customer_xyz",
  "acctNumber": "19800001",
  "schemaVer": "1.1.0.0"
}

As you can see, “acctNumber” is our unique ID, and our “main” documents are of course the “account” document, e.g. keys named account-customer_xyz-########.

If our end goal is to JOIN from the account- documents, to other nested documents (like the example above) what’s the best way to index these operations? Is there a way to join on a partial portion of the key since we’ve embedded the account number in all documents, or will we need to index the actual “acctNumber” attribute?

Hi @jeffhoward001,

If you post an example of the two documents you want to JOIN, along with their respective keys, and an example of the desired result, we can take a look. You can JOIN or NEST using computed expressions, so using a portion of a key or other string should be fine. But an example would be helpful to confirm.

If the account number is unique and other portions of the key is the document type string (“account_customer_xyz” or something else), you can construct that at runtime.

Simple example:

SELECT *
FROM child c  INNER JOIN parent p
             ON KEYS ("parent-doc::" || substr(c.id, 8, 5) || "::subset(c.pid, 10,5));

Interesting… Here are our key naming conventions. The “123456789” is the unique account #, of which there are thousands per client. So if customer_xyz has 10,000 accounts, we would have 10,000 account- documents, 10,000 financial- documents, and 10,000 charges- documents, all linked together by the account # at the end of the key. Hopefully that makes sense.

account-client_name-123456789
financial-client_name-123456789
charges-client_name-123456789

Here are some short examples of what the documents structure looks like:

account-customer_xyz-123456789
{
  "acctNumber": 123456789,
  "tenantName": "customer_xyz",
  "code": "166"
}

financial-customer_xyz-123456789
{
  "acctFinancials": [
    {
      "acctNumber": "19800001",
      "transactionId": "164126601",
      "transactionType": "Credit Adjustment",
      "amount": 30.10
    },
    {
      "acctNumber": "19800001",
      "transactionId": "162555860",
      "transactionType": "Credit Adjustment",
      "amount": 255.14
	}
  ],
  "tenantName": "customer_xyz",
  "acctNumber": "123456789"
}

charges-customer_xyz-123456789
{
  "acctCharges": [
    {
      "acctNumber": "123456789",
      "chargeId": "165532430",
      "amount": 1193.64
	},
	{
      "acctNumber": "123456789",
      "chargeId": "165247278",
      "amount": 1461.09
	}
  ],
  "tenantName": "customer_xyz",
  "acctNumber": "123456789"
}

Our end-goal in most circumstances will be to join the account doc to the detailed docs (financial or charges), then aggregate on a value within the nested financials or charges. So for a simple example, if I wanted to show total charges for all accounts I would need to join the “account-” docs to the “charges-” docs, doing a SUM() on the nested amounts inside the charges document.

Then if I want limit the results to only one tenant, would it be the most efficient to limit by providing only partial key names? E.g.:

FROM acctbucket accounts JOIN accountbucket charges
             ON KEYS ("account-customer_xyz-" || "charges-customer_xyz-");

To restrict it to ONLY ONE account

FROM acctbucket accounts USE KEYS ["account-customer_xyz-123456789" 
          JOIN accountbucket charges
             ON KEYS ("charges-customer_xyz-" || TOSTRING(accounts.acctNumber);

This is LOGICALLY (syntax is unsupported).

Please see: https://dzone.com/articles/join-faster-with-couchbase-index-joins

FROM acctbucket accounts USE KEYS ["account-customer_xyz-123456789" 
          JOIN accountbucket charges
 ON KEYS ("charges-customer_xyz-" || TOSTRING(accounts.acctNumber) = META(charges).id;

Sorry, I misunderstood the concatenation operator for a logical “or”. I’m having trouble getting this to work.

SELECT *
FROM acctbucket accounts USE KEYS "account-customer_xyz-123456789"
JOIN acctbucket financials ON KEYS "financials-customer_xyz-" || TOSTRING(accounts.acctNumber) = META(financials).id

It’s throwing an error saying “Ambiguous reference to field financials”, however I only have the two aliases, account and financials.

This is the actual syntax you should use. The key (“charges-customer_xyz-” || TOSTRING(accounts.acctNumber) is implictly joined with document key of charges (meta(charges).id).

FROM acctbucket accounts USE KEYS ["account-customer_xyz-123456789" 
   JOIN accountbucket charges
  ON KEYS ("charges-customer_xyz-" || TOSTRING(accounts.acctNumber);

Thanks keshav_m, that fixed it :slight_smile: Going back to the original question, is there a way (without specifying a single account number) to join between the account- and financial- docs to gain access to the nested elements within the financials doc (such as “amount”)?

To provide a simple example, we want to look at all accounts for customer_xyz with a code = 166, then join to the financials document, and SUM() the nested “amount” value, grouped by account number.

We’ve been able to successfully SUM() the financial amounts within the financials doc, but we have not been able to properly join it back to the account doc to allow for filtering by account level attributes (such as “code” for example).

account-customer_xyz-123456789
{
  "acctNumber": 123456789,
  "tenantName": "customer_xyz",
  "code": "166"
}

financial-customer_xyz-123456789
{
  "acctFinancials": [
    {
      "acctNumber": "123456789",
      "transactionId": "164126601",
      "transactionType": "Credit Adjustment",
      "amount": 30.10
    },
    {
      "acctNumber": "123456789",
      "transactionId": "162555860",
      "transactionType": "Credit Adjustment",
      "amount": 255.14
	}
  ],
  "tenantName": "customer_xyz",
  "acctNumber": "123456789"
}

If you don’t use the USE KEYS, the query will join ALL qualifying documents from accounts.
Because you have the field acctNumber in both accounts and chargers, you should have a type field for each document type (or change the field name).

FROM acctbucket accounts  
   JOIN accountbucket charges
  ON KEYS ("charges-customer_xyz-" || TOSTRING(accounts.acctNumber) 
 WHERE accounts.type = 'accounts' and charges.type = 'charges';

This is really cool. I applied your index suggestions and they do help tremendously.
Can you help me understand why in the explain the LET is taking so much time? This is my explain text.

{
  "#operator": "Sequence",
  "#stats": {
    "#phaseSwitches": 1,
    "execTime": "2.174µs"
  },
  "~children": [
    {
      "#operator": "Authorize",
      "#stats": {
        "#phaseSwitches": 3,
        "execTime": "11.743µs",
        "servTime": "1.016365ms"
      },
      "privileges": {
        "List": [
          {
            "Target": "default:mdata",
            "Priv": 7
          }
        ]
      },
      "~child": {
        "#operator": "Sequence",
        "#stats": {
          "#phaseSwitches": 1,
          "execTime": "8.696µs"
        },
        "~children": [
          {
            "#operator": "IntersectScan",
            "#stats": {
              "#itemsIn": 864,
              "#itemsOut": 583,
              "#phaseSwitches": 2903,
              "execTime": "1.313074ms",
              "kernTime": "21.211604ms"
            },
            "scans": [
              {
                "#operator": "IndexScan3",
                "#stats": {
                  "#itemsOut": 184,
                  "#phaseSwitches": 745,
                  "execTime": "492.314µs",
                  "kernTime": "242.398µs",
                  "servTime": "2.73331ms"
                },
                "as": "o",
                "index": "ix_subset_version_fcstValidEpoch_DD_obs",
                "index_id": "9a0d59e5e39e178d",
                "index_projection": {
                  "primary_key": true
                },
                "keyspace": "mdata",
                "namespace": "default",
                "spans": [
                  {
                    "exact": true,
                    "range": [
                      {
                        "high": "\"METAR\"",
                        "inclusion": 3,
                        "low": "\"METAR\""
                      },
                      {
                        "high": "\"V01\"",
                        "inclusion": 3,
                        "low": "\"V01\""
                      },
                      {
                        "high": "(1615401000 + 900)",
                        "inclusion": 3,
                        "low": "(1612805400 - 900)"
                      }
                    ]
                  }
                ],
                "using": "gsi",
                "#time_normal": "00:00.003",
                "#time_absolute": 0.003225624
              },
              {
                "#operator": "IndexScan3",
                "#stats": {
                  "#itemsOut": 97,
                  "#phaseSwitches": 393,
                  "execTime": "292.162µs",
                  "kernTime": "59.505µs",
                  "servTime": "3.105274ms"
                },
                "as": "o",
                "index": "idx_type_docType_version_fcstValidEpoch_METAR",
                "index_id": "acf9f11cc0879013",
                "index_projection": {
                  "primary_key": true
                },
                "keyspace": "mdata",
                "namespace": "default",
                "spans": [
                  {
                    "exact": true,
                    "range": [
                      {
                        "high": "\"DD\"",
                        "inclusion": 3,
                        "low": "\"DD\""
                      },
                      {
                        "high": "\"obs\"",
                        "inclusion": 3,
                        "low": "\"obs\""
                      },
                      {
                        "high": "\"V01\"",
                        "inclusion": 3,
                        "low": "\"V01\""
                      },
                      {
                        "high": "(1615401000 + 900)",
                        "inclusion": 3,
                        "low": "(1612805400 - 900)"
                      }
                    ]
                  }
                ],
                "using": "gsi",
                "#time_normal": "00:00.003",
                "#time_absolute": 0.0033974359999999998
              },
              {
                "#operator": "IndexScan3",
                "#stats": {
                  "#itemsOut": 583,
                  "#phaseSwitches": 2337,
                  "execTime": "1.178854ms",
                  "kernTime": "320.14µs",
                  "servTime": "1.8744ms"
                },
                "as": "o",
                "index": "adv_fcstValidEpoch_docType_subset_version_type",
                "index_id": "aab729c871fbf576",
                "index_projection": {
                  "primary_key": true
                },
                "keyspace": "mdata",
                "namespace": "default",
                "spans": [
                  {
                    "exact": true,
                    "range": [
                      {
                        "high": "(1615401000 + 900)",
                        "inclusion": 3,
                        "low": "(1612805400 - 900)"
                      }
                    ]
                  }
                ],
                "using": "gsi",
                "#time_normal": "00:00.003",
                "#time_absolute": 0.003053254
              }
            ],
            "#time_normal": "00:00.001",
            "#time_absolute": 0.001313074
          },
          {
            "#operator": "Fetch",
            "#stats": {
              "#itemsIn": 583,
              "#itemsOut": 583,
              "#phaseSwitches": 2409,
              "execTime": "4.256622ms",
              "kernTime": "2.441514033s",
              "servTime": "490.409745ms"
            },
            "as": "o",
            "keyspace": "mdata",
            "namespace": "default",
            "#time_normal": "00:00.494",
            "#time_absolute": 0.494666367
          },
          {
            "#operator": "Sequence",
            "#stats": {
              "#phaseSwitches": 1,
              "execTime": "8.395µs"
            },
            "~children": [
              {
                "#operator": "NestedLoopJoin",
                "#stats": {
                  "#itemsIn": 583,
                  "#itemsOut": 7682,
                  "#phaseSwitches": 34229,
                  "execTime": "289.664183ms",
                  "kernTime": "47.837926883s"
                },
                "alias": "m",
                "on_clause": "(ceil((3600 * floor((((`o`.`fcstValidEpoch`) + (3600 / 2)) / 3600)))) = ceil((3600 * floor((((`m`.`fcstValidEpoch`) + (3600 / 2)) / 3600)))))",
                "~child": {
                  "#operator": "Sequence",
                  "#stats": {
                    "#phaseSwitches": 1166,
                    "execTime": "39.422477307s",
                    "kernTime": "1.750126ms",
                    "state": "running"
                  },
                  "~children": [
                    {
                      "#operator": "IntersectScan",
                      "#stats": {
                        "#itemsIn": 3781568,
                        "#itemsOut": 7682,
                        "#phaseSwitches": 7583747,
                        "execTime": "5.114748981s",
                        "kernTime": "5.684166012s"
                      },
                      "scans": [
                        {
                          "#operator": "IndexScan3",
                          "#stats": {
                            "#itemsOut": 1906903,
                            "#phaseSwitches": 7631940,
                            "execTime": "5.033923583s",
                            "kernTime": "2.487471313s",
                            "servTime": "2.783055257s"
                          },
                          "as": "m",
                          "index": "idx_type_docType_version_fcstValidEpoch_METAR",
                          "index_id": "acf9f11cc0879013",
                          "index_projection": {
                            "primary_key": true
                          },
                          "keyspace": "mdata",
                          "namespace": "default",
                          "nested_loop": true,
                          "spans": [
                            {
                              "range": [
                                {
                                  "high": "\"DD\"",
                                  "inclusion": 3,
                                  "low": "\"DD\""
                                },
                                {
                                  "high": "\"model\"",
                                  "inclusion": 3,
                                  "low": "\"model\""
                                },
                                {
                                  "high": "\"V01\"",
                                  "inclusion": 3,
                                  "low": "\"V01\""
                                },
                                {
                                  "high": "(1615401000 + 900)",
                                  "inclusion": 3,
                                  "low": "(1612805400 - 900)"
                                }
                              ]
                            }
                          ],
                          "using": "gsi"
                        },
                        {
                          "#operator": "IndexScan3",
                          "#stats": {
                            "#itemsOut": 7682,
                            "#phaseSwitches": 33643,
                            "execTime": "41.189062ms",
                            "kernTime": "54.520114ms",
                            "servTime": "10.109341365s"
                          },
                          "as": "m",
                          "index": "ix_subset_version_model_fcstLen_fcstValidEpoch_DD_model",
                          "index_id": "968e2954d5368e50",
                          "index_projection": {
                            "primary_key": true
                          },
                          "keyspace": "mdata",
                          "namespace": "default",
                          "nested_loop": true,
                          "spans": [
                            {
                              "range": [
                                {
                                  "high": "\"METAR\"",
                                  "inclusion": 3,
                                  "low": "\"METAR\""
                                },
                                {
                                  "high": "\"V01\"",
                                  "inclusion": 3,
                                  "low": "\"V01\""
                                },
                                {
                                  "high": "\"HRRR\"",
                                  "inclusion": 3,
                                  "low": "\"HRRR\""
                                },
                                {
                                  "high": "6",
                                  "inclusion": 3,
                                  "low": "6"
                                },
                                {
                                  "high": "(1615401000 + 900)",
                                  "inclusion": 3,
                                  "low": "(1612805400 - 900)"
                                },
                                {
                                  "high": "ceil((3600 * floor((((`o`.`fcstValidEpoch`) + (3600 / 2)) / 3600))))",
                                  "inclusion": 3,
                                  "low": "ceil((3600 * floor((((`o`.`fcstValidEpoch`) + (3600 / 2)) / 3600))))"
                                }
                              ]
                            }
                          ],
                          "using": "gsi"
                        },
                        {
                          "#operator": "IndexScan3",
                          "#stats": {
                            "#itemsOut": 1866983,
                            "#phaseSwitches": 7472279,
                            "execTime": "5.179590472s",
                            "kernTime": "2.449477912s",
                            "servTime": "2.909344991s"
                          },
                          "as": "m",
                          "index": "adv_fcstLen_model_fcstValidEpoch_type_docType_subset_version",
                          "index_id": "44cc5b35c7c4e605",
                          "index_projection": {
                            "primary_key": true
                          },
                          "keyspace": "mdata",
                          "namespace": "default",
                          "nested_loop": true,
                          "spans": [
                            {
                              "range": [
                                {
                                  "high": "6",
                                  "inclusion": 3,
                                  "low": "6"
                                },
                                {
                                  "high": "\"HRRR\"",
                                  "inclusion": 3,
                                  "low": "\"HRRR\""
                                },
                                {
                                  "high": "(1615401000 + 900)",
                                  "inclusion": 3,
                                  "low": "(1612805400 - 900)"
                                }
                              ]
                            }
                          ],
                          "using": "gsi"
                        }
                      ]
                    },
                    {
                      "#operator": "Fetch",
                      "#stats": {
                        "#itemsIn": 7682,
                        "#itemsOut": 7682,
                        "#phaseSwitches": 33949,
                        "execTime": "39.682079ms",
                        "kernTime": "10.801279003s",
                        "servTime": "1.126032525s"
                      },
                      "as": "m",
                      "keyspace": "mdata",
                      "namespace": "default",
                      "nested_loop": true
                    }
                  ]
                },
                "#time_normal": "00:00.289",
                "#time_absolute": 0.28966418299999996
              },
              {
                "#operator": "Let",
                "#stats": {
                  "#itemsIn": 7682,
                  "#itemsOut": 7682,
                  "#phaseSwitches": 30731,
                  "execTime": "51.289474254s",
                  "kernTime": "108.835001ms"
                },
                "bindings": [
                  {
                    "expr": "array {\"name\": (`station`.`name`), \"station\": (`station`.`Ceiling`)} for `station` in (`o`.`data`) when ((`station`.`name`) in [\"KBYG\", \"KGEY\", \"KSHR\"]) end",
                    "var": "observation"
                  },
                  {
                    "expr": "array {\"name\": (`station`.`name`), \"station\": (`station`.`Ceiling`)} for `station` in object_values((`m`.`data`)) when ((`station`.`name`) in [\"KBYG\", \"KGEY\", \"KSHR\"]) end",
                    "var": "prediction"
                  }
                ],
                "#time_normal": "00:51.289",
                "#time_absolute": 51.289474254
              },
              {
                "#operator": "Filter",
                "#stats": {
                  "#itemsIn": 7682,
                  "#itemsOut": 343,
                  "#phaseSwitches": 16053,
                  "execTime": "508.293338ms",
                  "kernTime": "50.890165348s"
                },
                "condition": "(((((((((((((((((`o`.`type`) = \"DD\") and ((`o`.`docType`) = \"obs\")) and ((`o`.`subset`) = \"METAR\")) and ((`o`.`version`) = \"V01\")) and ((1612805400 - 900) <= (`o`.`fcstValidEpoch`))) and ((`o`.`fcstValidEpoch`) <= (1615401000 + 900))) and (array_length(`observation`) = 3)) and ((`m`.`type`) = \"DD\")) and ((`m`.`docType`) = \"model\")) and ((`m`.`subset`) = \"METAR\")) and ((`m`.`version`) = \"V01\")) and ((`m`.`model`) = \"HRRR\")) and ((`m`.`fcstLen`) = 6)) and ((1612805400 - 900) <= (`m`.`fcstValidEpoch`))) and ((`m`.`fcstValidEpoch`) <= (1615401000 + 900))) and (array_length(`prediction`) = 3))",
                "#time_normal": "00:00.508",
                "#time_absolute": 0.508293338
              },
              {
                "#operator": "InitialProject",
                "#stats": {
                  "#itemsIn": 343,
                  "#itemsOut": 343,
                  "#phaseSwitches": 1034,
                  "execTime": "6.317129ms",
                  "kernTime": "51.390252994s"
                },
                "result_terms": [
                  {
                    "expr": "`observation`"
                  },
                  {
                    "expr": "`prediction`"
                  },
                  {
                    "as": "avtime",
                    "expr": "ceil((3600 * floor((((`o`.`fcstValidEpoch`) + (3600 / 2)) / 3600))))"
                  }
                ],
                "#time_normal": "00:00.006",
                "#time_absolute": 0.006317129
              },
              {
                "#operator": "FinalProject",
                "#stats": {
                  "#itemsIn": 343,
                  "#itemsOut": 343,
                  "#phaseSwitches": 1030,
                  "execTime": "649.917µs",
                  "kernTime": "988.742µs"
                },
                "#time_normal": "00:00.000",
                "#time_absolute": 0.0006499170000000001
              }
            ],
            "#time_normal": "00:00.000",
            "#time_absolute": 0.000008395
          }
        ],
        "#time_normal": "00:00.000",
        "#time_absolute": 0.000008696
      },
      "#time_normal": "00:00.001",
      "#time_absolute": 0.001028108
    },
    {
      "#operator": "Stream",
      "#stats": {
        "#itemsIn": 343,
        "#itemsOut": 343,
        "#phaseSwitches": 689,
        "execTime": "35.423083ms",
        "kernTime": "51.364131276s"
      },
      "#time_normal": "00:00.035",
      "#time_absolute": 0.035423083
    }
  ],
  "~versions": [
    "6.5.0-N1QL",
    "6.6.2-9588-enterprise"
  ],
  "#time_normal": "00:00.000",
  "#time_absolute": 0.000002174
}

@randy.pierce ,

Your query using IntersectScans, You should avoid intersectScan (How can an Intersect Scan return more items than MIN(inputs from respective indices). It's returning the MAX) by dropping unnecessary index or specify USE INDEX clause
mdata AS o USE INDEX (ix_subset_version_model_fcstLen_fcstValidEpoch_DD_model)
JOIN mdata AS m USE INDEX (ix_subset_version_model_fcstLen_fcstValidEpoch_DD_model)

Second JOIN produced documents, it needs to walk thorough two arrays and construct new arrays,
Later it throwing away Filter time 95% of documents are eliminated might be due to length of the array check.
That why i suggested in other post use Array index.

Thank you again!
randy

Thank you so much for your help with this. I still struggle with the end result of what I am trying to do.
What I really want to end up with is a set of contingencies for each fcstValidEpoch. This represents the contingencies, and I want to sum the contingencies (hits, misses, false_alarms, and correct_negatives) for
each fcstValidTime because depending on the number and value of predictions and observations there
can be different results. Obviously the commented COUNT… lines in m y query are not right. How do I apply this formula to the lists of predictions and observations in each result? I included my query and a snippet of the results, but of course I do not have any contingencies as they are commented out. I’ve worked at this for days now, any help is GREATLY appreciated.
This is my incorrect attempt at counting contingencies, it is the right formula, it just isn’t the right N1QL…

--COUNT (CASE WHEN (prediction[*].station.Ceiling < 1000 AND observation[*].station.Ceiling < 1000) THEN 1 ELSE 0 END) hits,
--COUNT (CASE WHEN (prediction[*].station.Ceiling < 1000 AND NOT observation[*].station.Ceiling < 1000) THEN 1 ELSE 0 END) fals_alarms,
--COUNT (CASE WHEN (NOT prediction[*].station.Ceiling < 1000 AND observation[*].station.Ceiling < 1000) THEN 1 ELSE 0 END) misses,
--COUNT (CASE WHEN (NOT prediction[*].station.Ceiling < 1000 AND NOT observation[*].station.Ceiling < 1000) THEN 1 ELSE 0 END) correct_negatives


This is my current query…

SELECT
observation observations,
prediction predictions,
ceil(3600*floor(((m0.fcstValidEpoch)+3600/2)/3600)) avtime,
COUNT(DISTINCT m0.fcstValidEpoch) N_times,
MIN(m0.fcstValidEpoch) min_secs,
MAX(m0.fcstValidEpoch) max_secs,
ARRAY_LENGTH(prediction) AS N0
--COUNT (CASE WHEN (prediction[*].station.Ceiling < 1000 AND observation[*].station.Ceiling < 1000) THEN 1 ELSE 0 END) hits,
--COUNT (CASE WHEN (prediction[*].station.Ceiling < 1000 AND NOT observation[*].station.Ceiling < 1000) THEN 1 ELSE 0 END) fals_alarms,
--COUNT (CASE WHEN (NOT prediction[*].station.Ceiling < 1000 AND observation[*].station.Ceiling < 1000) THEN 1 ELSE 0 END) misses,
--COUNT (CASE WHEN (NOT prediction[*].station.Ceiling < 1000 AND NOT observation[*].station.Ceiling < 1000) THEN 1 ELSE 0 END) correct_negatives
FROM mdata AS o
JOIN mdata AS m0 ON CEIL(3600*FLOOR(((o.fcstValidEpoch)+3600/2)/3600)) = CEIL(3600*FLOOR(((m0.fcstValidEpoch)+3600/2)/3600))
LET observation = (ARRAY {"station":station,"name":station.name} FOR station IN o.data WHEN station.name IN ["KBYG","KGEY","KSHR"] END),
prediction = (ARRAY {"prediction":station,"name":station.name} FOR station IN OBJECT_VALUES(m0.data) WHEN station.name IN ["KBYG","KGEY","KSHR"] END)
WHERE o.type='DD'
AND o.docType='obs'
AND o.subset='METAR'
AND o.version='V01'
AND o.fcstValidEpoch >= 1612805400 - 900
AND o.fcstValidEpoch <= 1615401000 + 900
AND ARRAY_LENGTH(observation) = 3
AND m0.type='DD'
AND m0.docType='model'
AND m0.subset='METAR'
AND m0.version='V01'
AND m0.model="HRRR"
AND m0.fcstLen=6
AND m0.fcstValidEpoch >= 1612805400 - 900
AND m0.fcstValidEpoch <= 1615401000 + 900
AND CEIL(3600*FLOOR(((m0.fcstValidEpoch)+3600/2)/3600)) = CEIL(3600*FLOOR(((o.fcstValidEpoch)+3600/2)/3600))
AND ARRAY_LENGTH(prediction) = 3
GROUP BY m0.fcstValidEpoch,
         observation,
         prediction
ORDER BY m0.fcstValidEpoch

This is a snippet of my current result set…

[
  {
    "N0": 3,
    "N_times": 1,
    "avtime": 1612807200,
    "max_secs": 1612806780,
    "min_secs": 1612806780,
    "observations": [
      {
        "name": "KBYG",
        "station": {
          "Ceiling": 100,
          "DewPoint": -71,
          "Reported Time": 1612807200,
          "Surface Pressure": 10085,
          "Temperature": -9,
          "Visibility": 1000,
          "WD": 330,
          "WS": 9,
          "name": "KBYG"
        }
      },
      {
        "name": "KSHR",
        "station": {
          "Ceiling": 800,
          "DewPoint": -80,
          "Reported Time": 1612807200,
          "Surface Pressure": 10129,
          "Temperature": 19,
          "Visibility": 1000,
          "WD": 340,
          "WS": 5,
          "name": "KSHR"
        }
      },
      {
        "name": "KGEY",
        "station": {
          "Ceiling": 290,
          "DewPoint": 19,
          "Reported Time": 1612807200,
          "Surface Pressure": 10112,
          "Temperature": 100,
          "Visibility": 175,
          "WD": 0,
          "WS": 0,
          "name": "KGEY"
        }
      }
    ],
    "predictions": [
      {
        "name": "KBYG",
        "prediction": {
          "Ceiling": 6084,
          "DewPoint": -45,
          "RH": 531,
          "Reported Time": 1612806780,
          "Surface Pressure": 8457,
          "Temperature": 101,
          "VGTYP": 10,
          "Visibility": 2175,
          "WD": 108,
          "WS": 5,
          "name": "KBYG"
        }
      },
      {
        "name": "KGEY",
        "prediction": {
          "Ceiling": 6162,
          "DewPoint": -20,
          "RH": 521,
          "Reported Time": 1612806780,
          "Surface Pressure": 8763,
          "Temperature": 130,
          "VGTYP": 10,
          "Visibility": 2094,
          "WD": 153,
          "WS": 4,
          "name": "KGEY"
        }
      },
      {
        "name": "KSHR",
        "prediction": {
          "Ceiling": 6176,
          "DewPoint": -55,
          "RH": 742,
          "Reported Time": 1612806780,
          "Surface Pressure": 8774,
          "Temperature": 14,
          "VGTYP": 10,
          "Visibility": 1373,
          "WD": 352,
          "WS": 5,
          "name": "KSHR"
        }
      }
    ]
  },
  {
    "N0": 3,
    "N_times": 1,
    "avtime": 1612821600,
    "max_secs": 1612821180,
    "min_secs": 1612821180,
    "observations": [
      {
        "name": "KBYG",
        "station": {
          "Ceiling": 260,
          "DewPoint": -80,
          "Reported Time": 1612821600,
          "Surface Pressure": 10115,
          "Temperature": -29,
          "Visibility": 150,
          "WD": 350,
          "WS": 18,
          "name": "KBYG"
        }
      },
      {
        "name": "KSHR",
        "station": {
          "Ceiling": 550,
          "DewPoint": -99,
          "Reported Time": 1612821600,
          "Surface Pressure": 10166,
          "Temperature": -29,
          "Visibility": 200,
          "WD": 300,
          "WS": 21,
          "name": "KSHR"
        }
      },
      {
        "name": "KGEY",
        "station": {
          "Ceiling": 900,
          "DewPoint": -60,
          "Reported Time": 1612821600,
          "Surface Pressure": 10136,
          "Temperature": 10,
          "Visibility": 1000,
          "WD": 310,
          "WS": 21,
          "name": "KGEY"
        }
      }
    ],
    "predictions": [
      {
        "name": "KBYG",
        "prediction": {
          "Ceiling": 6084,
          "DewPoint": 6,
          "RH": 864,
          "Reported Time": 1612821180,
          "Surface Pressure": 8470,
          "Temperature": 41,
          "VGTYP": 10,
          "Visibility": 957,
          "WD": 341,
          "WS": 13,
          "name": "KBYG"
        }
      },
      {
        "name": "KGEY",
        "prediction": {
          "Ceiling": 764,
          "DewPoint": 114,
          "RH": 733,
          "Reported Time": 1612821180,
          "Surface Pressure": 8759,
          "Temperature": 189,
          "VGTYP": 10,
          "Visibility": 1603,
          "WD": 342,
          "WS": 15,
          "name": "KGEY"
        }
      },
      {
        "name": "KSHR",
        "prediction": {
          "Ceiling": 92,
          "DewPoint": 20,
          "RH": 820,
          "Reported Time": 1612821180,
          "Surface Pressure": 8786,
          "Temperature": 66,
          "VGTYP": 10,
          "Visibility": 1125,
          "WD": 322,
          "WS": 7,
          "name": "KSHR"
        }
      }
    ]
  }, 
....

Thanks in advance,

@randy.pierce ,

prediction[*].station.Ceiling will become array , < 1000 will not give right results.
Not sure what is your logic any element match or all element match etc. Also you want use SUM vs COUNT. COUNT, counts for all non non-MISSING, non-NULL values (if you need to use COUNT change ELSE 0 to ELSE NULL).

SUM( CASE WHEN (ANY v IN prediction SATISFIES v.station.Ceiling < 1000 END AND ANY v IN observation SATISFIES v.station.Ceiling < 1000 END )TEHN 1 ELSE 0 END)

SUM( CASE WHEN (ANY AND EVERY v IN prediction SATISFIES v.station.Ceiling < 1000 END AND ANY AND EVERY v IN observation SATISFIES v.station.Ceiling < 1000 END) TEHN 1 ELSE 0 END)