Having trouble with a LEFT OUTER JOIN on documents using a sometimes missing value in a nested array

I have a main document, the structure looks like this:

[
  {
    "DEV_generic": {
      "attributes": {
        "fieldCanonical": "abc-xxx-yyyy",
        "fieldCanonicalHistory": [
          "abc-xxx-yyyy"
        ],
        "someIndicator": "Yes",
        "someField1": "xxx",
        "lastReceivedDate": "2014-11-11",
        "lastRevisionUser": "suser"
      },
      "createDateUtc": 1485905922,
      "crossReferences": [
        {
          "productId": "productId_1234",
          "type": "productType1"
        },
        {
          "productId": "productId_234",
          "type": "productType2"
        },
        {
          "productId": "productId_543",
          "type": "productType4"
        }
      ],
      "id": "xx_1234",
      "parentId": "xx_1200",
      "productType": "mmy",
      "type": "masterProduct",
      "updateDateUtc": 1486075105
    }
  }
]

There is a separate document that is a product group:

[
  {
    "DEV_generic": {
      "attributes": {
        "fieldCanonical": "xxx_series",
        "fieldCanonicalHistory": [
          "xxx_series"
        ],
        "canonical": "bmw",
        "fullName": "BMW",
        "partId": 20005,
        "productGroupFullName": "x-Series",
        "lastReceivedDate": "2014-11-11",
        "lastRevisionUser": "bmctague"
      },
      "createDateUtc": 1485906027,
      "crossReferences": [
        {
          "productId": "productId_958",
          "type": "productType5"
        },
        {
          "productId": "productId_3845",
          "type": "productType6"
        }
      ],
      "id": "prefix2_productId_543",
      "name": "x-Series",
      "parentId": "prefix1_20000005",
      "productType": "productType4",
      "type": "masterProduct",
      "updateDateUtc": 1486075148
    }
  }
]

The crossReferences array in the first document (let’s call it mainProduct) may or may not have a cross reference to the product type of the second document - let’s call it productGroup.

We need to make a query that basically returns:
mainProduct.someIndicator, mainProduct.someField1, productGroup.fieldCanonical

The KEY to the productGroup document is retrieved from mainProduct.crossReferences and built with a concat like ‘masterProduct::’ || crossReferences.productId, but only where crossReferences.type = productType4.

However not every mainProduct document has an associated productGroup inside its crossReferences array.

In regular SQL we would use a LEFT OUTER JOIN but when we attempt the same with N1ql we keep running into problems with the either ambiguous field names or missing index.

The closest we have come to success is this query:

SELECT 
mainProduct.attributes.someIndicator,
mainProduct.attributes.someField1,
mainProduct.id,
cr.productId as mmGroupProductId,
(SELECT mmyGroup.attributes.fullName, mmyGroup.attributes.partId FROM DEV_generic mmyGroup
USE KEYS 'masterProduct::'||cr.productId) 
FROM DEV_generic mainProduct
UNNEST mainProduct.crossReferences as cr
WHERE mainProduct.type= 'masterProduct'
AND mainProduct.productType = 'mmy'
AND cr.type = 'productType4'

However there are a few problems with this:

  1. this is retrieving results only for mainProduct documents that have a cross reference to productType4 - we want to retrieve all mainProduct documents whether they have a productType4 cross reference, just have those cross-referenced productGroup fields be null if they don’t exist
  2. the productGroup field values come back in a nested array

I’ve tried a bunch of variation from looking over other somewhat similar questions in these forums, but it looks like we don’t have the correct indexes set up for these documents either.

Any help would be really appreciated it - we’ve been trying to sort this out for a few days now.

Thanks,
cchong

Try this.

SELECT mainProduct.attributes.someIndicator, mainProduct.attributes.someField1, mainProduct.id, cr.productId as mmGroupProductId
FROM DEV_generic mainProduct LEFT JOIN DEV_generic cr
ON KEYS ARRAY "masterProduct::"||v.productId FOR v IN mainProduct.crossReferences WHEN v.type = "productType4" END
WHERE mainProduct.type= 'masterProduct' AND mainProduct.productType = 'mmy';

You’re a life saver!! I was able to go from there to unnesting the values in the attributes array in the joined table and get what I needed!

My only other question now is indexing. When I do an explain, it currently looks like this:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IntersectScan",
          "scans": [
            {
              "#operator": "IndexScan",
              "index": "productType_idx",
              "index_id": "8b59d24647091959",
              "keyspace": "DEV_generic",
              "namespace": "default",
              "spans": [
                {
                  "Range": {
                    "High": [
                      "\"CarsMMY\""
                    ],
                    "Inclusion": 3,
                    "Low": [
                      "\"CarsMMY\""
                    ]
                  }
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "IndexScan",
              "index": "type_idx",
              "index_id": "5b5ad3f9e5c4151b",
              "keyspace": "DEV_generic",
              "namespace": "default",
              "spans": [
                {
                  "Range": {
                    "High": [
                      "\"masterProduct\""
                    ],
                    "Inclusion": 3,
                    "Low": [
                      "\"masterProduct\""
                    ]
                  }
                }
              ],
              "using": "gsi"
            }
          ]
        },
        {
          "#operator": "Fetch",
          "as": "mainProduct",
          "keyspace": "DEV_generic",
          "namespace": "default"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Join",
                "as": "cr",
                "keyspace": "DEV_generic",
                "namespace": "default",
                "on_keys": "array (\"masterProduct::\" || (`v`.`productId`)) for `v` in (`mainProduct`.`crossReferences`) when ((`v`.`type`) = \"CarsMMYToMMGroup\") end",
                "outer": true
              },
              {
                "#operator": "Unnest",
                "as": "mmattr",
                "expr": "(`cr`.`attributes`)"
              },
              {
                "#operator": "Filter",
                "condition": "((((`mainProduct`.`type`) = \"masterProduct\") and ((`mainProduct`.`productType`) = \"CarsMMY\")) and (((`mainProduct`.`attributes`).`carsCpoIndicator`) = \"Yes\"))"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsMakeId`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsMakeFullName`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsModelIdRef`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsModelFullName`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsYearIdRef`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsYearShortNameRef`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsNewIndicator`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsUsedIndicator`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsCpoIndicator`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsHistoricalIndicator`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsMakeCanonical`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsModelCanonical`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsCanonical`)"
                  },
                  {
                    "expr": "(`mainProduct`.`id`)"
                  },
                  {
                    "expr": "(`mmattr`.`carsMakeModelGroupFullName`)"
                  },
                  {
                    "expr": "(`mmattr`.`carsMakeModelGroupId`)"
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "\nSELECT \nmainProduct.attributes.carsMakeId,\nmainProduct.attributes.carsMakeFullName,\nmainProduct.attributes.carsModelIdRef,\nmainProduct.attributes.carsModelFullName,\nmainProduct.attributes.carsYearIdRef,\nmainProduct.attributes.carsYearShortNameRef,\nmainProduct.attributes.carsNewIndicator,\nmainProduct.attributes.carsUsedIndicator,\nmainProduct.attributes.carsCpoIndicator,\nmainProduct.attributes.carsHistoricalIndicator,\nmainProduct.attributes.carsMakeCanonical,\nmainProduct.attributes.carsModelCanonical,\nmainProduct.attributes.carsCanonical,\nmainProduct.id,\nmmattr.carsMakeModelGroupFullName,\nmmattr.carsMakeModelGroupId\nFROM DEV_generic mainProduct\nLEFT JOIN DEV_generic cr\nON KEYS ARRAY \"masterProduct::\"||v.productId FOR v IN mainProduct.crossReferences WHEN v.type = \"CarsMMYToMMGroup\" END\nUNNEST cr.attributes as mmattr\nWHERE mainProduct.type= 'masterProduct'\nAND mainProduct.productType = 'CarsMMY'\nAND mainProduct.attributes.carsCpoIndicator = 'Yes'"
  }
]

Are there any additional indexes I could be setting?

Based on sample document it is too big to make it cover in this case. Best Index for this case will be. Drop productType_idx, type_idx. Make sure no IntersectScan in EXPLAIN.

CREATE INDEX productType_idx ON DEV_generic(productType) WHERE type = "msterProduct";

Unfortunately those indexes are being used by other queries. Is there any other way to handle that type of situation?

Create new index without dropping used indexes and check EXPLAIN if still uses IntersectScan you can specify USE INDEX hint to query to avoid IntersectScan.

Using the hint successfully removed the intersect scan:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan",
          "index": "productType_idx",
          "index_id": "8b59d24647091959",
          "keyspace": "DEV_generic",
          "namespace": "default",
          "spans": [
            {
              "Range": {
                "High": [
                  "\"CarsMMY\""
                ],
                "Inclusion": 3,
                "Low": [
                  "\"CarsMMY\""
                ]
              }
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "Fetch",
          "as": "mainProduct",
          "keyspace": "DEV_generic",
          "namespace": "default"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Join",
                "as": "cr",
                "keyspace": "DEV_generic",
                "namespace": "default",
                "on_keys": "array (\"masterProduct::\" || (`v`.`productId`)) for `v` in (`mainProduct`.`crossReferences`) when ((`v`.`type`) = \"CarsMMYToMMGroup\") end",
                "outer": true
              },
              {
                "#operator": "Unnest",
                "as": "mmattr",
                "expr": "(`cr`.`attributes`)"
              },
              {
                "#operator": "Filter",
                "condition": "((((`mainProduct`.`type`) = \"masterProduct\") and ((`mainProduct`.`productType`) = \"CarsMMY\")) and (((`mainProduct`.`attributes`).`carsCpoIndicator`) = \"Yes\"))"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsMakeId`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsMakeFullName`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsModelIdRef`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsModelFullName`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsYearIdRef`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsYearShortNameRef`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsNewIndicator`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsUsedIndicator`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsCpoIndicator`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsHistoricalIndicator`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsMakeCanonical`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsModelCanonical`)"
                  },
                  {
                    "expr": "((`mainProduct`.`attributes`).`carsCanonical`)"
                  },
                  {
                    "expr": "(`mainProduct`.`id`)"
                  },
                  {
                    "expr": "(`mmattr`.`carsMakeModelGroupFullName`)"
                  },
                  {
                    "expr": "(`mmattr`.`carsMakeModelGroupId`)"
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "\nSELECT \nmainProduct.attributes.carsMakeId,\nmainProduct.attributes.carsMakeFullName,\nmainProduct.attributes.carsModelIdRef,\nmainProduct.attributes.carsModelFullName,\nmainProduct.attributes.carsYearIdRef,\nmainProduct.attributes.carsYearShortNameRef,\nmainProduct.attributes.carsNewIndicator,\nmainProduct.attributes.carsUsedIndicator,\nmainProduct.attributes.carsCpoIndicator,\nmainProduct.attributes.carsHistoricalIndicator,\nmainProduct.attributes.carsMakeCanonical,\nmainProduct.attributes.carsModelCanonical,\nmainProduct.attributes.carsCanonical,\nmainProduct.id,\nmmattr.carsMakeModelGroupFullName,\nmmattr.carsMakeModelGroupId\nFROM DEV_generic mainProduct\nUSE INDEX (productType_idx)\nLEFT JOIN DEV_generic cr\nON KEYS ARRAY \"masterProduct::\"||v.productId FOR v IN mainProduct.crossReferences WHEN v.type = \"CarsMMYToMMGroup\" END\nUNNEST cr.attributes as mmattr\nWHERE mainProduct.type= 'masterProduct'\nAND mainProduct.productType = 'CarsMMY'\nAND mainProduct.attributes.carsCpoIndicator = 'Yes'"
  }
]

You have mainProduct.attributes.carsCpoIndicator = ‘Yes’ as predicate if you want you can add that to index key to reduce initial document qualification.