Writing an indexed join

We’re trying to write our first indexed join, and attempted to following the beer-sample example as close as possible:
https://developer.couchbase.com/documentation/server/current/n1ql/n1ql-language-reference/from.html

Here’s the query we came up with, but it’s not working properly. The end-goal is being able to use the index on the RHS to quickly filter the rows on the LHS. However it’s returning zero results:

SELECT visit.visitNumber, chg.amount
FROM cloud_med visit 
LEFT JOIN cloud_med charge ON KEY "visit-client_name-" || charge.visitNumber FOR visit 
UNNEST charge.visitCharges chg 
WHERE charge.`type` = "charges" 
      AND SUBSTR(visit.dischargeDate >= "2016-12-01",0,7) AND SUBSTR(visit.dischargeDate <= "2016-12-02",0,7) 
      AND meta(visit).id LIKE "visit-%" AND visit.patientClass LIKE "I%" AND visit.drgCode != "" 
      AND visit.tenantName = "client_name"

Here are the applicable indexes:

CREATE INDEX idx_dischargeDate ON cloud_med(tenantName,dischargeDate) WHERE meta().id) like "visit-%" and patientClass like "I%" and drgCode != ""

CREATE INDEX idx_charge_visitNumber ON cloud_med(visitNumber) WHERE `type` = "charges"

And finally, here’s a snapshot of the documents in play for this query. This is the “charges” document with the nested array of charges. In addition to their being MANY nested charges, there will also be MANY charge documents. Meaning, page 0 has a document key as listed below. Page 1 would be charges-client_name-123456789-1, Page 2 would be charges-client_name-123456789-2, etc.

{
  "docID": "charges-client_name-123456789",
  "type": "charges",
  "pageNumber": 0,
  "visitNumber": "345000880023",
  "visitCharges": [
    {
      "amount": 117.41,
      "chargeCode": "138169",
      "visitNumber": "345000880023"
    },
  {
      "amount": 1296,
      "chargeCode": "114787",
      "visitNumber": "345000880023"
    }
  ]
}

Here’s the “visit” document. This will be the SINGLE side of the join (meaning 1 visit per many charge documents).

{
  "docID": "visit-client_name-123456789",
  "type": "visit",
  "accountBalance": 39387.45,
  "admitDate": "2016-09-20T15:57:00-07:00",
  "visitNumber: "123456789"
}

Did you take a look at this. https://dzone.com/articles/join-faster-with-couchbase-index-joins

Can you post the EXPLAIN output of this query. You can avoid the SUBSTR() altogether.

EXPLAIN
SELECT visit.visitNumber, chg.amount
FROM cloud_med visit
LEFT JOIN cloud_med charge ON KEY "visit-client_name-" || charge.visitNumber FOR visit
UNNEST charge.visitCharges chg
WHERE charge.`type` = "charges"
      AND visit.dischargeDate >= "2016-12-01" AND visit.dischargeDate < "2016-12-03"
      AND META(visit).id LIKE "visit-%" AND visit.patientClass LIKE "I%" AND visit.drgCode != ""
      AND visit.tenantName = "client_name"

Sorry for the late reply, I was out of work for several days on a family emergency :frowning: Here’s the formatted version. The first picture is the left-most portion of the EXPLAIN and the second picture is the right-most portion:

Then, here’s the json version of you have the ability to format this with a tool on your side:

 [
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IntersectScan",
          "scans": [
            {
              "#operator": "IndexScan",
              "index": "idx_dischargeDate",
              "index_id": "9b4d8c27005dd200",
              "keyspace": "bucket",
              "namespace": "default",
              "spans": [
                {
                  "Range": {
                    "High": [
                      "\"client_name\"",
                      "\"2016-12-02\""
                    ],
                    "Inclusion": 3,
                    "Low": [
                      "\"client_name\"",
                      "\"2016-12-01\""
                    ]
                  }
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "IndexScan",
              "index": "#primary",
              "index_id": "2c8fd2136f9792bc",
              "keyspace": "bucket",
              "namespace": "default",
              "spans": [
                {
                  "Range": {
                    "High": [
                      "\"visit.\""
                    ],
                    "Inclusion": 1,
                    "Low": [
                      "\"visit-\""
                    ]
                  }
                }
              ],
              "using": "gsi"
            }
          ]
        },
        {
          "#operator": "Fetch",
          "as": "visit",
          "keyspace": "bucket",
          "namespace": "default"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "IndexJoin",
                "as": "charge",
                "for": "visit",
                "keyspace": "bucket",
                "namespace": "default",
                "on_key": "(\"visit-client_name-\" || (`charge`.`visitNumber`))",
                "scan": {
                  "index": "idx_charge_visitNumber",
                  "index_id": "3fe7bca45115d5eb",
                  "using": "gsi"
                }
              },
              {
                "#operator": "Unnest",
                "as": "chg",
                "expr": "(`charge`.`visitCharges`)"
              },
              {
                "#operator": "Filter",
                "condition": "((((((((`charge`.`type`) = \"charges\") and (\"2016-12-01\" <= (`visit`.`dischargeDate`))) and ((`visit`.`dischargeDate`) <= \"2016-12-02\")) and ((meta(`visit`).`id`) like \"visit-%\")) and ((`visit`.`patientClass`) like \"I%\")) and (not ((`visit`.`drgCode`) = \"\"))) and ((`visit`.`tenantName`) = \"client_name\"))"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "(`visit`.`visitNumber`)"
                  },
                  {
                    "expr": "(`chg`.`amount`)"
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "\r\nSELECT visit.visitNumber, chg.amount\r\n      FROM bucket visit \r\n      JOIN bucket charge ON KEY \"visit-client_name-\" || charge.visitNumber FOR visit \r\n      UNNEST charge.visitCharges chg \r\n      WHERE charge.`type` = \"charges\" \r\n            AND visit.dischargeDate >= \"2016-12-01\" AND visit.dischargeDate <= \"2016-12-02\" \r\n            AND meta(visit).id LIKE \"visit-%\" AND visit.patientClass LIKE \"I%\" AND visit.drgCode != \"\" \r\n            AND visit.tenantName = \"client_name\""
  }
]

I hope all is well.

Does the query give the correct results?

No, that’s the issue, it returns zero results. We did testing of both the LHS and RHS values. So for example, running the query with just the LHS cloud_med visit bucket and WHERE criteria returns records. Then running the SELECT for the RHS that hits a covering index on the charge documents works fine as well e.g.:

SELECT visitNumber FROM cloud_med
WHERE type = “charges”

We assumed that as long as that returned visitNumbers that matched to what the LHS returns, we should be in business, right?

Ok, we need help from @vsr1.

You can also try removing the UNNEST, or changing it to a LEFT UNNEST. That will show if the UNNEST is eliminating rows.

CREATE INDEX idx_charge_visitNumber ON cloud_med("visit-client_name-" ||visitNumber) WHERE `type` = "charges";
SELECT visit.visitNumber, chg.amount
FROM cloud_med visit
LEFT JOIN cloud_med charge ON KEY "visit-client_name-" || charge.visitNumber FOR visit
LEFT UNNEST charge.visitCharges chg
WHERE charge.`type` = "charges"
      AND visit.dischargeDate >= "2016-12-01" AND visit.dischargeDate <= "2016-12-02"
      AND meta(visit).id LIKE "visit-%" AND visit.patientClass LIKE "I%" AND visit.drgCode != ""
      AND visit.tenantName = "client_name";
2 Likes

Ok I see, you need to build the full document key inside the index. I’ll give this a shot and report back. Thanks a million guys.

That worked! Awesome, thanks guys. So the key take-away is that the data within the index must match the RHS expression exactly. Which is a little different than the traditional ON KEYS join.

1 Like