ANSI JOIN Support in N1QL: Overview + Details

Overview

ANSI JOIN support is added in N1QL to Couchbase version 5.5. In previous versions of Couchbase, join support was limited to lookup join and index join, which works great when the document key from one side of the join can be produced by the other side of the join — that is, joining on a parent-child or child-parent relationship through a document key.

Where this approach fall short is when the join is on arbitrary fields or expressions of fields, or when multiple join conditions are required. ANSI JOIN is a standardized join syntax widely used in relational databases. ANSI JOIN is much more flexible than lookup join and index join, allowing join to be done on arbitrary expressions on any fields in a document. This makes join operations much simpler and more powerful.

ANSI JOIN syntax:

lhs-expression [ join-type ] JOIN rhs-keyspace ON [ join-condition ]

The left-hand side of the join, lhs-expression can be a keyspace, a N1QL expression, a subquery, or a previous join. The right-hand side of the join, rhs-keyspace, must be a keyspace. The ON-clause specifies the join condition, which can be any arbitrary expression, although it should contain predicates that allows an index scan on the right-hand side keyspace. Join-type can be INNER, LEFT OUTER, RIGHT OUTER. The INNER and OUTER keywords are optional, thus JOIN is the same as INNER JOIN, and LEFT JOIN is the same as LEFT OUTER JOIN. In relational databases join-type can also be FULL OUTER or CROSS, although FULL OUTER JOIN and CROSS JOIN are not supported currently in N1QL.

Details of ANSI JOIN support

We’ll use examples to show you new ways you can run queries using ANSI JOIN syntax, and how to transform your existing join queries in N1QL from lookup join or index join syntax into new ANSI JOIN syntax. It should be noted that lookup join and index join will continue to be supported in N1QL for backward compatibility, however you cannot mix lookup join or index join with the new ANSI JOIN syntax in the same query block, thus customers are encouraged to migrate to the new ANSI JOIN syntax.

To follow along, install the travel-sample sample bucket.

Example 1: ANSI JOIN with arbitrary join condition

The join condition (ON-clause) for ANSI JOIN can be any expression, involving any fields of the documents being joined. For example:

Required index:

CREATE INDEX route_airports ON `travel-sample`(sourceairport, destinationairport) WHERE type = "route";

1	CREATE INDEX route_airports ON `travel-sample`(sourceairport, destinationairport) WHERE type = "route";

Optional index:

CREATE INDEX airport_city_country ON `travel-sample`(city, country) WHERE type = "airport";

1	CREATE INDEX airport_city_country ON `travel-sample`(city, country) WHERE type = "airport";

Query:

SELECT DISTINCT route.destinationairport
FROM `travel-sample` airport JOIN `travel-sample` route
     ON airport.faa = route.sourceairport
        AND route.type = "route"
WHERE airport.type = "airport"
  AND airport.city = "San Francisco"
  AND airport.country = "United States";

SELECT DISTINCT route.destinationairport

FROM `travel-sample` airport JOIN `travel-sample` route

ON airport.faa = route.sourceairport

AND route.type = "route"

WHERE airport.type = "airport"

AND airport.city = "San Francisco"

AND airport.country = "United States";

In this query we are joining a field (“faa”) from the airport document with a field (“sourceairport”) from the route document (see the ON clause of the join). Such join is not possible with lookup join or index join in N1QL, since both requires joining on document key only.

ANSI JOIN requires an appropriate index on the right-hand side keyspace (“Required index” above). You can also create other indexes (e.g. “Optional index” above) to speed up your query. Without the optional index a primary scan will be used and query still works, however without the required index the query will not work and will return an error.

Looking at the explain:

"plan": {
   "#operator": "Sequence",
   "~children": [
       {
           "#operator": "IndexScan3",
           "as": "airport",
           "index": "airport_city_country",
           "index_id": "8e782fd1b124eec3",
           "index_projection": {
               "primary_key": true
           },
           "keyspace": "travel-sample",
           "namespace": "default",
           "spans": [
               {
                   "exact": true,
                   "range": [
                       {
                           "high": "\"San Francisco\"",
                           "inclusion": 3,
                           "low": "\"San Francisco\""
                       },
                       {
                           "high": "\"United States\"",
                           "inclusion": 3,
                           "low": "\"United States\""
                       }
                   ]
               }
           ],
           "using": "gsi"
       },
       {
           "#operator": "Fetch",
           "as": "airport",
           "keyspace": "travel-sample",
           "namespace": "default"
       },
       {
           "#operator": "Parallel",
           "~child": {
               "#operator": "Sequence",
               "~children": [
                   {
                       "#operator": "NestedLoopJoin",
                       "alias": "route",
                       "on_clause": "(((`airport`.`faa`) = cover ((`route`.`sourceairport`))) and (cover ((`route`.`type`)) = \"route\"))",
                       "~child": {
                           "#operator": "Sequence",
                           "~children": [
                               {
                                   "#operator": "IndexScan3",
                                   "as": "route",
                                   "covers": [
                                       "cover ((`route`.`sourceairport`))",
                                       "cover ((`route`.`destinationairport`))",
                                       "cover ((meta(`route`).`id`))"
                                   ],
                                   "filter_covers": {
                                       "cover ((`route`.`type`))": "route"
                                   },
                                   "index": "route_airports",
                                   "index_id": "f1f4b9fbe85e45fd",
                                   "keyspace": "travel-sample",
                                   "namespace": "default",
                                   "nested_loop": true,
                                   "spans": [
                                       {
                                           "exact": true,
                                           "range": [
                                               {
                                                   "high": "(`airport`.`faa`)",
                                                   "inclusion": 3,
                                                   "low": "(`airport`.`faa`)"
                                               }
                                           ]
                                       }
                                   ],
                                   "using": "gsi"
                               }
                           ]
                       }
                   },
                   {
                       "#operator": "Filter",
                       "condition": "((((`airport`.`type`) = \"airport\") and ((`airport`.`city`) = \"San Francisco\")) and ((`airport`.`country`) = \"United States\"))"
                   },
                   {
                       "#operator": "InitialProject",
                       "distinct": true,
                       "result_terms": [
                           {
                               "expr": "cover ((`route`.`destinationairport`))"
                           }
                       ]
                   },
                   {
                       "#operator": "Distinct"
                   },
                   {
                       "#operator": "FinalProject"
                   }
               ]
           }
       },
       {
           "#operator": "Distinct"
       }
   ]
}

100

101

102

103

104

105

106

107

108

109

110

"plan": {

"#operator": "Sequence",

"~children": [

{

"#operator": "IndexScan3",

"as": "airport",

"index": "airport_city_country",

"index_id": "8e782fd1b124eec3",

"index_projection": {

"primary_key": true

"keyspace": "travel-sample",

"namespace": "default",

"spans": [

{

"exact": true,

"range": [

{

"high": "\"San Francisco\"",

"inclusion": 3,

"low": "\"San Francisco\""

{

"high": "\"United States\"",

"inclusion": 3,

"low": "\"United States\""

}

]

}

"using": "gsi"

{

"#operator": "Fetch",

"as": "airport",

"keyspace": "travel-sample",

"namespace": "default"

{

"#operator": "Parallel",

"~child": {

"#operator": "Sequence",

"~children": [

{

"#operator": "NestedLoopJoin",

"alias": "route",

"on_clause": "(((`airport`.`faa`) = cover ((`route`.`sourceairport`))) and (cover ((`route`.`type`)) = \"route\"))",

"~child": {

"#operator": "Sequence",

"~children": [

{

"#operator": "IndexScan3",

"as": "route",

"covers": [

"cover ((`route`.`sourceairport`))",

"cover ((`route`.`destinationairport`))",

"cover ((meta(`route`).`id`))"

"filter_covers": {

"cover ((`route`.`type`))": "route"

"index": "route_airports",

"index_id": "f1f4b9fbe85e45fd",

"keyspace": "travel-sample",

"namespace": "default",

"nested_loop": true,

"spans": [

{

"exact": true,

"range": [

{

"high": "(`airport`.`faa`)",

"inclusion": 3,

"low": "(`airport`.`faa`)"

}

]

}

"using": "gsi"

}

]

}

{

"#operator": "Filter",

"condition": "((((`airport`.`type`) = \"airport\") and ((`airport`.`city`) = \"San Francisco\")) and ((`airport`.`country`) = \"United States\"))"

{

"#operator": "InitialProject",

"distinct": true,

"result_terms": [

{

"expr": "cover ((`route`.`destinationairport`))"

}

]

{

"#operator": "Distinct"

{

"#operator": "FinalProject"

}

]

}

{

"#operator": "Distinct"

}

]

}

You will see a NestedLoopJoin operator is used to perform the join, and underneath that an IndexScan3 operator is used to access the right-hand side keyspace, “route”. The spans for the index scan looks like:

"spans": [
   {
       "exact": true,
       "range": [
           {
               "high": "(`airport`.`faa`)",
               "inclusion": 3,
               "low": "(`airport`.`faa`)"
           }
       ]
   }
]

"spans": [

{

"exact": true,

"range": [

{

"high": "(`airport`.`faa`)",

"inclusion": 3,

"low": "(`airport`.`faa`)"

}

]

}

]

The index scan for the right-hand side keyspace (“route”) is using a field (“faa”) from the left-hand side keyspace (“airport”) as search key. For each document from outer side keyspace “airport” the NestedLoopJoin operator performs an index scan on the inner side keyspace “route” to find matching documents, and produces join results. The join is performed in a nested-loop fashion, where the outer loop produces document from outer side keyspace, and a nested inner loop searches for matching inner side document for the current outer side document.

The explain information can also be view graphically on Query Workbench, by clicking the Explain button followed by the Plan button:

Visual Explain for ANSI JOIN

In this example the index scan on the right-hand side keyspace is a covered index scan. In case the index scan is not covered, a fetch operator will be following the index scan operator to fetch the document.

It should be noted that nested-loop join requires an appropriate secondary index on the right-hand side keyspace of ANSI JOIN. Primary index is not considered for this purpose. If an appropriate secondary cannot be found, an error will be returned for the query.

In addition, you might have noticed that the filter route.type = “route” appears in the ON-clause as well. The ON-clause is different than the WHERE clause in that the ON-clause is evaluated as part of the join, while the WHERE clause is evaluated after all joins are done. This distinction is important, especially for outer joins. Therefore it is recommended that you include filters on the right-hand side keyspace for a join in the ON-clause as well, in addition to any join filters.

Example 2: ANSI JOIN with multiple join conditions

While lookup join and index join only joins on a single join condition (equality of document key), the ON-clause of ANSI JOIN can contain multiple join conditions.

Required index:

CREATE INDEX landmark_city_country ON `travel-sample`(city, country) WHERE type = "landmark";

1	CREATE INDEX landmark_city_country ON `travel-sample`(city, country) WHERE type = "landmark";

Optional index:

CREATE INDEX hotel_title ON `travel-sample`(title) WHERE type = "hotel";

1	CREATE INDEX hotel_title ON `travel-sample`(title) WHERE type = "hotel";

Query:

SELECT hotel.name hotel_name, landmark.name landmark_name, landmark.activity
FROM `travel-sample` hotel JOIN `travel-sample` landmark
    ON hotel.city = landmark.city AND hotel.country = landmark.country AND landmark.type = "landmark"
WHERE hotel.type = "hotel" AND hotel.title like "Yosemite%" AND array_length(hotel.public_likes) > 5;

SELECT hotel.name hotel_name, landmark.name landmark_name, landmark.activity

FROM `travel-sample` hotel JOIN `travel-sample` landmark

ON hotel.city = landmark.city AND hotel.country = landmark.country AND landmark.type = "landmark"

WHERE hotel.type = "hotel" AND hotel.title like "Yosemite%" AND array_length(hotel.public_likes) > 5;

Looking at the explain, the index spans for the index (“landmark_city_country”) of the right-hand side keyspace (“landmark”) is:

"spans": [
   {
       "exact": true,
       "range": [
           {
               "high": "(`hotel`.`city`)",
               "inclusion": 3,
               "low": "(`hotel`.`city`)"
           },
           {
               "high": "(`hotel`.`country`)",
               "inclusion": 3,
               "low": "(`hotel`.`country`)"
           }
       ]
   }
]

"spans": [

{

"exact": true,

"range": [

{

"high": "(`hotel`.`city`)",

"inclusion": 3,

"low": "(`hotel`.`city`)"

{

"high": "(`hotel`.`country`)",

"inclusion": 3,

"low": "(`hotel`.`country`)"

}

]

}

]

Thus multiple join predicates can potentially generate multiple index search keys for the index scan of the inner side of a nested-loop join.

Example 3: ANSI JOIN with complex join expressions

The join condition in the ON-clause can be complex join expression. For example, the “airlineid” field in “route” document corresponds to the document key for “airline” document, but it can also be constructed by concatenating “airline_” with the “id” field of the “airline” document.

Required index:

CREATE INDEX route_airlineid ON `travel-sample`(airlineid) WHERE type = "route";

1	CREATE INDEX route_airlineid ON `travel-sample`(airlineid) WHERE type = "route";

Optional index:

CREATE INDEX airline_name ON `travel-sample`(name) WHERE type = "airline";

1	CREATE INDEX airline_name ON `travel-sample`(name) WHERE type = "airline";

Query:

SELECT count(*)
FROM `travel-sample` airline JOIN `travel-sample` route
     ON route.airlineid = "airline_" || tostring(airline.id) AND route.type = "route"
WHERE airline.type = "airline" AND airline.name = "United Airlines";

SELECT count(*)

FROM `travel-sample` airline JOIN `travel-sample` route

ON route.airlineid = "airline_" || tostring(airline.id) AND route.type = "route"

WHERE airline.type = "airline" AND airline.name = "United Airlines";

The explain contains the following index spans for the right-hand side keyspace(“route”):

"spans": [
   {
       "exact": true,
       "range": [
           {
               "high": "(\"airline_\" || to_string((`airline`.`id`)))",
               "inclusion": 3,
               "low": "(\"airline_\" || to_string((`airline`.`id`)))"
           }
       ]
   }
]

"spans": [

{

"exact": true,

"range": [

{

"high": "(\"airline_\" || to_string((`airline`.`id`)))",

"inclusion": 3,

"low": "(\"airline_\" || to_string((`airline`.`id`)))"

}

]

}

]

The expression will be evaluated at runtime to generate the search keys for the index scan on the inner side of nested-loop join.

Example 4: ANSI JOIN with IN clause

The join condition does not need to be an equality predicate. An IN-clause can be used as a join condition.

Required index:

CREATE INDEX airport_faa_name ON `travel-sample`(faa, airportname) WHERE type = "airport";

1	CREATE INDEX airport_faa_name ON `travel-sample`(faa, airportname) WHERE type = "airport";

Optional index:

CREATE INDEX route_airline_distance ON `travel-sample`(airline, distance) WHERE type = "route";

1	CREATE INDEX route_airline_distance ON `travel-sample`(airline, distance) WHERE type = "route";

Query:

SELECT DISTINCT airport.airportname
FROM `travel-sample` route JOIN `travel-sample` airport
    ON airport.faa IN [ route.sourceairport, route.destinationairport ] AND airport.type = "airport"
WHERE route.type = "route" AND route.airline = "F9" AND route.distance > 3000;

SELECT DISTINCT airport.airportname

FROM `travel-sample` route JOIN `travel-sample` airport

ON airport.faa IN [ route.sourceairport, route.destinationairport ] AND airport.type = "airport"

WHERE route.type = "route" AND route.airline = "F9" AND route.distance > 3000;

The explain contains the following index spans for the right-hand side keyspace(“airport”):

"spans": [
    {
        "range": [
            {
                "high": "(`route`.`sourceairport`)",
                "inclusion": 3,
                "low": "(`route`.`sourceairport`)"
            }
        ]
    },
    {
        "range": [
            {
                "high": "(`route`.`destinationairport`)",
                "inclusion": 3,
                "low": "(`route`.`destinationairport`)"
            }
        ]
    }
]

"spans": [

{

"range": [

{

"high": "(`route`.`sourceairport`)",

"inclusion": 3,

"low": "(`route`.`sourceairport`)"

}

]

{

"range": [

{

"high": "(`route`.`destinationairport`)",

"inclusion": 3,

"low": "(`route`.`destinationairport`)"

}

]

}

]

Example 5: ANSI JOIN with OR clause

Similar to IN-clause, the join condition for an ANSI JOIN can also contain an OR-clause. Different arms of the OR-clause can potentially reference different fields of the right-hand side keyspace, as long as appropriate indexes exists.

Required index (route_airports index same as example 1):

CREATE INDEX route_airports ON `travel-sample`(sourceairport, destinationairport) WHERE type = "route";
CREATE INDEX route_airports2 ON `travel-sample`(destinationairport, sourceairport) WHERE type = "route";

1 2	CREATE INDEX route_airports ON `travel-sample`(sourceairport, destinationairport) WHERE type = "route"; CREATE INDEX route_airports2 ON `travel-sample`(destinationairport, sourceairport) WHERE type = "route";

Optional index (same as example 1):

CREATE INDEX airport_city_country ON `travel-sample`(city, country) WHERE type = "airport";

1	CREATE INDEX airport_city_country ON `travel-sample`(city, country) WHERE type = "airport";

Query:

SELECT count(*)
FROM `travel-sample` airport JOIN `travel-sample` route
    ON (route.sourceairport = airport.faa OR route.destinationairport = airport.faa) AND route.type = "route"
WHERE airport.type = "airport" AND airport.city = "Denver" AND airport.country = "United States";

SELECT count(*)

FROM `travel-sample` airport JOIN `travel-sample` route

ON (route.sourceairport = airport.faa OR route.destinationairport = airport.faa) AND route.type = "route"

WHERE airport.type = "airport" AND airport.city = "Denver" AND airport.country = "United States";

The explain shows an UnionScan being used under NestedLoopJoin, to handle the OR-clause:

"#operator": "UnionScan",
"scans": [
   {
       "#operator": "IndexScan3",
       "as": "route",
       "index": "route_airports",
       "index_id": "f1f4b9fbe85e45fd",
       "index_projection": {
           "primary_key": true
       },
       "keyspace": "travel-sample",
       "namespace": "default",
       "nested_loop": true,
       "spans": [
           {
               "exact": true,
               "range": [
                   {
                       "high": "(`airport`.`faa`)",
                       "inclusion": 3,
                       "low": "(`airport`.`faa`)"
                   }
               ]
           }
       ],
       "using": "gsi"
   },
   {
       "#operator": "IndexScan3",
       "as": "route",
       "index": "route_airports2",
       "index_id": "cdc9dca18c973bd3",
       "index_projection": {
           "primary_key": true
       },
       "keyspace": "travel-sample",
       "namespace": "default",
       "nested_loop": true,
       "spans": [
           {
               "exact": true,
               "range": [
                   {
                       "high": "(`airport`.`faa`)",
                       "inclusion": 3,
                       "low": "(`airport`.`faa`)"
                   }
               ]
           }
       ],
       "using": "gsi"
   }
]

"#operator": "UnionScan",

"scans": [

{

"#operator": "IndexScan3",

"as": "route",

"index": "route_airports",

"index_id": "f1f4b9fbe85e45fd",

"index_projection": {

"primary_key": true

"keyspace": "travel-sample",

"namespace": "default",

"nested_loop": true,

"spans": [

{

"exact": true,

"range": [

{

"high": "(`airport`.`faa`)",

"inclusion": 3,

"low": "(`airport`.`faa`)"

}

]

}

"using": "gsi"

{

"#operator": "IndexScan3",

"as": "route",

"index": "route_airports2",

"index_id": "cdc9dca18c973bd3",

"index_projection": {

"primary_key": true

"keyspace": "travel-sample",

"namespace": "default",

"nested_loop": true,

"spans": [

{

"exact": true,

"range": [

{

"high": "(`airport`.`faa`)",

"inclusion": 3,

"low": "(`airport`.`faa`)"

}

]

}

"using": "gsi"

}

]

Example 6: ANSI JOIN with hints

For lookup join and index join, hints can only be specified on the keyspace on the left-hand side of the join. For ANSI JOIN, hints can be specified on the right-hand side keyspace as well. Using the same query as example 1 (with addition of USE INDEX hint):

SELECT DISTINCT route.destinationairport
FROM `travel-sample` airport JOIN `travel-sample` route USE INDEX(route_airports)
    ON airport.faa = route.sourceairport AND route.type = "route"
WHERE airport.type = "airport" AND airport.city = "San Francisco" AND airport.country = "United States";

SELECT DISTINCT route.destinationairport

FROM `travel-sample` airport JOIN `travel-sample` route USE INDEX(route_airports)

ON airport.faa = route.sourceairport AND route.type = "route"

WHERE airport.type = "airport" AND airport.city = "San Francisco" AND airport.country = "United States";

The USE INDEX hint limits the number of indexes the planner needs to consider for performing the join.

Hints can also be specified on the left-hand side keyspace of ANSI JOIN.

SELECT DISTINCT route.destinationairport
FROM `travel-sample` airport USE INDEX(airport_city_country)
  JOIN `travel-sample` route USE INDEX(route_airports)
    ON airport.faa = route.sourceairport AND route.type = "route"
WHERE airport.type = "airport" AND airport.city = "San Francisco" AND airport.country = "United States";

SELECT DISTINCT route.destinationairport

FROM `travel-sample` airport USE INDEX(airport_city_country)

JOIN `travel-sample` route USE INDEX(route_airports)

ON airport.faa = route.sourceairport AND route.type = "route"

WHERE airport.type = "airport" AND airport.city = "San Francisco" AND airport.country = "United States";

Example 7: ANSI LEFT OUTER JOIN

So far we’ve been looking at inner joins. You can also perform LEFT OUTER JOIN by just including LEFT or LEFT OUTER keywords in front of JOIN keyword in join specification.

Required index (same as example 1):

CREATE INDEX route_airports ON `travel-sample`(sourceairport, destinationairport) WHERE type = "route";

1	CREATE INDEX route_airports ON `travel-sample`(sourceairport, destinationairport) WHERE type = "route";

Optional index (same as example 1):

CREATE INDEX airport_city_country ON `travel-sample`(city, country) WHERE type = "airport";

1	CREATE INDEX airport_city_country ON `travel-sample`(city, country) WHERE type = "airport";

Query:

SELECT airport.airportname, route.airlineid
FROM `travel-sample` airport LEFT JOIN `travel-sample` route
    ON airport.faa = route.sourceairport AND route.type = "route"
WHERE airport.type = "airport" AND airport.city = "Denver" AND airport.country = "United States";

SELECT airport.airportname, route.airlineid

FROM `travel-sample` airport LEFT JOIN `travel-sample` route

ON airport.faa = route.sourceairport AND route.type = "route"

WHERE airport.type = "airport" AND airport.city = "Denver" AND airport.country = "United States";

The result set for this query contains all the joined results, as well as any left-hand side (“airport”) document that does not join with the right-hand side (“route”) document, according to semantics of LEFT OUTER JOIN. Thus you’ll find results that just contain airport.airportname but not route.airlineid (which is missing). You can also select just the left-hand side (“airport”) document that does not join with right-hand side (“route”) document by adding a IS MISSING predicate on the right-hand side keyspace (“route”):

SELECT airport.airportname, route.airlineid
FROM `travel-sample` airport LEFT JOIN `travel-sample` route
     ON airport.faa = route.sourceairport AND route.type = "route"
WHERE airport.type = "airport" AND airport.city = "Denver" AND airport.country = "United States"
  AND route.airlineid IS MISSING;

SELECT airport.airportname, route.airlineid

FROM `travel-sample` airport LEFT JOIN `travel-sample` route

ON airport.faa = route.sourceairport AND route.type = "route"

WHERE airport.type = "airport" AND airport.city = "Denver" AND airport.country = "United States"

AND route.airlineid IS MISSING;

Example 8: ANSI RIGHT OUTER JOIN

ANSI RIGHT OUTER JOIN is similar to ANSI LEFT OUTER JOIN except we preserve the right-hand side document if no join occurs. We can modify the query in example 7 by switching the left-hand side and right-hand side keyspaces, and replacing LEFT keyword with RIGHT keyword:

SELECT airport.airportname, route.airlineid
FROM `travel-sample` route RIGHT JOIN `travel-sample` airport
    ON airport.faa = route.sourceairport AND route.type = "route"
WHERE airport.type = "airport" AND airport.city = "Denver" AND airport.country = "United States";

SELECT airport.airportname, route.airlineid

FROM `travel-sample` route RIGHT JOIN `travel-sample` airport

ON airport.faa = route.sourceairport AND route.type = "route"

WHERE airport.type = "airport" AND airport.city = "Denver" AND airport.country = "United States";

Note that although we switched airport and route in join specification, the filter on route (now the left-hand side keyspace) still appears in the ON-clause of the join, since route is still on the subservient side in this outer join.

RIGHT OUTER JOIN is internally converted to LEFT OUTER JOIN.

If a query contains multiple joins, a RIGHT OUTER JOIN can only be the first join specified. Since N1QL only support linear joins, i.e., the right-hand side of each join must be a single keyspace, if a RIGHT OUTER JOIN is not the first join specified, then after converting to LEFT OUTER JOIN, the right-hand side of the join now contains multiple keyspaces, which is not supported. If you specify RIGHT OUTER JOIN in any position other than the first join, a syntax error will be returned.

Example 9: ANSI JOIN using Hash Join

N1QL supports two join methods for ANSI JOIN. The default join method for an ANSI JOIN is nested-loop join. The alternative is hash join. Hash join uses a hash table to match documents from both sides of the join. Hash join has a build side and a probe side, where each document from the build side is inserted into a hash table based on values of equi-join expression from the build side; subsequently each document from the probe side looks up from the hash table based on values of equi-join expression from the probe side. If a match is found then the join operation is performed.

Compared with nested-loop join, hash join can be more efficient when the join is large, e.g., when there are tens of thousand or more documents from the left-hand side of the join (after applying filters). If using nested-loop join, then for each document from the left-hand side an index scan needs to be performed on the right-hand side index. As the number of documents from the left-hand side increases, nested-loop join becomes less efficient.

For hash join, the smaller side of the join should be used for building the hash table, and the larger side of the join should be used for probing the hash table. It should be noted that hash join does require more memory than nested-loop join, since an in-memory hash table is required. The amount of memory required is proportional to the number of documents from the build side, as well as average size of each document.

Hash join is supported in enterprise edition only. To use hash join, a USE HASH hint must be specified on the right-hand side keyspace of ANSI JOIN. Using the same query as example 1:

SELECT DISTINCT route.destinationairport
FROM `travel-sample` airport JOIN `travel-sample` route USE HASH(build)
    ON airport.faa = route.sourceairport AND route.type = "route"
WHERE airport.type = "airport" AND airport.city = "San Jose" AND airport.country = "United States";

SELECT DISTINCT route.destinationairport

FROM `travel-sample` airport JOIN `travel-sample` route USE HASH(build)

ON airport.faa = route.sourceairport AND route.type = "route"

WHERE airport.type = "airport" AND airport.city = "San Jose" AND airport.country = "United States";

The USE HASH(build) hint directs the N1QL planner to perform hash join for the ANSI JOIN specified, and the right-hand side keyspace (“route”) is used on the build side of the hash join. Looking at the explain, there is a HashJoin operator:

{
   "#operator": "HashJoin",
   "build_aliases": [
       "route"
   ],
   "build_exprs": [
       "cover ((`route`.`sourceairport`))"
   ],
   "on_clause": "(((`airport`.`faa`) = cover ((`route`.`sourceairport`))) and (cover ((`route`.`type`)) = \"route\"))",
   "probe_exprs": [
       "(`airport`.`faa`)"
   ],
   "~child": {
       "#operator": "Sequence",
       "~children": [
           {
               "#operator": "IndexScan3",
               "as": "route",
               "covers": [
                   "cover ((`route`.`sourceairport`))",
                   "cover ((`route`.`destinationairport`))",
                   "cover ((meta(`route`).`id`))"
               ],
               "filter_covers": {
                   "cover ((`route`.`type`))": "route"
               },
               "index": "route_airports",
               "index_id": "f1f4b9fbe85e45fd",
               "keyspace": "travel-sample",
               "namespace": "default",
               "spans": [
                   {
                       "range": [
                           {
                               "inclusion": 0,
                               "low": "null"
                           }
                       ]
                   }
               ],
               "using": "gsi"
           }
       ]
   }
}

{

"#operator": "HashJoin",

"build_aliases": [

"route"

"build_exprs": [

"cover ((`route`.`sourceairport`))"

"on_clause": "(((`airport`.`faa`) = cover ((`route`.`sourceairport`))) and (cover ((`route`.`type`)) = \"route\"))",

"probe_exprs": [

"(`airport`.`faa`)"

"~child": {

"#operator": "Sequence",

"~children": [

{

"#operator": "IndexScan3",

"as": "route",

"covers": [

"cover ((`route`.`sourceairport`))",

"cover ((`route`.`destinationairport`))",

"cover ((meta(`route`).`id`))"

"filter_covers": {

"cover ((`route`.`type`))": "route"

"index": "route_airports",

"index_id": "f1f4b9fbe85e45fd",

"keyspace": "travel-sample",

"namespace": "default",

"spans": [

{

"range": [

{

"inclusion": 0,

"low": "null"

}

]

}

"using": "gsi"

}

]

}

The child operator (“~child”) for a HashJoin operator is always the build side of the hash join. For this query, it’s an index scan on the right-hand side keyspace “route”.

Note that for accessing the “route” document we can no longer use information from the left-hand side keyspace (“airport”) for index search key (look at the “spans” information in the explain section above). Unlike nested-loop join, the index scan on “route” is no longer tied to an individual document from the left-hand side, and thus no value from the “airport” document can be used as a search key for the index scan on “route”.

The USE HASH(build) hint used in the query above directs the planner to use the right-hand side keyspace as the build side of the hash join. You can also specify USE HASH(probe) hint to direct the planner to use the right-hand side keyspace as the probe side of the hash join.

SELECT DISTINCT route.destinationairport
FROM `travel-sample` airport JOIN `travel-sample` route USE HASH(probe)
    ON airport.faa = route.sourceairport AND route.type = "route"
WHERE airport.type = "airport" AND airport.city = "San Jose" AND airport.country = "United States";

SELECT DISTINCT route.destinationairport

FROM `travel-sample` airport JOIN `travel-sample` route USE HASH(probe)

ON airport.faa = route.sourceairport AND route.type = "route"

WHERE airport.type = "airport" AND airport.city = "San Jose" AND airport.country = "United States";

Looking at the explain, you’ll find the HashJoin operator:

{
   "#operator": "HashJoin",
   "build_aliases": [
       "airport"
   ],
   "build_exprs": [
       "(`airport`.`faa`)"
   ],
   "on_clause": "(((`airport`.`faa`) = cover ((`route`.`sourceairport`))) and (cover ((`route`.`type`)) = \"route\"))",
   "probe_exprs": [
       "cover ((`route`.`sourceairport`))"
   ],
   "~child": {
       "#operator": "Sequence",
       "~children": [
           {
               "#operator": "IntersectScan",
               "scans": [
                   {
                       "#operator": "IndexScan3",
                       "as": "airport",
                       "index": "airport_city_country",
                       "index_id": "8e782fd1b124eec3",
                       "index_projection": {
                           "primary_key": true
                       },
                       "keyspace": "travel-sample",
                       "namespace": "default",
                       "spans": [
                           {
                               "exact": true,
                               "range": [
                                   {
                                       "high": "\"San Jose\"",
                                       "inclusion": 3,
                                       "low": "\"San Jose\""
                                   },
                                   {
                                       "high": "\"United States\"",
                                       "inclusion": 3,
                                       "low": "\"United States\""
                                   }
                               ]
                           }
                       ],
                       "using": "gsi"
                   },
                   {
                       "#operator": "IndexScan3",
                       "as": "airport",
                       "index": "airport_faa",
                       "index_id": "c302afbf811470f5",
                       "index_projection": {
                           "primary_key": true
                       },
                       "keyspace": "travel-sample",
                       "namespace": "default",
                       "spans": [
                           {
                               "exact": true,
                               "range": [
                                   {
                                       "inclusion": 0,
                                       "low": "null"
                                   }
                               ]
                           }
                       ],
                       "using": "gsi"
                   }
               ]
           },
           {
               "#operator": "Fetch",
               "as": "airport",
               "keyspace": "travel-sample",
               "namespace": "default"
           }
       ]
   }
}

{

"#operator": "HashJoin",

"build_aliases": [

"airport"

"build_exprs": [

"(`airport`.`faa`)"

"on_clause": "(((`airport`.`faa`) = cover ((`route`.`sourceairport`))) and (cover ((`route`.`type`)) = \"route\"))",

"probe_exprs": [

"cover ((`route`.`sourceairport`))"

"~child": {

"#operator": "Sequence",

"~children": [

{

"#operator": "IntersectScan",

"scans": [

{

"#operator": "IndexScan3",

"as": "airport",

"index": "airport_city_country",

"index_id": "8e782fd1b124eec3",

"index_projection": {

"primary_key": true

"keyspace": "travel-sample",

"namespace": "default",

"spans": [

{

"exact": true,

"range": [

{

"high": "\"San Jose\"",

"inclusion": 3,

"low": "\"San Jose\""

{

"high": "\"United States\"",

"inclusion": 3,

"low": "\"United States\""

}

]

}

"using": "gsi"

{

"#operator": "IndexScan3",

"as": "airport",

"index": "airport_faa",

"index_id": "c302afbf811470f5",

"index_projection": {

"primary_key": true

"keyspace": "travel-sample",

"namespace": "default",

"spans": [

{

"exact": true,

"range": [

{

"inclusion": 0,

"low": "null"

}

]

}

"using": "gsi"

}

]

{

"#operator": "Fetch",

"as": "airport",

"keyspace": "travel-sample",

"namespace": "default"

}

]

}

The child operator (“~child”) for HashJoin is an intersect index scan on the left-hand side keyspace of the ANSI JOIN, “airport”, followed by a fetch operator.

The USE HASH hint can only be specified on the right-hand side keyspace in an ANSI JOIN. Therefore depending on whether you want the right-hand side keyspace to be the build side or the probe side of a hash join, a USE HASH(build) or USE HASH(probe) hint should be specified on the right-hand side keyspace.

Hash join is only considered when USE HASH(build) or USE HASH(probe) hint is specified. Hash join requires equality join predicates to work. Nested-loop join requires an appropriate secondary index on the right-hand side keyspace, hash join does not (a primary index scan is an option for hash join). However, hash join does require more memory than nested-loop join since an in-memory hash table is required for hash join to work. In addition, hash join is considered a “blocking” operation, meaning the query engine must finish building the hash table before it can produce the first join result, thus for queries needing only the first few results quickly (e.g. with a LIMIT clause) hash join may not be the best fit.

If a USE HASH hint is specified, but a hash join cannot be generated successfully (e.g., lack of equality join predicates), then a nested-loop join will be considered.

Example 10: ANSI JOIN with multiple hints

You can now specify multiple hints for a keyspace on the right-hand side of an ANSI JOIN. For example, USE HASH hint can be used together with USE INDEX hint.

SELECT DISTINCT route.destinationairport
FROM `travel-sample` airport JOIN `travel-sample` route USE HASH(probe) INDEX(route_airports)
    ON airport.faa = route.sourceairport AND route.type = "route"
WHERE airport.type = "airport" AND airport.city = "San Jose" AND airport.country = "United States";

SELECT DISTINCT route.destinationairport

FROM `travel-sample` airport JOIN `travel-sample` route USE HASH(probe) INDEX(route_airports)

ON airport.faa = route.sourceairport AND route.type = "route"

WHERE airport.type = "airport" AND airport.city = "San Jose" AND airport.country = "United States";

Note when multiple hints are used together, you only need to specify the “USE” keyword once, as in the example above.

USE HASH hint can also be combined with USE KEYS hint.

Example 11: ANSI JOIN with multiple joins

ANSI JOIN can be chained together. For example:

Required indexes (route_airports index same as example 1):

CREATE INDEX route_airports ON `travel-sample`(sourceairport, destinationairport) WHERE type = "route";
CREATE INDEX airline_iata ON `travel-sample`(iata) WHERE type = "airline";

1 2	CREATE INDEX route_airports ON `travel-sample`(sourceairport, destinationairport) WHERE type = "route"; CREATE INDEX airline_iata ON `travel-sample`(iata) WHERE type = "airline";

Optional index (same as example 1):

CREATE INDEX airport_city_country ON `travel-sample`(city, country) WHERE type = "airport";

1	CREATE INDEX airport_city_country ON `travel-sample`(city, country) WHERE type = "airport";

Query:

SELECT DISTINCT airline.name
FROM `travel-sample` airport INNER JOIN `travel-sample` route
    ON airport.faa = route.sourceairport AND route.type = "route"
  INNER JOIN `travel-sample` airline
    ON route.airline = airline.iata AND airline.type = "airline"
WHERE airport.type = "airport" AND airport.city = "San Jose"
 AND airport.country = "United States";

SELECT DISTINCT airline.name

FROM `travel-sample` airport INNER JOIN `travel-sample` route

ON airport.faa = route.sourceairport AND route.type = "route"

INNER JOIN `travel-sample` airline

ON route.airline = airline.iata AND airline.type = "airline"

WHERE airport.type = "airport" AND airport.city = "San Jose"

AND airport.country = "United States";

Since there is no USE HASH hint specified in the query the explain should show two NestedLoopJoin operators.

You can mix hash join with nested-loop join by adding USE HASH hint to any of the joins in a chain of ANSI JOINs.

SELECT DISTINCT airline.name
FROM `travel-sample` airport INNER JOIN `travel-sample` route
     ON airport.faa = route.sourceairport AND route.type = "route"
   INNER JOIN `travel-sample` airline USE HASH(build)
     ON route.airline = airline.iata AND airline.type = "airline"
WHERE airport.type = "airport" AND airport.city = "San Jose"
  AND airport.country = "United States";

SELECT DISTINCT airline.name

FROM `travel-sample` airport INNER JOIN `travel-sample` route

ON airport.faa = route.sourceairport AND route.type = "route"

INNER JOIN `travel-sample` airline USE HASH(build)

ON route.airline = airline.iata AND airline.type = "airline"

WHERE airport.type = "airport" AND airport.city = "San Jose"

AND airport.country = "United States";

SELECT DISTINCT airline.name
FROM `travel-sample` airport INNER JOIN `travel-sample` route USE HASH(probe)
    ON airport.faa = route.sourceairport AND route.type = "route"
  INNER JOIN `travel-sample` airline
    ON route.airline = airline.iata AND airline.type = "airline"
WHERE airport.type = "airport" AND airport.city = "San Jose"
 AND airport.country = "United States";

SELECT DISTINCT airline.name

FROM `travel-sample` airport INNER JOIN `travel-sample` route USE HASH(probe)

ON airport.faa = route.sourceairport AND route.type = "route"

INNER JOIN `travel-sample` airline

ON route.airline = airline.iata AND airline.type = "airline"

WHERE airport.type = "airport" AND airport.city = "San Jose"

AND airport.country = "United States";

The visual explain for the last query is follows:

As mentioned before, N1QL only supports linear joins, i.e., the right-hand side of each join must be a keyspace.

Example 12: ANSI JOIN involving right-hand side arrays

Although ANSI JOIN comes from SQL standard, since Couchbase N1QL handles JSON documents and array is an important aspect of JSON, we extended ANSI JOIN support to arrays as well.

For examples in array handling please create a bucket “default” and insert the following documents:

INSERT INTO default (KEY,VALUE) VALUES("test11_ansijoin", {"c11": 1, "c12": 10, "a11": [ 1, 2, 3, 4 ], "type": "left"}),
                                VALUES("test12_ansijoin", {"c11": 2, "c12": 20, "a11": [ 3, 3, 5, 10 ], "type": "left"}),
                                VALUES("test13_ansijoin", {"c11": 3, "c12": 30, "a11": [ 3, 4, 20, 40 ], "type": "left"}),
                                VALUES("test14_ansijoin", {"c11": 4, "c12": 40, "a11": [ 30, 30, 30 ], "type": "left"});
INSERT INTO default (KEY,VALUE) VALUES("test21_ansijoin", {"c21": 1, "c22": 10, "a21": [ 1, 10, 20], "a22": [ 1, 2, 3, 4 ], "type": "right"}),
                                VALUES("test22_ansijoin", {"c21": 2, "c22": 20, "a21": [ 2, 3, 30], "a22": [ 3, 5, 10, 3 ], "type": "right"}),
                                VALUES("test23_ansijoin", {"c21": 2, "c22": 21, "a21": [ 2, 20, 30], "a22": [ 3, 3, 5, 10 ], "type": "right"}),
                                VALUES("test24_ansijoin", {"c21": 3, "c22": 30, "a21": [ 3, 10, 30], "a22": [ 3, 4, 20, 40 ], "type": "right"}),
                                VALUES("test25_ansijoin", {"c21": 3, "c22": 31, "a21": [ 3, 20, 40], "a22": [ 4, 3, 40, 20 ], "type": "right"}),
                                VALUES("test26_ansijoin", {"c21": 3, "c22": 32, "a21": [ 4, 14, 24], "a22": [ 40, 20, 4, 3 ], "type": "right"}),
                                VALUES("test27_ansijoin", {"c21": 5, "c22": 50, "a21": [ 5, 15, 25], "a22": [ 1, 2, 3, 4 ], "type": "right"}),
                                VALUES("test28_ansijoin", {"c21": 6, "c22": 60, "a21": [ 6, 16, 26], "a22": [ 3, 3, 5, 10 ], "type": "right"}),
                                VALUES("test29_ansijoin", {"c21": 7, "c22": 70, "a21": [ 7, 17, 27], "a22": [ 30, 30, 30 ], "type": "right"}),
                                VALUES("test30_ansijoin", {"c21": 8, "c22": 80, "a21": [ 8, 18, 28], "a22": [ 30, 30, 30 ], "type": "right"});

INSERT INTO default (KEY,VALUE) VALUES("test11_ansijoin", {"c11": 1, "c12": 10, "a11": [ 1, 2, 3, 4 ], "type": "left"}),

VALUES("test12_ansijoin", {"c11": 2, "c12": 20, "a11": [ 3, 3, 5, 10 ], "type": "left"}),

VALUES("test13_ansijoin", {"c11": 3, "c12": 30, "a11": [ 3, 4, 20, 40 ], "type": "left"}),

VALUES("test14_ansijoin", {"c11": 4, "c12": 40, "a11": [ 30, 30, 30 ], "type": "left"});

INSERT INTO default (KEY,VALUE) VALUES("test21_ansijoin", {"c21": 1, "c22": 10, "a21": [ 1, 10, 20], "a22": [ 1, 2, 3, 4 ], "type": "right"}),

VALUES("test22_ansijoin", {"c21": 2, "c22": 20, "a21": [ 2, 3, 30], "a22": [ 3, 5, 10, 3 ], "type": "right"}),

VALUES("test23_ansijoin", {"c21": 2, "c22": 21, "a21": [ 2, 20, 30], "a22": [ 3, 3, 5, 10 ], "type": "right"}),

VALUES("test24_ansijoin", {"c21": 3, "c22": 30, "a21": [ 3, 10, 30], "a22": [ 3, 4, 20, 40 ], "type": "right"}),

VALUES("test25_ansijoin", {"c21": 3, "c22": 31, "a21": [ 3, 20, 40], "a22": [ 4, 3, 40, 20 ], "type": "right"}),

VALUES("test26_ansijoin", {"c21": 3, "c22": 32, "a21": [ 4, 14, 24], "a22": [ 40, 20, 4, 3 ], "type": "right"}),

VALUES("test27_ansijoin", {"c21": 5, "c22": 50, "a21": [ 5, 15, 25], "a22": [ 1, 2, 3, 4 ], "type": "right"}),

VALUES("test28_ansijoin", {"c21": 6, "c22": 60, "a21": [ 6, 16, 26], "a22": [ 3, 3, 5, 10 ], "type": "right"}),

VALUES("test29_ansijoin", {"c21": 7, "c22": 70, "a21": [ 7, 17, 27], "a22": [ 30, 30, 30 ], "type": "right"}),

VALUES("test30_ansijoin", {"c21": 8, "c22": 80, "a21": [ 8, 18, 28], "a22": [ 30, 30, 30 ], "type": "right"});

Then create the following indexes:

CREATE INDEX default_ix_left on default(c11, DISTINCT a11) WHERE type = "left";
CREATE INDEX default_ix_right on default(c21, DISTINCT a21) WHERE type = "right";

1 2	CREATE INDEX default_ix_left on default(c11, DISTINCT a11) WHERE type = "left"; CREATE INDEX default_ix_right on default(c21, DISTINCT a21) WHERE type = "right";

When the join predicate involves an array on the right-hand side of ANSI JOIN, you need to create an array index on the right-hand side keyspace.

Query:

SELECT b1.c11, b2.c21, b2.c22
FROM default b1 JOIN default b2
    ON b2.c21 = b1.c11 AND ANY v IN b2.a21 SATISFIES v = b1.c12 END AND b2.type = "right"
WHERE b1.type = "left";

SELECT b1.c11, b2.c21, b2.c22

FROM default b1 JOIN default b2

ON b2.c21 = b1.c11 AND ANY v IN b2.a21 SATISFIES v = b1.c12 END AND b2.type = "right"

WHERE b1.type = "left";

Note that part of the join condition is an ANY clause which specifies that the left-hand side field b1.c12 can match any element of the right-hand side array b2.a21. For this join to work properly, we need an array index on b2.a21, e.g., default_ix_right index created above.

The explain plan shows a NestedLoopJoin, with the child operator being a distinct scan on the array index default_ix_right.

{
   "#operator": "NestedLoopJoin",
   "alias": "b2",
   "on_clause": "((((`b2`.`c21`) = (`b1`.`c11`)) and any `v` in (`b2`.`a21`) satisfies (`v` = (`b1`.`c12`)) end) and ((`b2`.`type`) = \"right\"))",
   "~child": {
       "#operator": "Sequence",
       "~children": [
           {
               "#operator": "DistinctScan",
               "scan": {
                   "#operator": "IndexScan3",
                   "as": "b2",
                   "index": "default_ix_right",
                   "index_id": "ef4e7fa33f33dce",
                   "index_projection": {
                       "primary_key": true
                   },
                   "keyspace": "default",
                   "namespace": "default",
                   "nested_loop": true,
                   "spans": [
                       {
                           "exact": true,
                           "range": [
                               {
                                   "high": "(`b1`.`c11`)",
                                   "inclusion": 3,
                                   "low": "(`b1`.`c11`)"
                               },
                               {
                                   "high": "(`b1`.`c12`)",
                                   "inclusion": 3,
                                   "low": "(`b1`.`c12`)"
                               }
                           ]
                       }
                   ],
                   "using": "gsi"
               }
           },
           {
               "#operator": "Fetch",
               "as": "b2",
               "keyspace": "default",
               "namespace": "default",
               "nested_loop": true
           }
       ]
   }
}

{

"#operator": "NestedLoopJoin",

"alias": "b2",

"on_clause": "((((`b2`.`c21`) = (`b1`.`c11`)) and any `v` in (`b2`.`a21`) satisfies (`v` = (`b1`.`c12`)) end) and ((`b2`.`type`) = \"right\"))",

"~child": {

"#operator": "Sequence",

"~children": [

{

"#operator": "DistinctScan",

"scan": {

"#operator": "IndexScan3",

"as": "b2",

"index": "default_ix_right",

"index_id": "ef4e7fa33f33dce",

"index_projection": {

"primary_key": true

"keyspace": "default",

"namespace": "default",

"nested_loop": true,

"spans": [

{

"exact": true,

"range": [

{

"high": "(`b1`.`c11`)",

"inclusion": 3,

"low": "(`b1`.`c11`)"

{

"high": "(`b1`.`c12`)",

"inclusion": 3,

"low": "(`b1`.`c12`)"

}

]

}

"using": "gsi"

}

{

"#operator": "Fetch",

"as": "b2",

"keyspace": "default",

"namespace": "default",

"nested_loop": true

}

]

}

Example 13: ANSI JOIN involving left-hand side arrays

If ANSI JOIN involves an array on the left-hand side, then there are two options for performing the join.

Option 1: use UNNEST

Use UNNEST clause to flatten the left-hand side array first before performing the join.

SELECT b1.c11, b2.c21, b2.c22
FROM default b1 UNNEST b1.a11 AS ba1
    JOIN default b2 ON ba1 = b2.c21 AND b2.type = "right"
WHERE b1.c11 = 2 AND b1.type = "left";

SELECT b1.c11, b2.c21, b2.c22

FROM default b1 UNNEST b1.a11 AS ba1

JOIN default b2 ON ba1 = b2.c21 AND b2.type = "right"

WHERE b1.c11 = 2 AND b1.type = "left";

After the UNNEST the array becomes individual fields, and the subsequent join is just like a “regular” ANSI JOIN with fields from both sides.

Option 2: use IN-clause

Alternatively, use IN-clause as join condition.

SELECT b1.c11, b2.c21, b2.c22
FROM default b1 JOIN default b2
    ON b2.c21 IN b1.a11 AND b2.type = "right"
WHERE b1.c11 = 2 AND b1.type = "left";

SELECT b1.c11, b2.c21, b2.c22

FROM default b1 JOIN default b2

ON b2.c21 IN b1.a11 AND b2.type = "right"

WHERE b1.c11 = 2 AND b1.type = "left";

The IN-clause is satisfied when any element of the array on the left-hand side keyspace (“b1.a11”) matches the right-hand side field (“b2.c21”).

Note that there is a semantics difference between the two options. When there are duplicates in the array, the UNNEST option does not care about duplicates and will flatten the left-hand side documents to as many documents as number of elements in the array, thus may produce duplicated results; the IN-clause option will not produce duplicated results if there are duplicated elements in the array. In addition, when LEFT OUTER JOIN is performed, there may be a different number of preserved documents due to the flattening of the array with the UNNEST option. Thus the user is advised to pick the option that reflect the semantics needed for the query.

Example 14: ANSI JOIN involving arrays on both sides

Although uncommon, it is possible to perform an ANSI JOIN when both sides of the join are arrays. In such cases, you can use a combination of the techniques described above. Use array index to handle array on the right-hand side, and use either UNNEST option or IN-clause option to handle array on the left-hand side.

Option 1: use UNNEST clause

SELECT b1.c11, b2.c21, b2.c22
FROM default b1 UNNEST b1.a11 AS ba1
    JOIN default b2 ON b2.c21 = b1.c11 AND ANY v IN b2.a21 SATISFIES v = ba1 END AND b2.type = "right"
WHERE b1.type = "left";

SELECT b1.c11, b2.c21, b2.c22

FROM default b1 UNNEST b1.a11 AS ba1

JOIN default b2 ON b2.c21 = b1.c11 AND ANY v IN b2.a21 SATISFIES v = ba1 END AND b2.type = "right"

WHERE b1.type = "left";

Option 2: use IN-clause

SELECT b1.c11, b2.c21, b2.c22
FROM default b1 JOIN default b2
    ON b2.c21 = b1.c11 AND ANY v IN b2.a21 SATISFIES v IN b1.a11 END AND b2.type = "right"
WHERE b1.type = "left";

SELECT b1.c11, b2.c21, b2.c22

FROM default b1 JOIN default b2

ON b2.c21 = b1.c11 AND ANY v IN b2.a21 SATISFIES v IN b1.a11 END AND b2.type = "right"

WHERE b1.type = "left";

Again the two options are not semantically identical, and may give different results. Pick the option that reflects the semantics desired.

Example 15: lookup join migration

N1QL will continue to support lookup join and index join for backward compatibility, however, you cannot mix ANSI JOIN with lookup join or index join in the same query. You can convert your existing queries from using lookup join and index join to the ANSI JOIN syntax. This example shows you how to convert a lookup join into ANSI JOIN syntax.

Create the following index to speed up the query (same as example 1):

CREATE INDEX route_airports ON `travel-sample`(sourceairport, destinationairport) WHERE type = "route";

1	CREATE INDEX route_airports ON `travel-sample`(sourceairport, destinationairport) WHERE type = "route";

This is a query using lookup join syntax (note the ON KEYS clause):

SELECT airline.name
FROM `travel-sample` route JOIN `travel-sample` airline
    ON KEYS route.airlineid
WHERE route.type = "route" AND route.sourceairport = "SFO" AND route.destinationairport = "JFK";

SELECT airline.name

FROM `travel-sample` route JOIN `travel-sample` airline

ON KEYS route.airlineid

WHERE route.type = "route" AND route.sourceairport = "SFO" AND route.destinationairport = "JFK";

In lookup join the left-hand side of the join (“route”) needs to produce document keys for the right-hand side of the join (“airline”), this is achieved by the ON KEYS clause. The join condition (which is implied from the syntax) is route.airlineid = meta(airline).id, thus the same query can be specified using ANSI JOIN syntax:

SELECT airline.name
FROM `travel-sample` route JOIN `travel-sample` airline
    ON route.airlineid = meta(airline).id
WHERE route.type = "route" AND route.sourceairport = "SFO" AND route.destinationairport = "JFK";

SELECT airline.name

FROM `travel-sample` route JOIN `travel-sample` airline

ON route.airlineid = meta(airline).id

WHERE route.type = "route" AND route.sourceairport = "SFO" AND route.destinationairport = "JFK";

In this example the ON KEYS clause contain a single document key. It’s possible for the ON KEYS clause to contain an array of document keys, in which case the converted ON clause will be in the form of an IN clause instead of an equality clause. Let’s assume each route document has an array of document keys for airline, then the original ON KEYS clause:

    ON KEYS route.airlineids

1	ON KEYS route.airlineids

can be converted to:

    ON meta(airline).id IN route.airlineids

1	ON meta(airline).id IN route.airlineids

Example 16: index join migration

This example shows you how to convert an index join into ANSI JOIN syntax.

Required index (same as example 3):

CREATE INDEX route_airlineid ON `travel-sample`(airlineid) WHERE type = "route";

1	CREATE INDEX route_airlineid ON `travel-sample`(airlineid) WHERE type = "route";

Optional index (same as example 3):

CREATE INDEX airline_name ON `travel-sample`(name) WHERE type = "airline";

1	CREATE INDEX airline_name ON `travel-sample`(name) WHERE type = "airline";

Query using index join syntax (note the ON KEY … FOR … clause):

SELECT count(*)
FROM `travel-sample` airline JOIN `travel-sample` route
    ON KEY route.airlineid FOR airline
WHERE airline.type = "airline" AND route.type = "route" AND airline.name = "United Airlines";

SELECT count(*)

FROM `travel-sample` airline JOIN `travel-sample` route

ON KEY route.airlineid FOR airline

WHERE airline.type = "airline" AND route.type = "route" AND airline.name = "United Airlines";

In index join the document key for left-hand side (“airline”) is used to probe an index on an expression (“route.airlineid” which appears in the ON KEY clause) from the right-hand side (“route”) that corresponds to the document key for the left-hand side (“airline” which appears in the FOR clause). The join condition (implied from syntax) is route.airlineid = meta(airline).id, thus the same query can be specified using ANSI JOIN syntax:

SELECT count(*)
FROM `travel-sample` airline JOIN `travel-sample` route
    ON route.airlineid = meta(airline).id
WHERE airline.type = "airline" AND route.type = "route" AND airline.name = "United Airlines";

SELECT count(*)

FROM `travel-sample` airline JOIN `travel-sample` route

ON route.airlineid = meta(airline).id

WHERE airline.type = "airline" AND route.type = "route" AND airline.name = "United Airlines";

Example 17: ANSI NEST

Couchbase N1QL supports NEST operation. Previously NEST can be done using lookup nest or index nest, similar to lookup join and index join, respectively. With ANSI JOIN support, NEST operation can also be done using similar syntax, i.e., using ON clause instead of ON KEYS (lookup nest) or ON KEY … FOR … (index nest) clauses. This new variant is referred to as ANSI NEST.

Required index (route_airports index same as example 1, route_airline_distance index same as example 4):

CREATE INDEX route_airports ON `travel-sample`(sourceairport, destinationairport) WHERE type = "route";
CREATE INDEX route_airline_distance ON `travel-sample`(airline, distance) WHERE type = "route";

1 2	CREATE INDEX route_airports ON `travel-sample`(sourceairport, destinationairport) WHERE type = "route"; CREATE INDEX route_airline_distance ON `travel-sample`(airline, distance) WHERE type = "route";

Optional index:

CREATE INDEX airline_country_iata_name ON `travel-sample`(country, iata, name) WHERE type = "airline";

1	CREATE INDEX airline_country_iata_name ON `travel-sample`(country, iata, name) WHERE type = "airline";

Query:

SELECT airline.name, ARRAY {"destination": r.destinationairport} FOR r in route END as destinations
FROM `travel-sample` airline NEST `travel-sample` route
    ON airline.iata = route.airline AND route.type = "route" AND route.sourceairport = "SFO"
WHERE airline.type = "airline" AND airline.country = "United States";

SELECT airline.name, ARRAY {"destination": r.destinationairport} FOR r in route END as destinations

FROM `travel-sample` airline NEST `travel-sample` route

ON airline.iata = route.airline AND route.type = "route" AND route.sourceairport = "SFO"

WHERE airline.type = "airline" AND airline.country = "United States";

As you can see the syntax for ANSI NEST is very similar to that of ANSI JOIN. There is one peculiar property for nest though. By definition the nest operation creates an array of all matching right-hand side document for each left-hand side document, which means the reference to the right-hand side keyspace, “route” in this query, has different meaning depending on where the reference is. The ON-clause is evaluated as part of the NEST operation, and thus references to “route” is referencing a single document. In contrast, references in the projection clause, or the WHERE clause, are evaluated after the NEST operation, and thus references to “route” means the nested array, thus it should be treated as an array. Notice the projection clause of the query above has an ARRAY construct with a FOR clause to access each individual document within the array (i.e., the reference to “route” is now in an array context).

Summary

ANSI JOIN provides much more flexibility in join operations in Couchbase N1QL, compared to previously supported lookup join and index join, both of which require joining on document key only. The examples above show various ways you can use ANSI JOIN in queries. Since ANSI JOIN is widely used in the relational world, the support for ANSI JOIN in Couchbase N1QL should make it much easier to migrate applications from a relational database to Couchbase N1QL.

Bingjie Miao, Principal Software Engineer Couchbase

Platform

Self-Managed

Services

Capabilities

Why Couchbase?

Migrate to Capella

By Use Case

By Industry

By Application Need

Popular Docs

By Developer Role

Quickstart

Resource Center

About

Partnerships

Our Services

Partners: Register a Deal

Ready to register a deal with Couchbase?

Marriott

All Posts

ANSI JOIN Support in N1QL

Overview

Details of ANSI JOIN support

Example 1: ANSI JOIN with arbitrary join condition

Example 2: ANSI JOIN with multiple join conditions

Example 3: ANSI JOIN with complex join expressions

Example 4: ANSI JOIN with IN clause

Example 5: ANSI JOIN with OR clause

Example 6: ANSI JOIN with hints

Example 7: ANSI LEFT OUTER JOIN

Example 8: ANSI RIGHT OUTER JOIN

Example 9: ANSI JOIN using Hash Join

Example 10: ANSI JOIN with multiple hints

Example 11: ANSI JOIN with multiple joins

Example 12: ANSI JOIN involving right-hand side arrays

Example 13: ANSI JOIN involving left-hand side arrays

Option 1: use UNNEST

Option 2: use IN-clause

Example 14: ANSI JOIN involving arrays on both sides

Option 1: use UNNEST clause

Option 2: use IN-clause

Example 15: lookup join migration

Example 16: index join migration

Example 17: ANSI NEST

Summary

Author

Posted by Bingjie Miao, Principal Software Engineer Couchbase

2 Comments

Leave a reply Cancel reply