It is extremely likely that you have come across issues with hierarchical lookups or graph traversal in your application as a developer who handles real-world use cases. And, for obvious reasons, you prefer solving them at the database layer and not at the client-side.

What are hierarchical lookups?

Hierarchical lookup refers to the process of searching and retrieving data from a hierarchical structure, such as a tree or a parent-child relationship. It involves navigating through the levels or layers of the hierarchy to locate specific data elements or related information, for example, organizational charts, file systems, and category trees.

Graphs are basically trees that may have cycles, so additionally, we could also cover use cases like path finding, etc provided we have additional configurations to deal with cycles.

hierachical lookups

Consider an example using an employee collection with a hierarchical structure. Employees are organized in a manager-subordinate relationship. We’ll perform operations on this employee hierarchy using Couchbase and JavaScript UDFs to demonstrate the solution.

Going from the Employee Collection we can retrieve:

i) generic employee-level:

ii) hierarchy for each employee:

What this blog covers

You really like the flexibility of NoSQL databases, and your underlying data model is JSON, which is why you have chosen Couchbase for your database needs (great choice, btw :-) ).

Now that you have Couchbase as your database, you want to solve the above mentioned problem. After doing some research in the SQL++ Language Reference docs you have come to the conclusion that there is no Statement or Function support in SQL++ to solve our problem, like the graphLookup API in MongoDB or recursive CTE in SQL databases.

But with existing infrastructure, users can emulate recursive CTE using JavaScript UDFs. The following sections will demonstrate just how to do that and more!

Solution to recursive queries

To implement this solution, we need to create a JavaScript UDF that utilizes a breadth-first search algorithm to traverse the employee hierarchy. The UDF takes an anchor query, a recursive query, and configuration options as parameters. The anchor query retrieves the initial set of employees, while the recursive query references the results of the previous iteration using the $1 parameter. Configuration options allow customization, including early exit criteria, arguments for inner queries, cycle detection, explain mode for query planning, and logging options.

How to use inner queries

The anchor query gets the root level documents from the target collection.

The recursive query can use a JOIN clause with one side as $1 and the other as the target collection, to exploit the parent-child relationship and perform the traversal from one level to another.

In our function, $1 does what the CTE alias does in recursive CTE.

To better understand how we use SQL++ (N1QL) within JavaScript UDFs, take a look at From N1QL to JavaScript and back.

Now let’s look at the process step by step. First of all, does recursive query processing really need to be a recursive function?

No, this would be a bad choice considering time-complexity (the same functionality in recursive might be exponential, but linear in an iterative approach) and all the other issues like frames, etc., that come along with recursion. If interested, check out this article on converting tail recursion functions to iterative. Also, JS UDFs have a preset max recursion depth of 128; this is not preferable for what we are trying to do.

To put it simply, any iterative task does the following:

We would like to acknowledge this article for its approach.

Points to ponder

How do we hold the state expression and pass it as a parameter to the query (recursive part here)?

    • We use the SQL++ ability to pass dynamic query parameters and use parameterized queries to hold state value across levels.
    • We use $1 in the recursive clause to refer previous iterations results, i.e., state expression

How can we optimize?

    • Prepare both anchor and recursive queries initially so we can reuse the query plans at execution time.
    • Find a way to look at the query plan( similar to an EXPLAIN statement) so we can generate appropriate indexes for inner statements (anchor and recursive).

The High-level Overview of the implementation is as follows:

Code for recursive CTE using JS UDFs 

This is a very naive implementation to get started and running. We set up the Breadth-First-Search algorithm, establishing the flow of order as per the overview above. Here is the code, from this Gist link:

How to add the function

Here’s a link to add this to a JavaScript library (say “mylibrary”) within your query service, and a link to create the UDF from the added library.

Create the UDF from the library using SQL++:

Usage

Argument 1: Anchor query, string
Argument 2: Recursive query, string (uses $1(state expression) to refer to the results of previous iteration )
Argument 3: Config options, object

All are mandatory, but you can pass an empty object, i.e., {} if you aren’t using any config options.


Config Options

Early exit

levelLimit(1 to N) – Specify the level at which we can stop. Level count starts at 0 for anchor results, 1 for first iteration/level, 2 for second, etc.

Arguments to inner query

anchorArgs – as named, e.g.,{“arg:1} or positional arguments, e.g., [1] to use in anchor query.

recursiveArgs – Can only be positional arguments, and 0 index must be reserved for state expression. For example: [0, 1] – always set 0th index ($1 arg) to 0, so we can use it in the recursive clause as state expression.

Cycle detection

cycleFields – Array of field names, e.g., [“_from”, “_to”]

Explain

explain – Check query plan in logs (sets log by default)

Logs

log – Display logs that help when you want to log while developing

 

Sample Queries


1. Count numbers 1 to N – simplest example of a query that shows how recursive CTE works.
    
Take a brief pause to think about how you could do this in N1QL without the function utility we developed now. Is it possible? (Or did we just make N1QL Turing complete!)

2. Employees example – refer to the employee collection we saw earlier.

A. Find the organizational level of an employee

B. Per employee reports to hierarchy

Anchor query:

This can look too verbose when adding a WHERE predicate from outer query.  We can use the anchorArgs option to mitigate this!

Arguments (anchorArgs/recursiveArgs)

Explain

Provide {“explain”: true} – you can see the query plan in the log field 

Early Exit

Set levelLimit like {“levelLimit”:2} – if you only want 2 levels of recursion

 

Cycle Detection

Without cycle detection the query to find employee level hierarchy will go into an infinite loop and crash on function timeout, wasting CPU resources.

Considering that when we work on graph data it is highly likely that we have cycles present and it is the responsibility of the caller of the function to to set cycle detection of appropriate fields.

Here, the “employee_id”, “manager_id”  pair is sufficient to exit on cycle.

Thank you for following on this challenging topic, we hope it reveals more options for some of your recursive querying challenges!

References

Author

Posted by Gaurav Jayaraj - Intern, Couchbase R&D

Gaurav Jayaraj is an intern in the Query team at Couchbase R&D. Gaurav is pursuing his Bachelors in Computer Science from PES University, Bangalore.

Leave a reply