Recursive Query Processing in SQL++ (N1QL) Function

It is extremely likely that you have come across issues with hierarchical lookups or graph traversal in your application as a developer who handles real-world use cases. And, for obvious reasons, you prefer solving them at the database layer and not at the client-side.

What are hierarchical lookups?

Hierarchical lookup refers to the process of searching and retrieving data from a hierarchical structure, such as a tree or a parent-child relationship. It involves navigating through the levels or layers of the hierarchy to locate specific data elements or related information, for example, organizational charts, file systems, and category trees.

Graphs are basically trees that may have cycles, so additionally, we could also cover use cases like path finding, etc provided we have additional configurations to deal with cycles.

Consider an example using an employee collection with a hierarchical structure. Employees are organized in a manager-subordinate relationship. We’ll perform operations on this employee hierarchy using Couchbase and JavaScript UDFs to demonstrate the solution.

Going from the Employee Collection we can retrieve:

i) generic employee-level:

ii) hierarchy for each employee:

What this blog covers

You really like the flexibility of NoSQL databases, and your underlying data model is JSON, which is why you have chosen Couchbase for your database needs (great choice, btw :-) ).

Now that you have Couchbase as your database, you want to solve the above mentioned problem. After doing some research in the SQL++ Language Reference docs you have come to the conclusion that there is no Statement or Function support in SQL++ to solve our problem, like the graphLookup API in MongoDB or recursive CTE in SQL databases.

But with existing infrastructure, users can emulate recursive CTE using JavaScript UDFs. The following sections will demonstrate just how to do that and more!

Solution to recursive queries

To implement this solution, we need to create a JavaScript UDF that utilizes a breadth-first search algorithm to traverse the employee hierarchy. The UDF takes an anchor query, a recursive query, and configuration options as parameters. The anchor query retrieves the initial set of employees, while the recursive query references the results of the previous iteration using the $1 parameter. Configuration options allow customization, including early exit criteria, arguments for inner queries, cycle detection, explain mode for query planning, and logging options.

How to use inner queries

The anchor query gets the root level documents from the target collection.

The recursive query can use a JOIN clause with one side as $1 and the other as the target collection, to exploit the parent-child relationship and perform the traversal from one level to another.

In our function, $1 does what the CTE alias does in recursive CTE.

To better understand how we use SQL++ (N1QL) within JavaScript UDFs, take a look at From N1QL to JavaScript and back.

Now let’s look at the process step by step. First of all, does recursive query processing really need to be a recursive function?

No, this would be a bad choice considering time-complexity (the same functionality in recursive might be exponential, but linear in an iterative approach) and all the other issues like frames, etc., that come along with recursion. If interested, check out this article on converting tail recursion functions to iterative. Also, JS UDFs have a preset max recursion depth of 128; this is not preferable for what we are trying to do.

To put it simply, any iterative task does the following:

We would like to acknowledge this article for its approach.

Points to ponder

How do we hold the state expression and pass it as a parameter to the query (recursive part here)?

- We use the SQL++ ability to pass dynamic query parameters and use parameterized queries to hold state value across levels.
- We use $1 in the recursive clause to refer previous iterations results, i.e., state expression

How can we optimize?

- Prepare both anchor and recursive queries initially so we can reuse the query plans at execution time.
- Find a way to look at the query plan( similar to an EXPLAIN statement) so we can generate appropriate indexes for inner statements (anchor and recursive).

The High-level Overview of the implementation is as follows:

Code for recursive CTE using JS UDFs

This is a very naive implementation to get started and running. We set up the Breadth-First-Search algorithm, establishing the flow of order as per the overview above. Here is the code, from this Gist link:

function recursive_cte(anchor , recursive, config) {
    let isLog = false;
    let anchorArgs=[];
    let recursiveArgs=[];
    let cycleFields=[];
    let hash;    
    let levelLimit = -1;
    let isExplain = false;

    let res = {"res":[], "log":[]}


    if(config!=undefined) {
        if(config.log!=undefined && config.log==true) {
            isLog = true;
        }

        if(config.anchorArgs!=undefined) {
            anchorArgs = config.anchorArgs;
        }

        if(config.recursiveArgs!=undefined) {
            recursiveArgs = config.recursiveArgs;
        }

        if(config.levelLimit!=undefined && config.levelLimit>0) {
            levelLimit = config.levelLimit;
        }

        if(config.cycleFields!=undefined && config.cycleFields.length>0) {
            res['log'].push("Got cycle fields "+ config.cycleFields)
            // for(const field of config.cycleFields) {
            //     cycleFields.push(field);
            // }
            cycleFields = config.cycleFields;
            res['log'].push(cycleFields);
            // init hash
            hash = createhash();

        }

        if(config.explain!=undefined) {
            isExplain = true;
        }

    }
    

    // init state
    recursiveArgs.push(0);
    
    // Prepare anchor statement
    let anchorPname;
    let anchorPlan;
    try{
        const anchor_prep = N1QL("PREPARE FORCE "+anchor);
        
        for(const ap of anchor_prep) {
            anchorPname = ap["name"];
            anchorPlan = ap["operator"];
        }

        res['log'].push("prepared anchor");
    }
    catch(err) {
        res['log'].push("couldn't prepare anchor");
        throw err;
    }

     // prepare recursive statement

     let recursivePname;
     let recursivePlan;
     try{
         const recursive_prep = N1QL("PREPARE FORCE "+recursive);
 
         for(const rp of recursive_prep) {
             recursivePname = rp["name"];
             recursivePlan = rp["operator"];
         }
         res['log'].push("prepared recursive");
     }
     catch(err) {
         res['log'].push("couldn't prepare recursive");
         throw err;
     }


    // state expression 
    let workSet = []

    // execute anchor 
    try{
        const anchorExec = N1QL("EXECUTE `"+anchorPname+"`",anchorArgs);
        for(const doc of anchorExec) {
            workSet.push(doc);
        }
    }
    catch(err) {
        res['log'].push("failed to execute anchor");
        throw err;
    }

    // cycle check
    if(cycleFields.length>0) {
        res['log'].push("cycle check on fields "+cycleFields)
        workSet = cycleCheck(cycleFields, hash, workSet);
    }

    // populate root level( level 0 )
    res['res'].push(...workSet);


    let level = 0;

    while(workSet.length!=0) {

        // exit on level condition
        if(levelLimit>0 && level>=levelLimit) {
            res['log'].push("Exit on level condition: levelLimit="+levelLimit.toString())
            break;
        }

        // execute recursive query
        let newWorkSet = []

        // set state $1
        recursiveArgs[0] = workSet;

        try{
            const recursiveExec = N1QL("EXECUTE `"+recursivePname+"`", recursiveArgs)

            // empty workSet to populate again
            for(const doc of recursiveExec) {
                newWorkSet.push(doc)
            }
        }
        catch(err){
            res['log'].push("failed execute recursive");
            throw err;
        }

        // cycle check
        if(cycleFields.length>0) {
            newWorkSet = cycleCheck(cycleFields, hash, newWorkSet);
        }

        if(newWorkSet.length==0)
            break;

        
        res["res"].push(...newWorkSet);

        // update state expression
        workSet = newWorkSet;

        level++;
    }

    if(isExplain){
        res['log'].push("Anchor Plan:");
        res['log'].push(anchorPlan);
        res['log'].push("Recursive Plan");
        res["log"].push(recursivePlan);
        return res;
    }

    return isLog?res:res['res'];
}


function createhash() {
    let h = {};

    const getVal = function(key) {
        return h[key]==undefined?false:true;
    }

    const setVal = function(key) {
        h[key] = true;
    }

    return {setVal, getVal};
}

function cycleCheck(cycleFields, hash, workSet) {
    let cycleTrim = [];
    for(const doc of workSet) {
        let key = [];
        for(const field of cycleFields) {
            if(doc[field]!=undefined){
                key.push(String(doc[field]));
            }
        }

        // create hashKey
        hashKey = key.join(String.fromCharCode(30));


        if(hash.getVal(hashKey)==true){
            continue;
        }
        else{
            hash.setVal(hashKey);
            cycleTrim.push(doc);
        }
    }

    return cycleTrim;
}

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

function recursive_cte(anchor , recursive, config) {

let isLog = false;

let anchorArgs=[];

let recursiveArgs=[];

let cycleFields=[];

let hash;

let levelLimit = -1;

let isExplain = false;

let res = {"res":[], "log":[]}

if(config!=undefined) {

if(config.log!=undefined && config.log==true) {

isLog = true;

}

if(config.anchorArgs!=undefined) {

anchorArgs = config.anchorArgs;

}

if(config.recursiveArgs!=undefined) {

recursiveArgs = config.recursiveArgs;

}

if(config.levelLimit!=undefined && config.levelLimit>0) {

levelLimit = config.levelLimit;

}

if(config.cycleFields!=undefined && config.cycleFields.length>0) {

res['log'].push("Got cycle fields "+ config.cycleFields)

// for(const field of config.cycleFields) {

// cycleFields.push(field);

// }

cycleFields = config.cycleFields;

res['log'].push(cycleFields);

// init hash

hash = createhash();

}

if(config.explain!=undefined) {

isExplain = true;

}

// init state

recursiveArgs.push(0);

// Prepare anchor statement

let anchorPname;

let anchorPlan;

try{

const anchor_prep = N1QL("PREPARE FORCE "+anchor);

for(const ap of anchor_prep) {

anchorPname = ap["name"];

anchorPlan = ap["operator"];

}

res['log'].push("prepared anchor");

}

catch(err) {

res['log'].push("couldn't prepare anchor");

throw err;

}

// prepare recursive statement

let recursivePname;

let recursivePlan;

try{

const recursive_prep = N1QL("PREPARE FORCE "+recursive);

for(const rp of recursive_prep) {

recursivePname = rp["name"];

recursivePlan = rp["operator"];

}

res['log'].push("prepared recursive");

}

catch(err) {

res['log'].push("couldn't prepare recursive");

throw err;

}

// state expression

let workSet = []

// execute anchor

try{

const anchorExec = N1QL("EXECUTE `"+anchorPname+"`",anchorArgs);

for(const doc of anchorExec) {

workSet.push(doc);

}

catch(err) {

res['log'].push("failed to execute anchor");

throw err;

}

// cycle check

if(cycleFields.length>0) {

res['log'].push("cycle check on fields "+cycleFields)

workSet = cycleCheck(cycleFields, hash, workSet);

}

// populate root level( level 0 )

res['res'].push(...workSet);

let level = 0;

while(workSet.length!=0) {

// exit on level condition

if(levelLimit>0 && level>=levelLimit) {

res['log'].push("Exit on level condition: levelLimit="+levelLimit.toString())

break;

}

// execute recursive query

let newWorkSet = []

// set state $1

recursiveArgs[0] = workSet;

try{

const recursiveExec = N1QL("EXECUTE `"+recursivePname+"`", recursiveArgs)

// empty workSet to populate again

for(const doc of recursiveExec) {

newWorkSet.push(doc)

}

catch(err){

res['log'].push("failed execute recursive");

throw err;

}

// cycle check

if(cycleFields.length>0) {

newWorkSet = cycleCheck(cycleFields, hash, newWorkSet);

}

if(newWorkSet.length==0)

break;

res["res"].push(...newWorkSet);

// update state expression

workSet = newWorkSet;

level++;

}

if(isExplain){

res['log'].push("Anchor Plan:");

res['log'].push(anchorPlan);

res['log'].push("Recursive Plan");

res["log"].push(recursivePlan);

return res;

}

return isLog?res:res['res'];

}

function createhash() {

let h = {};

const getVal = function(key) {

return h[key]==undefined?false:true;

}

const setVal = function(key) {

h[key] = true;

}

return {setVal, getVal};

}

function cycleCheck(cycleFields, hash, workSet) {

let cycleTrim = [];

for(const doc of workSet) {

let key = [];

for(const field of cycleFields) {

if(doc[field]!=undefined){

key.push(String(doc[field]));

}

// create hashKey

hashKey = key.join(String.fromCharCode(30));

if(hash.getVal(hashKey)==true){

continue;

}

else{

hash.setVal(hashKey);

cycleTrim.push(doc);

}

return cycleTrim;

}

How to add the function

Here’s a link to add this to a JavaScript library (say “mylibrary”) within your query service, and a link to create the UDF from the added library.

Create the UDF from the library using SQL++:

CREATE FUNCTION recursiveCte(anchor, recursive, config) 
LANGUAGE JAVASCRIPT as "recursive_cte" AT "mylibrary";

1 2	CREATE FUNCTION recursiveCte(anchor, recursive, config) LANGUAGE JAVASCRIPT as "recursive_cte" AT "mylibrary";

Usage

Argument 1: Anchor query, string
Argument 2: Recursive query, string (uses $1(state expression) to refer to the results of previous iteration )
Argument 3: Config options, object

All are mandatory, but you can pass an empty object, i.e., {} if you aren’t using any config options.

Config Options

Early exit

levelLimit(1 to N) – Specify the level at which we can stop. Level count starts at 0 for anchor results, 1 for first iteration/level, 2 for second, etc.

Arguments to inner query

anchorArgs – as named, e.g.,{“arg:1} or positional arguments, e.g., [1] to use in anchor query.

recursiveArgs – Can only be positional arguments, and 0 index must be reserved for state expression. For example: [0, 1] – always set 0th index ($1 arg) to 0, so we can use it in the recursive clause as state expression.

Cycle detection

cycleFields – Array of field names, e.g., [“_from”, “_to”]

Explain

explain – Check query plan in logs (sets log by default)

Logs

log – Display logs that help when you want to log while developing

Sample Queries

1. Count numbers 1 to N – simplest example of a query that shows how recursive CTE works.

Take a brief pause to think about how you could do this in N1QL without the function utility we developed now. Is it possible? (Or did we just make N1QL Turing complete!)

2. Employees example – refer to the employee collection we saw earlier.

A. Find the organizational level of an employee

B. Per employee reports to hierarchy

Anchor query:

SELECT e1.*, 0 as hlevel 
FROM `employees` e1 
WHERE e1.manager_id=" || to_str(e.employee_id)

SELECT e1.*, 0 as hlevel

FROM `employees` e1

WHERE e1.manager_id=" || to_str(e.employee_id)

This can look too verbose when adding a WHERE predicate from outer query. We can use the anchorArgs option to mitigate this!

Arguments (anchorArgs/recursiveArgs)

Explain

Provide {“explain”: true} – you can see the query plan in the log field

Early Exit

Set levelLimit like {“levelLimit”:2} – if you only want 2 levels of recursion

Cycle Detection

Without cycle detection the query to find employee level hierarchy will go into an infinite loop and crash on function timeout, wasting CPU resources.

Considering that when we work on graph data it is highly likely that we have cycles present and it is the responsibility of the caller of the function to to set cycle detection of appropriate fields.

Here, the “employee_id”, “manager_id” pair is sufficient to exit on cycle.

Thank you for following on this challenging topic, we hope it reveals more options for some of your recursive querying challenges!

Platform

Self-Managed

Services

Capabilities

Why Couchbase?

Vector Search

By Use Case

By Industry

By Application Need

Popular Docs

By Developer Role

Capella Playground

Start A Free Capella Trial

Resource Center

Education

Vector Search

What's Vector Search and why is it important?

About

Partnerships

Our Services

Partners: Register a Deal

Ready to register a deal with Couchbase?

Marriott

All Posts

Recursive Query Processing in SQL++ (N1QL)

What are hierarchical lookups?

What this blog covers

Solution to recursive queries

How to use inner queries

Points to ponder

Code for recursive CTE using JS UDFs

How to add the function

Sample Queries

Arguments (anchorArgs/recursiveArgs)

References

Author

Posted by Gaurav Jayaraj - Intern, Couchbase R&D

Leave a reply Cancel reply