Recursive Common Table Expressions (CTEs) and Oracle’s CONNECT BY are well-known SQL constructs among RDBMS users, facilitating the delegation of complex, interdependent data structure exploration to the database layer for enhanced processing efficiency.

These constructs are crucial for querying interdependent data structures, a common requirement across various industries, including finance, supply chain management, customer relationship management (CRM), travel booking, and more recently, social networks. Recognizing their importance, all major relational database management systems (RDBMS) such as PostgreSQL, MySQL (from version 8.0), SQL Server, Oracle, and SQLite offer support for Recursive CTEs.

In contrast, NoSQL databases, which are designed to manage a diverse array of data models like documents, key-value, wide-column, and graph data, prioritize scalability, high availability, flexibility, and performance in distributed systems. In these environments, the CTE concept, recursive or not, isn’t directly addressed. Users often turn to specialized solutions such as graph databases—Cypher for Neo4J and AQL for ArangoDB, for instance—to handle complex data structures.

Couchbase sets itself apart with SQL for JSON, offering a unique approach to Recursive CTEs that also extend its multi model support. This blog will delve into three primary topics:

    1. How you can leverage a single DBMS as Couchbase for complex data structures for a number of use cases, in the same way as RDBMS could, but without the need for a more dedicated database. 
    2. The use of Couchase SQL++ construct to query, transform and project these complex relationship using a SQL construct that are familiar to RDBMS users.
    3. Best practices to manage resource consumption with Recursive CTE.

Bill of Materials use case

The BOM is a critical component in manufacturing and engineering, detailing the raw materials, parts, and components needed to manufacture a product. It often has a hierarchical structure, where parts are made up of other parts or materials.

This example will include basic components and sub-components of a desktop computer.

Component ID ComponentName ParentComponentID Quantity
1 Desktop Computer null 1
2 Motherboard 1 1
3 CPU 2 1
3 CPU fan 3 1
4 GPU 2 1
5 RAM 2 4
6 M.2 drive 2 1
7 SSD 2 1
8 Power Supply 1 1
9 Case 1 1
10 Case cooling fans 1 4

Let’s say we want to list all parts and sub-parts needed to build a desktop computer, along with the quantities required for each.

This query initializes the recursion with the top-level item (the Desktop Computer) and recursively joins the components table with itself to traverse down the hierarchy, adjusting quantities as needed for each component and sub-component.

The result will list all parts needed for the Desktop Computer, including the quantity of each and their hierarchical level, which can be useful for understanding the assembly structure or for inventory and ordering purposes.

Explanation

The CTE starts with the Bicycle (where ParentPartID is NULL) and then recursively finds all components and sub-components.

The Quantity is adjusted at each level to reflect the total number required for one Bicycle.

The Level column, though not strictly necessary for all BOM analyses, helps understand the depth of each part within the hierarchy.

This approach allows for a detailed breakdown of all materials and components required for manufacturing a product, essential for inventory management, cost estimation, and production planning in manufacturing operations.

Social Networking use case

A common application in a social networking use case is to find the degrees of connection between two users—essentially, how users are connected through a chain of mutual friends. Let’s say we need to determine the shortest path (in terms of degrees of connection) between two users in a social network. This can help in features like suggesting friendships or understanding network dynamics.

Consider we have the following users and how they are friends with each other, eg. Alice is friends with Bob, and also with Charlie. But Alice is not friends with Dana.




In this expanded network, users are connected in various ways, creating multiple paths through which users can be connected.

Let’s find the degrees of connection between Alice[1] and Frank[6], including the names of users along the path.

Explanation

    • Base Case: The CTE starts by identifying direct connections of Alice (UserID 1), including user names for readability.
    • Recursive Step: It then recursively finds friends of those connections, extending the search and tracking the degree of connection. The JOINs ensure that user names are included for both the starting user and their friends at each step.
    • Termination and Filtering: The recursion continues until it finds connections to Frank (UserID 6). The query filters for paths leading to Frank and orders the results by the degree of connection to identify the shortest paths.

This query demonstrates how to trace and enumerate paths through a social network, including user names for clarity. It provides a foundation for more complex analyses, such as identifying all mutual connections or exploring network structures.

Graph network traversal use case

For this use case, I will use a summarized version of the route data from America Airlines. Note that this is not from the Couchbase travel-sample. In this example we use Couchbase SQL++ Recursive CTE query to find all flights from LAX to MAD with < 2 stops from this sample dataset. Note that this sample data is not based on the travel-sample, but  a simplified version of the AA routes for 2008. 

source_airport_code destination_airport_code airline
LAX MAD AA
LAX LHR AA
LHR MAD AA
LAX OPO AA
OPO MAD AA
MAD OPO AA
SQL++ Query Results
/* List all routes from LAX to MAD with < 2 stops */
WITH RECURSIVE RouteCTE AS (
  SELECT [r.source_airport_code,   

          r.destination_airport_code] AS route,
        r.destination_airport_code AS lastStop,
        1 AS depth
  FROM routes  r
  WHERE r.source_airport_code = ‘LAX’
UNION ALL
  SELECT ARRAY_APPEND(r.route,f.destination_airport_code) as route,
        f.destination_airport_code AS lastStop,
        r.depth + 1 AS depth
  FROM RouteCTE  r
  JOIN routes  f ON r.lastStop = f.source_airport_code
  WHERE f.destination_airport_code != ‘LAX’ AND r.depth < 3
)

OPTIONS {“levels”:3}
SELECT r.*
FROM RouteCTE AS r
WHERE r.lastStop = ‘MAD’
AND r.depth < 3;

[
  {
    “route”: [
      “LAX”,
    “MAD”
              ]
  },
  {
    “route”: [
    “LAX”,
      “LHR”,
      “MAD”
            ]
  },
  {
    “route”: [
    “LAX”,
      “OPO”,
      “MAD”
            ]
  }
]

Explanation

    • The routeCTE  starts with all flights departing from LAX.
    • The recursive part of the routeCTE looks for flights connecting from the lastStop in the current route back to other airports, avoiding routes that loop back to LAX.
    • The route array accumulates the sequence of airport codes to show the path taken.
    • The query outputs all routes that end in MAD, detailing the paths found.

Best Practices Recursive CTE

When using Recursive Common Table Expressions (CTEs) in SQL++, developers should be aware of the implications of the recursive nature and the cost for query processing. Here are the best practices:

Set limits for the recursion depth – Always set a limit on the recursion depth to prevent infinite loops and excessive resource consumption. Use a counter or a condition within the recursive CTE to control how deep the recursion can go, include the options limits.

Monitor Performance –  Recursive CTEs can be resource-intensive. Monitor performance closely, especially for long or complex queries, and optimize them as necessary. This might involve indexing,or breaking down overly complex CTEs.

Avoid Unnecessary Complexity – Keep the logic within the recursive part of the CTE as simple as possible. Overly complex conditions or computations can significantly degrade performance. Check the JOIN condition for correctness.

Ensure Correct Data Structure –  Verify that your data is structured correctly for recursion. Incorrect or malformed data can lead to incorrect results or inefficient queries.

Test Extensively –  Thoroughly test recursive CTEs with various datasets, including edge cases. This helps in catching any issues with infinite loops, incorrect results, or performance bottlenecks.

Set the memory quota – Set the memory quota either at the request or node level to avoid excessive of memory usage in the recursive query.

Limitations

Recursive CTEs  are a powerful feature in Couchbase SQL++, which are commonly found in other RDBMS. They allow for the execution of complex queries, such as traversing hierarchical and graph network data or performing iterative calculations that are difficult to express with standard SQL. However, there are limitations and considerations to be aware of when using recursive CTEs. These limitations often pertain to performance, syntax restrictions, and the complexity of queries. Here are some specifics:

Aggregates: Recursive CTEs typically do not allow aggregate functions (MIN(), MAX(), SUM(), AVG(), etc.) or DISTINCT within the recursive part of the CTE. These operations do not make sense in the context of adding rows recursively because they imply a final result set after all recursion has been resolved.

Window Functions: Like aggregate functions, window functions (ROW_NUMBER(), RANK(), etc.) are generally not used within the recursive part of the CTE. They are intended for use on a set of rows returned by the query, making them suitable for the non-recursive term or in a query selecting from the recursive CTE.

LIMIT / ORDER BY: These clauses are not allowed within the recursive member of the CTE. The reasoning is that they pertain to the final result set ordering and does not make sense within the context of building the recursive set, where intermediate results are cumulatively constructed through each iteration.

Next Steps

Author

Posted by Binh Le

Binh Le is a Principal Product Manager for Couchbase Query service. Prior to Couchbase, he worked at Oracle and led the product management team for Sales Clould Analytics and CRM OnDemand. Binh holds a Bachelor's Degree in Computer Science from the University of Brighton, UK.

Leave a reply