Performance of Javascript UDF in query service

Hi there,

I created a Javascript function to emulate a recursive CTE. It functions well and gives the correct result. However it generates huge performance issue when I use this function in a query which returns more documents (~50k). Not sure if it is because the UDF would be called 50k times in the query and whether there is a way to optimize the performance for this use case.

The query without UDF returns in 2 minutes (~50k from 85m documents). After the UDF being added, the query consistently timed out even increasing the timeout to 20 minutes.

SELECT 
coalesce((select op.object_id from object_path(parent_object_id) op where op.object_type_cd in [1404] order by op.tree_level asc limit 1),'0') as group_id,
parent_object_id AS parent_id,
......
FROM app_object
WHERE object_type_cd = 151
    AND state_ind = 1
    AND NOT REGEXP_MATCHES(origin_ds, "^(.*\/ROUTE)$")
    AND REGEXP_MATCHES(origin_nm, "^(COM)$")

The UDF is like below

function object_path(root_object_id) {
    var level = 0;
    var rowcnt = 0;
    var objects = [];
    var parent_id = root_object_id;
    var q1 = select $root_object_id as object_id, parent_object_id as parent_id, $level as tree_level from `test`.`app`.`app_object` where object_id = $root_object_id;
    
    for (const obj of q1) {
        objects.push(obj);
        parent_id = obj.parent_id;
        rowcnt++;
    }
    
    level++;
    
    while (rowcnt > 0) {
        var q2 = select object_id, parent_object_id as parent_id, $level as tree_level from `test`.`app`.`app_object` where object_id = $parent_id;
        
        rowcnt = 0;
        level++;
        for (const obj of q2) {
            objects.push(obj);
            parent_id = obj.parent_id;
            rowcnt++;
        }
    }
    
    return objects;
}

Each row it needs to execute Java Script UDF (i.e. external to query service).
recursive CTE of Java script UDF execute multiple SQL statements recursively.
One recursive CTE is time consuming and you are using that in projection of 50K items it will take time.
If this actual code you can optimize futher
ORDER BY level inside CTE those are already order by level and added to array. why need order?
You have limit 1 why not push into UDF and produce 1? Why do u need recursive clause due to limit 1?
where is object_type_cd in UDF. If you need that you can push into UDF.
Understand you may looking write UDF generically.

ex: coalesce(object_path_1(parent_object_id, [1404]),‘0’) as group_id
Inside object_path_1 () UDF in result loop obj.object_type_cd in second argument
just return current object as result.

Try LIMIT 1 on main query with out recursive UDF, with and you will know approximate timeout multiplication factor need to set.

cc @Marco_Greco

If your regex are same for queries

See if this improves

CREATE INDEX ix20 ON app_object(object_type_cd, state_ind) WHERE REGEXP_MATCHES(origin_nm, "^(COM)$") AND REGEXP_MATCHES(origin_ds, "^(.*\/ROUTE)$") = false;

SELECT ....
FROM app_object
WHERE object_type_cd = 151
    AND state_ind = 1
    AND REGEXP_MATCHES(origin_ds, "^(.*\/ROUTE)$") = false
    AND REGEXP_MATCHES(origin_nm, "^(COM)$");

I am not expert in Java Script.Java Script code is pseudo code. See if u can work out with single UDF call

WITH results AS ( SELECT object_type_cd AS parent_object_id,
                   .....
                  FROM default
                  WHERE object_type_cd = 151
                        AND state_ind = 1
                        AND REGEXP_MATCHES(origin_ds, "^(.*\/ROUTE)$") = false
                        AND REGEXP_MATCHES(origin_nm, "^(COM)$"))
     rObj AS (rcte(ARRAY_DISTINCT(results[*].parent_object_id), [1404]))
SELECT r.*
       coalesce(rObj.[TO_STR(r.parent_object_id)], "0") AS group_id
FROM results AS r;
function rcte(objectids, stop_ids) {
         var finalObj = {}
         parent_ids = objectids
         while (length(parent_ids) > 0) {
              var q1 = SELECT ARRAY_AGG(CASE object_type_cd IN stop_ids TEHN object_id END) AS obj_ids,
                              ARRAY_AGG(CASE object_type_cd NOT IN stop_ids TEHN parent_object_id END) AS parent_ids
                       FROM `test`.`app`.`app_object` where object_id IN objectids;
              for (const obj of q1) { // non group aggregate 1 row only or  get result  without aggregtae do your self and add to  finalObj or parent_ids for next iteration
                   for (cosnt r in obj.obj_ids) {
                           finalObj[string(r)] = r
                   }
                   parent_ids = obj.parent_ids
                 
                }
         }
         return finalObj
}

OR

call one UDF from SQL and that iterates over your recursive UDF and construct object and returns to SQL

Thanks @vsr1. I’ve already created the cover index to optimize the overall query performance.

The space for changing the recursive function logic is very limited. Two properties, object_id and parent_object_id form a graph, then object_type_cd property indicates the type. It is a common logics to get the parent_object_id of a graph at certain container level.

I am considering the pre-calculate the group_id and use event service to maintain the value of group_id. However this increases the complexity of overall solution and adds additional effort of code change.

Is there anything in the plan to optimize the JS UDF calls? The same logic using recursive CTE in PostgreSQL is much faster. However taking away the recursive CTE logic, Couchbase runs much faster than PostgreSQL. I guess the major contribution to the performance difference is that PostgreSQL prepared the execution plan of recursive statement and had less context switches.

Not sure if Couchbase has a plan to support recursive CTE.

There is plan add native recursive CTE near future (MB-11512), buil-in function short term MB-56553

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.