Eventing - one time bulk insert based on select data and adding uuid

@jon.strabala

I have the Eventing function below. I am trying to bulk insert into the history bucket, but it is timing out, and I suspect it is inserting more rows than intended because of a mistake in the for loop.
This is for one-time execution only, because we are running it only against existing old data.

// Eventing function to insert history with missing split refs
function OnUpdate(doc, meta) {
    try {
        var data =
            SELECT d.id, h AS historyid FROM `pure-ems-cms` d UNNEST d.history AS h
            WHERE d.type = "page" AND ARRAY_LENGTH(d.history) > 0
                AND IFMISSING(d.isStack, FALSE) != TRUE
                AND IFMISSING(d.isUnschedul, FALSE) != TRUE
                AND IFMISSING(d.sup, 0) = 0
                AND d.id NOT IN (SELECT DISTINCT RAW k.iRef FROM `pure-ems-history` k
                    WHERE k.type = "history" AND IFMISSING(k.details.type, FALSE) != TRUE
                    AND k.iRef IS NOT NULL);
        // loop through the data result set
        for (var val of data) {
            if (!val.historyid && !val.id) {
                log('no records to insert into history');
                break;
            }
            var historyID = val.historyid;
            var vID = val.id;

            if (val.historyid === historyID) { // was "=", an assignment that is always truthy
                try {
                    // copy the history doc into a new document with key=UUID(),
                    // replacing iRef with the id from the outer query;
                    // getting timeouts here and also have a feeling this is
                    // adding more records than intended
                    INSERT INTO `pure-ems-history` (KEY UUID(), VALUE OBJECT_PUT(docx, "iRef", $vID))
                        SELECT d AS docx FROM `pure-ems-history` AS d WHERE d.id = $historyID;
                    log('insert successful', vID);
                } catch (e) { log(e, vID); }
            }
        }
        data.close();
    } catch (e) { log(e); }
}

@PM789 It appears you are relying on a primary N1QL SELECT (and a secondary SELECT for non-existence) inside the Eventing function, run on any mutation in the source, and then, based on some logic as you iterate through the result set, you perform an INSERT into pure-ems-history. Of course @vsr1 is correct that you can increase your script timeout from the default of 60 (1 minute) to, say, 3600 (1 hour), but you might want to improve your design.

Comment #1

Eventing responds to changes, so if there are a lot of changes or mutations to the source bucket (or, in 7.0, the source collection) this function will be invoked multiple times, which would not be good.

I was surprised that your function doesn't use or rely on the mutation arguments doc and meta that are passed to OnUpdate.

Since you say this is a point tool intended to run only once to store history, I would imagine that if there are other documents changing in the source bucket (or collection) you might want a filter like the one below:

function OnUpdate(doc, meta) {
    if (meta.id !== "move_to_hist") return;
    // original logic follows .....

Doing the above, you would need to create a document with the key "move_to_hist", but that's pretty simple to do … and you will be guaranteed to run only one thread of execution for your function.
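
For example, you could create that trigger document with a single N1QL statement (a minimal sketch; the document body is arbitrary and these fields are just placeholders):

    -- creating this doc in the source bucket produces the one mutation
    -- that passes the meta.id filter and kicks off the one-shot run
    INSERT INTO `pure-ems-cms` (KEY, VALUE)
        VALUES ("move_to_hist", {"type": "control", "purpose": "one-shot history archive trigger"});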

Comment #2

Let's talk about the design itself for building the history archive. Remember, Eventing can stream all of the documents that already exist in the source bucket (when the function is deployed with the "Everything" feed boundary).

Maybe you can eliminate the first SELECT (and some N1QL) by just filtering for the documents you want:

function OnUpdate(doc, meta) {
    if (! (doc.type && doc.type === "page" && /* more filters */ )) return;
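
A sketch of what the full filter might look like, mirroring the predicates of your original SELECT in pure JavaScript (I am reusing the field names from your query, so treat them as assumptions to verify against your schema):

    function OnUpdate(doc, meta) {
        // mirror the WHERE clause of the original N1QL SELECT
        if (doc.type !== "page") return;
        if (!doc.history || doc.history.length === 0) return; // ARRAY_LENGTH(d.history) > 0
        if (doc.isStack === true) return;      // IFMISSING(d.isStack, FALSE) != TRUE
        if (doc.isUnschedul === true) return;  // IFMISSING(d.isUnschedul, FALSE) != TRUE
        if ((doc.sup || 0) !== 0) return;      // IFMISSING(d.sup, 0) = 0
        // ... archive logic continues below
    }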

Then you can use KV to test whether you have already processed the document from pure-ems-cms and made the insertions into pure-ems-history (perhaps we make a marker doc with the key meta.id once we have done this archive):

  if (history_bkt[meta.id] !== null) return;

If it doesn't exist in pure-ems-history, then you can do your INSERT.

Here I assume the INSERT uses a SELECT statement to copy multiple documents, so you cannot efficiently do it in pure KV (you could avoid N1QL altogether if you knew how to construct your KEYS).
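
For example, the per-document version of your INSERT might look like this inside OnUpdate (a sketch, assuming each element of doc.history holds the id of an existing pure-ems-history document, as in your original query):

    var vID = doc.id;  // the value the original query exposed as d.id
    for (var h of doc.history) {
        var historyID = h;
        // copy the matching history doc under a new UUID key with iRef filled in
        INSERT INTO `pure-ems-history` (KEY UUID(), VALUE OBJECT_PUT(docx, "iRef", $vID))
            SELECT d AS docx FROM `pure-ems-history` AS d WHERE d.id = $historyID;
    }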

If the INSERT succeeds, then make a marker doc with the same key as meta.id via a bucket binding:

  history_bkt[meta.id] = {"ts": Date.now()};

What the above does is process all your items across multiple queues; each item will take much less time than the sequential bulk coding you initially had, the N1QL INSERTs will run in parallel, and if there is nothing to do you will skip over those items super fast. To improve the initial performance you could increase the number of workers.
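
Putting the pieces together, the whole function might look roughly like this. This is only a sketch under the assumptions above: history_bkt is a bucket binding aliased to pure-ems-history, and the field names are lifted from your post, so adjust as needed.

    function OnUpdate(doc, meta) {
        // 1. filter: only archive documents that match the original SELECT
        if (doc.type !== "page") return;
        if (!doc.history || doc.history.length === 0) return;
        if (doc.isStack === true || doc.isUnschedul === true) return;
        if ((doc.sup || 0) !== 0) return;

        // 2. KV existence check: skip documents already archived
        // (a truthy test covers both null and undefined for a missing key)
        if (history_bkt[meta.id]) return;

        // 3. copy each referenced history doc under a new UUID key via N1QL
        var vID = doc.id;
        for (var h of doc.history) {
            var historyID = h;
            INSERT INTO `pure-ems-history` (KEY UUID(), VALUE OBJECT_PUT(docx, "iRef", $vID))
                SELECT d AS docx FROM `pure-ems-history` AS d WHERE d.id = $historyID;
        }

        // 4. write the marker doc so this source document is never processed twice
        history_bkt[meta.id] = {"ts": Date.now()};
    }

Only step 3 touches N1QL; the filter and the marker check/write are pure KV, which is what keeps each invocation cheap once the backlog is drained.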

With a sample document and a sample of your expected archive I could gin up a complete solution or at least advise better.