XDCR filtering based on document age

We have a requirement to archive data that is older than X months. We plan to set up XDCR for this and define a filter based on the difference between the current date and the creation date field of the document. So far we have not been able to create such a filter. We see that the XDCR documentation lists date functions as not supported, but we are looking for any alternative that will allow us to replicate based on this date logic.
We are setting up XDCR for the first time and wondering if there are any features available that can help us achieve this type of filtering. The logic would be simple, based on the createdOn field in the document and the current date.
Any pointers or alternate options would be appreciated.

Apologies for the delay in getting back to you

Unfortunately you’re not going to be able to achieve what you want with XDCR filtering. A document is replicated when it is added or changed, and the filter is evaluated at that time. Therefore, if a document’s createdDate passes the threshold but isn’t subsequently updated, it won’t be re-evaluated for replication.

I would suggest using the Eventing Service here, which allows you to define a timer that will trigger at some point in the future. So in your case, when a document comes in, you would set a timer for createdDate + X months. You can then configure a callback to do “whatever you want” - as an example, you could have it update the document with a “replicate=true” field which can then be filtered on in XDCR.
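To make that concrete, here is a minimal sketch of that timer approach. Everything named here is illustrative: it assumes a read+write bucket alias src bound to the source collection, a createdOn field that parses as a date, and a 6-month window standing in for “X months”.

function OnUpdate(doc, meta) {
    if (doc.replicate === true) return;    // already flagged, nothing to do
    if (!doc.createdOn) return;            // no creation date, skip

    // fire the timer at createdOn + 6 months (assumes createdOn parses as a Date)
    var fireAt = new Date(doc.createdOn);
    fireAt.setMonth(fireAt.getMonth() + 6);
    createTimer(FlagForReplication, fireAt, meta.id, {"id": meta.id});
}

function FlagForReplication(context) {
    var doc = src[context.id];
    if (!doc) return;                      // document was deleted in the meantime
    doc.replicate = true;                  // the field your XDCR filter matches on
    src[context.id] = doc;                 // this mutation is what XDCR re-evaluates
}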

Hi @deepaka

We have a requirement to archive data that is older than X months.

As Perry said, you could use Eventing and timers (2-3 docs under the hood) to perform the task you want. These examples will help you understand and build a solution that will work. However, if you're talking hundreds of millions or a billion or more documents, the use of timers might be a bit heavyweight.

For very large data sets you might want two (2) Eventing Functions. The first function would listen to the “real” (primary) collection via the OnUpdate() callback and just create a “proxy” document in another (proxy) collection, which is essentially just the KEY of the document (without data) you want to archive, with a TTL set.

The second function would listen to the proxy collection via the OnDelete() callback; when the “proxy” documents expire (due to their TTLs) it would perform any needed archiving action (move/delete) on the real (primary) document in the primary collection.
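For those older releases, the split might look roughly like the sketch below. The function names and the 60-second TTL are illustrative; the aliases alias_proxy and alias_primary are the same read+write bindings used in the combined function further down, and couchbase.insert() assumes a release with the advanced keyspace accessors.

// Function 1 ("make_proxy"): listens to mydata.myscope.primary,
// with a read+write alias "alias_proxy" to mydata.myscope.proxy
function OnUpdate(doc, meta) {
    if (alias_proxy[meta.id]) return;     // proxy already exists
    var expiryDate = new Date(Date.now() + 60 * 1000);  // short TTL for testing
    couchbase.insert(alias_proxy, {"id": meta.id, "expiry_date": expiryDate}, {});
}

// Function 2 ("archive_on_expiry"): listens to mydata.myscope.proxy,
// with a read+write alias "alias_primary" to mydata.myscope.primary
function OnDelete(meta, options) {
    if (!options.expired) return;         // only act on TTL expirations
    var prim_doc = alias_primary[meta.id];
    if (!prim_doc) return;                // already archived or removed
    // archive prim_doc here (move to another collection, curl it out, etc.), then remove it
    delete alias_primary[meta.id];
}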

In later versions of Couchbase we can listen to more than one keyspace under a bucket at the same time, so a single Eventing Function can perform the action you want. This solution requires just one additional document (same key but an empty data payload). Below I give the code (pay attention to the setup requirements in the comments):

/*
This Eventing Function "archive_primary_via_proxy" is tested on 7.1.3 and requires:

1) A multi-collection Listen To Location of: mydata.myscope.*
2) A R+W alias alias_primary to: mydata.myscope.primary
3) A R+W alias alias_proxy to: mydata.myscope.proxy

Data or documents are written to keyspace mydata.myscope.primary.
The initial insert will generate a proxy key in keyspace mydata.myscope.proxy with a
one minute TTL or expiry. Since the KEYs are the same in both collections,
an action can be taken with the "real" data in mydata.myscope.primary prior to deleting
the "real" document.  We do this because the OnDelete() callback only has the metadata
of the document, not the actual body or data of the document.
*/
function OnUpdate(doc, meta) {
    if (meta.keyspace.collection_name === "primary") {
        
        if (alias_proxy[meta.id]) {
            // already have a 'proxy' do nothing
            return;
        }
        
        // this was a mutation (insert/update/upsert) in to the keyspace mydata.myscope.primary
        // write the key out to the "proxy" collection and set the TTL 60 seconds in the future
        var nowMs = Date.now();  // get current unix time (in ms.).
        var delMs = nowMs + 60 * 1000; // delete 60 seconds into the future
        var expiryDate = new Date(delMs);
        
        var res = couchbase.insert(alias_proxy,{"id":meta.id,"expiry_date":expiryDate},{});
        if (!res.success) {
            // error code 2 (presumably the proxy key already exists) is treated as benign;
            // anything else is logged and the mutation is skipped
            if (res.error.code !== 2) {
                log('OnUpdate ERROR',meta.id,'Setting proxy TTL via insert to',expiryDate,'failed',res);
                return;
            }
        }
        log('OnUpdate INFO 1','action made proxy KEY',meta.id,'with TTL set to',expiryDate);
    }
}

function OnDelete(meta, options) {
    if (meta.keyspace.collection_name === "proxy") {
        
        var prim_doc = alias_primary[meta.id];
        if (!prim_doc) {
            // already processed or removed
            return;
        }
        log('OnDelete INFO 1',meta.id,'proxy key was deleted via',options,'removing primary');
        
        // do any other archiving action here with KEY meta.id in KEYSPACE mydata.myscope.primary
        // like move to another collection, use curl, compress and aggregate, etc. etc. 
        // below I just log the entire document.
        log('OnDelete INFO 2','action just log primary KEY',meta.id,'DOC',prim_doc);
        
        // now delete the document from the primary collection
        delete alias_primary[meta.id];
    }
}

Now deploy the above function (you must have the two keyspaces mydata.myscope.primary and mydata.myscope.proxy, as well as an Eventing scratchpad keyspace). Then test the function by adding a document with the KEY test:001 and DATA {"id": "001", "boolvalue": true}

Immediately you will see that a proxy key with an empty document was created.

2023-06-22T12:41:29.440-07:00 [INFO] "OnUpdate INFO 1" "action made proxy KEY" "test:001" "with TTL set to" "2023-06-22T19:42:29.436Z"

Then, 60 seconds later, on the expiry of the proxy KEY, the original doc will be printed out (the archiving action) and then deleted.

2023-06-22T12:42:39.735-07:00 [INFO] "OnDelete INFO 1" "test:001" "proxy key was deleted via" {"expired":true} "removing primary"
2023-06-22T12:42:39.736-07:00 [INFO] "OnDelete INFO 2" "action just log primary KEY" "test:001" "DOC" {"id":"001","boolvalue":true}

Best

Jon Strabala
Principal Product Manager - Server‌