Log Document Mutation history

Hi @Dario_Mazza,

Without knowing your data volumes you could make an Eventing Function like the following:

  • It will keep track of every change, subject to DCP deduplication. By this I mean that if things change super fast, Couchbase’s underlying Database Change Protocol (DCP) may dedup multiple versions into a single item. So if you had a tight SDK loop changing a document this will not work, but if you have a few milliseconds between changes you should be good.
  • The most recent change is stored under the same KEY in arc_bkt (a bucket binding, in r+w mode, from the alias arc_bkt to your target bucket, say “archive”). This makes it easy to find.
  • All changes will also have an archive key = KEY + “:” + CAS, where CAS is an increasing number loosely based on time since the epoch (in the examples below its first ten digits read as Unix seconds). If you want to query the history you may need a primary index on the “archive” bucket.
  • An expiry of 12 hours is set to clean up the history; you can adjust this.
  • The Metadata Purge Interval (in the detailed bucket settings) is three days by default; you can set this lower if you want to free up space faster.
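
The key scheme and expiry from the list above can be sketched as two small helpers. The names archiveKey and expiryDate are hypothetical (they are not part of the Eventing API), and the CAS is treated as a string here purely for illustration; this is plain JavaScript you can try outside of Eventing.

```javascript
// Hypothetical helpers (not part of the Eventing API) illustrating
// the history key scheme and the 12 hour expiry described above.

// Build the history key: KEY + ":" + CAS.
// The CAS is handled as a string because it has more digits than a
// JavaScript Number can represent exactly.
function archiveKey(key, cas) {
    return key + ":" + String(cas);
}

// Compute an expiry Date the given number of hours after nowMs
// (nowMs is milliseconds since the epoch, e.g. Date.now()).
function expiryDate(hours, nowMs) {
    return new Date(nowMs + hours * 60 * 60 * 1000);
}

console.log(archiveKey("anydoc:124", "1623073710383104000"));
console.log(expiryDate(12, Date.now()).toISOString());
```

Changing the first argument of expiryDate is all it takes to keep the history for a shorter or longer window.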

Eventing Function name: ArchiveDocHistory12Hours

// A bucket binding to an archive bucket aliased as 'arc_bkt'
// in r+w mode is required. 

function OnUpdate(doc, meta) {
    var previous_key_info = arc_bkt[meta.id];
    var key_info = extractKeyDataToArchive(doc);
    if (!previous_key_info) {
        // No archived copy exists yet (a new doc, or the first deployment
        // with the feed boundary set to "Everything"), so seed the archive.
        arc_bkt[meta.id] = key_info;
        // Also keep KEY:cas (cas is a timestamp-like value) for 12 hours
        couchbase.upsert(arc_bkt,{"id": meta.id + ":" + meta.cas, expiry_date: new Date(Date.now() + 12 * 60 * 60 * 1000)},key_info);
        log(meta.id,"had no prior key_info archived");
        // new doc was added (or initial run) 
        // *** Add any needed code here ...
        return;
    }
    // Determine whether the subset from extractKeyDataToArchive has changed
    var changed = false;
    if (crc64(previous_key_info) != crc64(key_info)) {
        changed = true;
        // since the key_info changed, update the archived version
        arc_bkt[meta.id] = key_info;
        // Also keep KEY:cas (cas is a timestamp-like value) for 12 hours
        couchbase.upsert(arc_bkt,{"id": meta.id + ":" + meta.cas, expiry_date: new Date(Date.now() + 12 * 60 * 60 * 1000)},key_info);
    }
    // *** Add any needed code here ...
    log(meta.id,"key_info changed:",changed,
        "current", key_info, "previous",previous_key_info);
}

function extractKeyDataToArchive(doc) {
    // *** Adjust the data you want to archive. This could be a
    // *** subset of the doc; here we archive the entire doc.
    return doc;
}

This code was loosely based on a previous forum post, “Onupdate eventing function to get the old value” (#2 by jon.strabala), where extractKeyDataToArchive() was used to archive only one version of a key subset of the data.
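
If you only care about part of the document, a subset variant of extractKeyDataToArchive() might look like the sketch below. The choice of fields is just an assumption for illustration (id and externalId are taken from the sample document used in this post); adjust it to your own data.

```javascript
// Sketch of a subset variant of extractKeyDataToArchive().
// The fields 'id' and 'externalId' are an illustrative choice taken
// from the sample document in this post, not a requirement.
function extractKeyDataToArchive(doc) {
    return {
        id: doc.id,
        externalId: doc.externalId
    };
}

// Quick check outside of Eventing with a document shaped like the sample.
var sample = {
    id: 124,
    externalId: 1111,
    more: { "megabytes of uninteresting data": "just emulating ...." }
};
console.log(JSON.stringify(extractKeyDataToArchive(sample)));
// prints {"id":124,"externalId":1111}
```

With a subset like this, a mutation that only touches the “more” field should leave the extracted value unchanged, so the checksum comparison in OnUpdate would not write a new history entry for it.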

Okay say you have one document in the “source” bucket …

KEY: anydoc:124
{
  "id": 124,
  "externalId": 1111,
  "more": {
    "megabytes of uninteresting data": "just emulating ...."
  }
}

When you deploy the Function you will get two items in the “archive” bucket.

KEY: anydoc:124
{
  "id": 124,
  "externalId": 1111,
  "more": {
    "megabytes of uninteresting data": "just emulating ...."
  }
}
KEY: anydoc:124:1623073710383104000
{
  "id": 124,
  "externalId": 1111,
  "more": {
    "megabytes of uninteresting data": "just emulating ...."
  }
}

The log(…) message for the mutation from the initial deployment will be emitted to the Application log:

2021-06-07T07:02:16.606-07:00 [INFO] "anydoc:124" "had no prior key_info archived"

Decoding the first ten digits (1623073710) of the CAS suffix on the archive key anydoc:124:1623073710383104000 as a Unix timestamp in seconds (I am in Pacific Time), we see that the archived doc was mutated on

Mon Jun 07 2021 13:48:30 GMT+0000
Mon Jun 07 2021 06:48:30 GMT-0700 (Pacific Daylight Time)
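
The decoding above can be reproduced with a few lines of plain JavaScript (for example in Node.js). This is only a sketch: casToDate is a hypothetical helper name, and it assumes the CAS looks like the 19 digit values in this post, where dropping the last nine digits leaves Unix seconds.

```javascript
// Hypothetical helper: turn a CAS string into a Date by keeping only
// the leading seconds. Assumes a CAS shaped like the values in this
// post (dropping the low nine digits leaves Unix seconds).
function casToDate(cas) {
    // BigInt keeps all 19 digits exact; integer division drops the low nine.
    var seconds = BigInt(cas) / 1000000000n;
    return new Date(Number(seconds) * 1000);
}

console.log(casToDate("1623073710383104000").toISOString());
// prints 2021-06-07T13:48:30.000Z
```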

Now let’s update the source doc: change externalId from 1111 to 7777 and change nothing else (in the source bucket), and of course leave the Eventing Function deployed. Now you will see three documents in the “archive” bucket.

KEY: anydoc:124
{
  "id": 124,
  "externalId": 7777,
  "more": {
    "megabytes of uninteresting data": "just emulating ...."
  }
}
KEY: anydoc:124:1623073710383104000
{
  "id": 124,
  "externalId": 1111,
  "more": {
    "megabytes of uninteresting data": "just emulating ...."
  }
}
KEY: anydoc:124:1623075290424279040
{
  "id": 124,
  "externalId": 7777,
  "more": {
    "megabytes of uninteresting data": "just emulating ...."
  }
}

Now let’s update the source doc again: change externalId from 7777 to 8888 and change nothing else (in the source bucket), and of course leave the Eventing Function deployed. Now you will see four documents in the “archive” bucket.

KEY: anydoc:124
{
  "id": 124,
  "externalId": 8888,
  "more": {
    "megabytes of uninteresting data": "just emulating ...."
  }
}
KEY: anydoc:124:1623073710383104000
{
  "id": 124,
  "externalId": 1111,
  "more": {
    "megabytes of uninteresting data": "just emulating ...."
  }
}
KEY: anydoc:124:1623075290424279040
{
  "id": 124,
  "externalId": 7777,
  "more": {
    "megabytes of uninteresting data": "just emulating ...."
  }
}
KEY: anydoc:124:1623075564122079232
{
  "id": 124,
  "externalId": 8888,
  "more": {
    "megabytes of uninteresting data": "just emulating ...."
  }
}
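
Once you have the list of keys from the “archive” bucket (however you fetch them, for example via a query), picking out the history of one source KEY in order could look like the sketch below. historyKeys is a hypothetical helper, not something provided by Couchbase, and it assumes every history key has the KEY + “:” + CAS shape shown above.

```javascript
// Hypothetical helper: from a list of archive keys, return the
// history keys (KEY:CAS) for one source KEY, oldest first.
function historyKeys(allKeys, key) {
    var prefix = key + ":";
    return allKeys
        .filter(function (k) {
            // Keep only entries whose suffix after "KEY:" is all digits;
            // this skips the latest copy stored under the plain KEY.
            return k.startsWith(prefix) && /^\d+$/.test(k.slice(prefix.length));
        })
        .sort(function (a, b) {
            // Compare the CAS values as BigInt so all digits are respected.
            var ca = BigInt(a.slice(prefix.length));
            var cb = BigInt(b.slice(prefix.length));
            return ca < cb ? -1 : ca > cb ? 1 : 0;
        });
}

var keys = [
    "anydoc:124:1623075564122079232",
    "anydoc:124",
    "anydoc:124:1623073710383104000",
    "anydoc:124:1623075290424279040"
];
console.log(historyKeys(keys, "anydoc:124"));
```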

Best

Jon Strabala
Principal Product Manager - Server