Update entry values from upper case to lowercase

@flaviu you could easily use the Eventing Service (an EE feature) to transform your data in place.

This technique performs very well because Eventing is a distributed client of our Database Change Protocol (DCP) feed.

Assume you have a source bucket called “source” with some data (millions or even a billion docs is absolutely fine).

For this example I will seed just two documents, via the N1QL Query Workbench.

UPSERT INTO `source` (KEY,VALUE)
VALUES ("mydoctype:1",{
  "field1": "ABC",
  "field2": "WOW",
  "fieldN": "More",
  "id": 1,
  "other": "ABC ",
  "type": "mydoctype"
} ),
 VALUES ("mydoctype:2",{
  "field1": "ABC",
  "field2": "WOW",
  "fieldN": "More",
  "id": 2,
  "other": "ABC",
  "type": "mydoctype"
} );

We can write an Eventing function and deploy it with the feed boundary set to "Everything", so it processes all existing documents:

  • The source bucket is aliased to src_bkt in read-write mode.
  • Because the function updates its own source bucket, it requires 6.5+ to run.
  • It will run faster with more workers. The default is 3; if you have a lot of cores on your Eventing node, try 12 or 24.

Our Eventing Function:

function setTolower(parent,key) {
    if(!parent || !key) return false;
    var value = parent[key];
    if (!value || (typeof value !== 'string')) return false;
    var lower = value.toLowerCase();
    if (lower === value) return false;
    parent[key] = lower;
    return true;
}

function OnUpdate(doc, meta) {
    if (doc.type !== "mydoctype") return;
    var updated = false;

    updated = setTolower(doc,'field1') || updated;
    updated = setTolower(doc,'field2') || updated;
    // more lowercase conversions as needed ...
    updated = setTolower(doc,'fieldN') || updated;

    if (updated) src_bkt[meta.id] = doc; 
}
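
If you want to sanity-check the logic before deploying, here is a minimal sketch you can run locally in Node.js. It is not how Eventing itself executes the function: the plain object standing in for src_bkt and the FIELDS list are assumptions I added for illustration, and the loop is just a compact variant of the three explicit calls above.

```javascript
// Local sketch only: a plain object stands in for the read-write bucket alias.
var src_bkt = {};

// Hypothetical list of fields to lowercase; adjust to your own schema.
var FIELDS = ['field1', 'field2', 'fieldN'];

function setTolower(parent, key) {
    if (!parent || !key) return false;
    var value = parent[key];
    if (!value || (typeof value !== 'string')) return false;
    var lower = value.toLowerCase();
    if (lower === value) return false;   // already lowercase, nothing to do
    parent[key] = lower;
    return true;
}

function OnUpdate(doc, meta) {
    if (doc.type !== "mydoctype") return;
    var updated = false;
    for (var i = 0; i < FIELDS.length; i++) {
        // Call the helper first so every field is processed,
        // even after 'updated' has already become true.
        updated = setTolower(doc, FIELDS[i]) || updated;
    }
    // Write back only when at least one field actually changed.
    if (updated) src_bkt[meta.id] = doc;
}

// A document with uppercase values gets written back in lowercase.
OnUpdate(
    { field1: "ABC", field2: "WOW", fieldN: "More", id: 1, other: "ABC ", type: "mydoctype" },
    { id: "mydoctype:1" }
);

// A document that is already lowercase triggers no write at all.
OnUpdate(
    { field1: "abc", field2: "wow", fieldN: "more", id: 2, other: "ABC", type: "mydoctype" },
    { id: "mydoctype:2" }
);

console.log(JSON.stringify(src_bkt["mydoctype:1"]));
console.log("mydoctype:2" in src_bkt);
```

The point of the "updated" flag is that a document which needs no change is never written, so running the function over data that is already lowercase does no extra work on the bucket.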

After the function runs, all matching documents are transformed (in this case, 2):

KEY mydoctype:1
{
  "field1": "abc",
  "field2": "wow",
  "fieldN": "more",
  "id": 1,
  "other": "ABC ",
  "type": "mydoctype"
}

and

KEY mydoctype:2
{
  "field1": "abc",
  "field2": "wow",
  "fieldN": "more",
  "id": 2,
  "other": "ABC",
  "type": "mydoctype"
}

I tested the above against 100M docs on a smallish non-MDS 2GHz server (with 12 workers), and it updated all 100M documents in 48 minutes. Obviously, a real production system with, say, 4 KV nodes and 2 Eventing nodes would be much faster.

For more details on Eventing, refer to "Run a Function on Data Change" in the Couchbase Docs.
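
To put that number in perspective, here is a rough back-of-the-envelope calculation using only the figures from my test. The projection to a billion documents assumes throughput scales linearly on the same hardware, which is my assumption and not something I measured.

```javascript
// Figures from the test: 100M documents updated in 48 minutes.
var docs = 100e6;
var minutes = 48;

// Average update rate on that single small node.
var docsPerSecond = Math.round(docs / (minutes * 60));
console.log(docsPerSecond);      // 34722

// Assumption: linear scaling on the same hardware.
var hoursForBillion = (1e9 / docs) * minutes / 60;
console.log(hoursForBillion);    // 8
```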