@flaviu you could easily use the Eventing Service (an EE feature) to transform your data in place.
This technique performs very well because Eventing is a distributed client of our Database Change Protocol (DCP) feed.
Assume you have a source bucket called “source” with some data (millions or even a billion docs is absolutely fine).
For this example I will just seed two documents via the N1QL Query Workbench.
UPSERT INTO `source` (KEY, VALUE)
VALUES ("mydoctype:1", {
    "field1": "ABC",
    "field2": "WOW",
    "fieldN": "More",
    "id": 1,
    "other": "ABC ",
    "type": "mydoctype"
}),
VALUES ("mydoctype:2", {
    "field1": "ABC",
    "field2": "WOW",
    "fieldN": "More",
    "id": 2,
    "other": "ABC",
    "type": "mydoctype"
});
We can write an Eventing function to do this (just deploy it with the feed boundary set to "Everything" so it processes all existing documents):
- The source bucket is aliased to src_bkt in read-write mode.
- It updates the source bucket, so it requires 6.5+ to run.
- It will run faster with more workers. The default is 3; if you have a lot of cores on your Eventing node, try 12 or 24.
Our Eventing Function:
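// Lowercase parent[key] in place; return true only if the value actually changed.
function setTolower(parent, key) {
    if (!parent || !key) return false;
    var value = parent[key];
    // Skip missing, empty, or non-string values.
    if (!value || (typeof value !== 'string')) return false;
    var lower = value.toLowerCase();
    // Already lowercase: nothing to do.
    if (lower == value) return false;
    parent[key] = lower;
    return true;
}
function OnUpdate(doc, meta) {
    // Only touch documents of the type we care about.
    if (doc.type !== "mydoctype") return;
    var updated = false;
    updated = setTolower(doc, 'field1') || updated;
    updated = setTolower(doc, 'field2') || updated;
    // more lowercase conversions as needed ...
    updated = setTolower(doc, 'fieldN') || updated;
    // Write back to the source bucket only when something changed.
    if (updated) src_bkt[meta.id] = doc;
}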
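If you want to sanity-check the logic before deploying, here is a minimal local dry run in plain JavaScript (Node). It is only a sketch: a plain object stands in for the src_bkt bucket alias, and the replay helper, the seed variable, and the "othertype:1" document are hypothetical names added for illustration, not part of Eventing.

```javascript
// Local dry run only: a plain object stands in for the src_bkt bucket alias.
var src_bkt = {};

function setTolower(parent, key) {
    if (!parent || !key) return false;
    var value = parent[key];
    if (!value || (typeof value !== 'string')) return false;
    var lower = value.toLowerCase();
    if (lower == value) return false;
    parent[key] = lower;
    return true;
}

function OnUpdate(doc, meta) {
    if (doc.type !== "mydoctype") return;
    var updated = false;
    updated = setTolower(doc, 'field1') || updated;
    updated = setTolower(doc, 'field2') || updated;
    updated = setTolower(doc, 'fieldN') || updated;
    if (updated) src_bkt[meta.id] = doc;
}

// Hypothetical helper: feed every document to OnUpdate and return what was written.
function replay(docs) {
    src_bkt = {};
    Object.keys(docs).forEach(function (id) {
        // Deep copy so OnUpdate never mutates the input set.
        OnUpdate(JSON.parse(JSON.stringify(docs[id])), { id: id });
    });
    return src_bkt;
}

// The two seeded documents plus one hypothetical document of another type.
var seed = {
    "mydoctype:1": { field1: "ABC", field2: "WOW", fieldN: "More", id: 1, other: "ABC ", type: "mydoctype" },
    "mydoctype:2": { field1: "ABC", field2: "WOW", fieldN: "More", id: 2, other: "ABC", type: "mydoctype" },
    "othertype:1": { field1: "ABC", type: "othertype" }
};

var firstPass = replay(seed);        // transforms the two mydoctype documents
var secondPass = replay(firstPass);  // already lowercase, so nothing is written

console.log(Object.keys(firstPass).length);   // 2
console.log(Object.keys(secondPass).length);  // 0
```

The second pass shows that documents which are already lowercase produce no writes, so rerunning the function over the same data does not rewrite anything.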
After the function runs, all documents are transformed (in this case 2):
KEY mydoctype:1
{
"field1": "abc",
"field2": "wow",
"fieldN": "more",
"id": 1,
"other": "ABC ",
"type": "mydoctype"
}
and
KEY mydoctype:2
{
"field1": "abc",
"field2": "wow",
"fieldN": "more",
"id": 2,
"other": "ABC",
"type": "mydoctype"
}
Okay, so I tested the above against 100M docs on a smallish non-MDS 2GHz server (with 12 workers) and I updated all 100M documents in 48 minutes, which works out to roughly 35K docs per second. Obviously, if you had a real production system with, say, 4 KV nodes and 2 Eventing nodes, this would be much faster.
For more details on Eventing, refer to "Run a Function on Data Change" in the Couchbase Docs.