We are using the latest Couchbase Java SDK. We have documents that are around 500 KB and quite complex in structure. We have to remove some top-level keys from each document, as they are metadata used to identify the document. Converting the document to a JsonObject and then removing the keys is taking a lot of CPU, and as a result other requests are getting delayed. So, for example, if I have a document (key: abc) like this (simplified; the real documents are much larger):
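{
  "key_tx": "abc",
  "version": "1",
  "data": [
    { "key": true },
    { "a": [89, 9829, 898] }
  ]
}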
What I am looking for is to get the JSON string in the response without key_tx and version, so that I can skip that processing on my end and send the JSON string back as the response.
It is data in one document; in another document it can be arrVal, and so on, so there is no common structure across the documents. I am OK with the subdoc API as well, but how do we handle the values, since they can be strings, integers, booleans, and other types?
It sounds like you want to a) skip deserialization, b) drop particular keys, and c) get the JSON with minimal processing. I think it would be possible to do a lookupIn, passing a custom JsonSerializer to LookupInOptions that just passes through the raw bytes (which will be the contents of the “data” or “arrVal” key, or whatever has been fetched) without deserializing.
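Untested, but a sketch of that idea might look something like this (the pass-through serializer only supports reading the result as byte[]):

import com.couchbase.client.java.Collection;
import com.couchbase.client.java.codec.JsonSerializer;
import com.couchbase.client.java.codec.TypeRef;
import com.couchbase.client.java.kv.LookupInResult;
import com.couchbase.client.java.kv.LookupInSpec;
import java.util.List;
import static com.couchbase.client.java.kv.LookupInOptions.lookupInOptions;
import static java.nio.charset.StandardCharsets.UTF_8;

public class RawLookupInSketch {

  // Pass-through serializer: hands back the raw bytes instead of parsing them.
  static final JsonSerializer RAW_BYTES = new JsonSerializer() {
    @Override
    public byte[] serialize(Object input) {
      return (byte[]) input; // not needed for lookups; assumes already-encoded JSON
    }

    @Override
    public <T> T deserialize(Class<T> target, byte[] input) {
      return target.cast(input); // only meaningful when target is byte[].class
    }

    @Override
    public <T> T deserialize(TypeRef<T> target, byte[] input) {
      throw new UnsupportedOperationException("read the result as byte[].class");
    }
  };

  // Fetches a single top-level field and returns its JSON value as a string,
  // without the SDK ever deserializing it.
  static String fetchRawField(Collection collection, String docId, String fieldName) {
    LookupInResult result = collection.lookupIn(
        docId,
        List.of(LookupInSpec.get(fieldName)),
        lookupInOptions().serializer(RAW_BYTES));
    return new String(result.contentAs(0, byte[].class), UTF_8);
  }
}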
An alternative approach: you could continue with full-document fetches, but if the bottleneck is JSON deserialization, try a custom JsonSerializer that uses a different JSON library than the default one we use (Jackson). I wouldn’t be too confident of success, since Jackson is pretty fast, but it may be worth benchmarking some alternatives.
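For example, here is a rough, untested sketch using Gson (chosen purely as an illustration; the SDK’s own JsonObject/JsonArray types still expect the default serializer, so this mainly helps if you deserialize into Maps or your own classes):

import com.couchbase.client.java.codec.JsonSerializer;
import com.couchbase.client.java.codec.TypeRef;
import com.google.gson.Gson;
import static java.nio.charset.StandardCharsets.UTF_8;

// Sketch of a JsonSerializer backed by Gson instead of the default Jackson.
public class GsonJsonSerializer implements JsonSerializer {
  private final Gson gson = new Gson();

  @Override
  public byte[] serialize(Object input) {
    if (input instanceof byte[]) {
      return (byte[]) input; // already-encoded JSON passes straight through
    }
    return gson.toJson(input).getBytes(UTF_8);
  }

  @Override
  public <T> T deserialize(Class<T> target, byte[] input) {
    return gson.fromJson(new String(input, UTF_8), target);
  }

  @Override
  public <T> T deserialize(TypeRef<T> target, byte[] input) {
    return gson.fromJson(new String(input, UTF_8), target.type());
  }
}

You could plug something like this into individual operations (for example LookupInOptions.serializer(...)), or I believe environment-wide via ClusterEnvironment.Builder#jsonSerializer.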
Another option is a small helper class, JsonFieldEraser (used in the example below), that removes the fields by manipulating the raw JSON bytes. It’s limited to removing top-level fields; hopefully that’s good enough for your use case.
If the fields to remove are at the start of the JSON, this technique is very fast because it does not require parsing the whole document, or even copying the document byte array.
Example usage:
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.util.JsonFieldEraser;

import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

import static java.nio.charset.StandardCharsets.UTF_8;

public class JsonFieldEraserSandbox {
  public static void main(String... args) {
    Cluster cluster = Cluster.connect("127.0.0.1", "Administrator", "password");
    Bucket bucket = cluster.bucket("default");
    bucket.waitUntilReady(Duration.ofSeconds(10));
    Collection c = bucket.defaultCollection();

    // Store a sample document: two metadata fields plus the actual data.
    Map<String, Object> document = new LinkedHashMap<>();
    document.put("key_tx", "abc");
    document.put("version", "1");
    document.put("data", List.of(Map.of("key", true), Map.of("a", List.of(89, 9829, 898))));
    c.upsert("foo", document);

    // Fetch the raw bytes and erase the metadata fields in place.
    byte[] bytes = c.get("foo").contentAsBytes();
    String before = new String(bytes, UTF_8);
    JsonFieldEraser.erase(bytes, Set.of("key_tx", "version"));
    String after = new String(bytes, UTF_8);

    System.out.println(before);
    System.out.println(after);
  }
}
If the number of data fields is <= 16, and you don’t mind listing them all, you could use the “get with projections” command, which uses the subdocument API behind the scenes:
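A rough sketch (the field names here are just placeholders; as far as I recall, the projected fields are reassembled on the client, so there is still some processing involved):

import com.couchbase.client.java.Collection;
import com.couchbase.client.java.kv.GetOptions;
import com.couchbase.client.java.kv.GetResult;

public class ProjectionSketch {
  // "data" and "arrVal" are example field names; you can project up to 16 paths.
  static String dataOnlyJson(Collection collection, String docId) {
    GetResult result = collection.get(docId,
        GetOptions.getOptions().project("data", "arrVal"));
    // Only the projected fields are present, so key_tx and version never appear.
    return result.contentAsObject().toString();
  }
}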
@david.nault This solution (string manipulation) was able to give me performance on par with JsonIter, which we had been using to convert documents to JSON and then remove the keys.
@david.nault Looks like there is some issue with get with projections. I put around 8 fields in the projection, and the number of requests I could handle was only half of what I was getting with the get call plus string manipulation.