We are using the latest Couchbase Java SDK. We have documents that are around 500 KB and quite complex in structure. We have to remove some top-level keys from each document, as they are metadata used to identify the document. Converting the document to a JsonObject and then removing the keys is taking a lot of CPU, and as a result other requests are getting delayed. So, for example, if I have a document (key: abc) like this (simplified; the real documents are much larger):
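{
  "key_tx": "abc",
  "version": "1",
  "data": [
    { "key": true },
    { "a": [89, 9829, 898] }
  ]
}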
What I am looking for is to get the JSON string in the response without key_tx and version, so that I can skip that processing on my end and send the JSON string back as the response.
It is data in one document; in another document it can be arrVal, and so on, so there is no common structure across the documents. I am OK with the subdoc API as well, but how do we handle the values, since they can be strings, integers, booleans, and other types?
It sounds like you want to a) skip deserialization, b) drop particular keys, and c) get the JSON with minimal processing. I think it would be possible to do a lookupIn, passing a custom JsonSerializer to LookupInOptions that just passes through the raw bytes (which will be the contents of the “data” or “arrVal” key, or whatever has been fetched) without deserializing.
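Untested, but a sketch of that idea might look something like this (the pass-through serializer only supports reading the result as byte[]):

import com.couchbase.client.java.Collection;
import com.couchbase.client.java.codec.JsonSerializer;
import com.couchbase.client.java.codec.TypeRef;
import com.couchbase.client.java.kv.LookupInResult;
import com.couchbase.client.java.kv.LookupInSpec;
import java.util.List;
import static com.couchbase.client.java.kv.LookupInOptions.lookupInOptions;
import static java.nio.charset.StandardCharsets.UTF_8;

public class RawLookupInSketch {

  // Pass-through serializer: hands back the raw bytes instead of parsing them.
  static final JsonSerializer RAW_BYTES = new JsonSerializer() {
    @Override
    public byte[] serialize(Object input) {
      return (byte[]) input; // not needed for lookups; assumes already-encoded JSON
    }

    @Override
    public <T> T deserialize(Class<T> target, byte[] input) {
      return target.cast(input); // only meaningful when target is byte[].class
    }

    @Override
    public <T> T deserialize(TypeRef<T> target, byte[] input) {
      throw new UnsupportedOperationException("read the result as byte[].class");
    }
  };

  // Fetches a single top-level field and returns its JSON value as a string,
  // without the SDK ever deserializing it.
  static String fetchRawField(Collection collection, String docId, String fieldName) {
    LookupInResult result = collection.lookupIn(
        docId,
        List.of(LookupInSpec.get(fieldName)),
        lookupInOptions().serializer(RAW_BYTES));
    return new String(result.contentAs(0, byte[].class), UTF_8);
  }
}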
An alternative approach: you could continue with full-document fetches, but if the bottleneck is JSON deserialization, try a custom JsonSerializer that uses a different JSON library than the default one we use (Jackson). I wouldn’t be too confident of success, since Jackson is pretty fast, but it may be worth benchmarking some alternatives.
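For example, here is a rough, untested sketch using Gson (chosen purely as an illustration; the SDK’s own JsonObject/JsonArray types still expect the default serializer, so this mainly helps if you deserialize into Maps or your own classes):

import com.couchbase.client.java.codec.JsonSerializer;
import com.couchbase.client.java.codec.TypeRef;
import com.google.gson.Gson;
import static java.nio.charset.StandardCharsets.UTF_8;

// Sketch of a JsonSerializer backed by Gson instead of the default Jackson.
public class GsonJsonSerializer implements JsonSerializer {
  private final Gson gson = new Gson();

  @Override
  public byte[] serialize(Object input) {
    if (input instanceof byte[]) {
      return (byte[]) input; // already-encoded JSON passes straight through
    }
    return gson.toJson(input).getBytes(UTF_8);
  }

  @Override
  public <T> T deserialize(Class<T> target, byte[] input) {
    return gson.fromJson(new String(input, UTF_8), target);
  }

  @Override
  public <T> T deserialize(TypeRef<T> target, byte[] input) {
    return gson.fromJson(new String(input, UTF_8), target.type());
  }
}

You could plug something like this into individual operations (for example LookupInOptions.serializer(...)), or I believe environment-wide via ClusterEnvironment.Builder#jsonSerializer.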
Another option is a small helper class, JsonFieldEraser (used in the example below), that removes the fields by manipulating the raw JSON bytes. It’s limited to removing top-level fields; hopefully that’s good enough for your use case.
If the fields to remove are at the start of the JSON, this technique is very fast because it does not require parsing the whole document, or even copying the document byte array.
Example usage:
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.util.JsonFieldEraser;

import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

import static java.nio.charset.StandardCharsets.UTF_8;

public class JsonFieldEraserSandbox {
  public static void main(String... args) {
    Cluster cluster = Cluster.connect("127.0.0.1", "Administrator", "password");
    Bucket bucket = cluster.bucket("default");
    bucket.waitUntilReady(Duration.ofSeconds(10));
    Collection c = bucket.defaultCollection();

    // Store a sample document: two metadata fields plus the actual data.
    Map<String, Object> document = new LinkedHashMap<>();
    document.put("key_tx", "abc");
    document.put("version", "1");
    document.put("data", List.of(Map.of("key", true), Map.of("a", List.of(89, 9829, 898))));
    c.upsert("foo", document);

    // Fetch the raw bytes and erase the metadata fields in place.
    byte[] bytes = c.get("foo").contentAsBytes();
    String before = new String(bytes, UTF_8);
    JsonFieldEraser.erase(bytes, Set.of("key_tx", "version"));
    String after = new String(bytes, UTF_8);

    System.out.println(before);
    System.out.println(after);
  }
}
If the number of data fields is <= 16, and you don’t mind listing them all, you could use the “get with projections” command, which uses the subdocument API behind the scenes:
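A rough sketch (the field names here are just placeholders; as far as I recall, the projected fields are reassembled on the client, so there is still some processing involved):

import com.couchbase.client.java.Collection;
import com.couchbase.client.java.kv.GetOptions;
import com.couchbase.client.java.kv.GetResult;

public class ProjectionSketch {
  // "data" and "arrVal" are example field names; you can project up to 16 paths.
  static String dataOnlyJson(Collection collection, String docId) {
    GetResult result = collection.get(docId,
        GetOptions.getOptions().project("data", "arrVal"));
    // Only the projected fields are present, so key_tx and version never appear.
    return result.contentAsObject().toString();
  }
}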
@david.nault This solution (string manipulation) was able to give me performance on par with JsonIter, which we had been using to convert documents to JSON and then remove the keys.
@david.nault Looks like there is some issue with get with projections. I put around 8 fields in the projection, and the number of requests I could handle was only half of what I was getting with the get call plus string manipulation.