Bulk Get Documents in Couchbase using Reactive or Asynchronous API

When working with distributed databases like Couchbase, performance and efficiency are key considerations, especially when retrieving large amounts of data. Customers coming from other development or database backgrounds often ask whether Couchbase can do “multi-get” or “bulk get” operations. Many databases offer “multi-get” as an out-of-the-box method for retrieving multiple documents by their keys. Most Couchbase SDKs don’t offer explicit APIs for batching because reactive programming provides the flexibility to implement batching tailored to your specific use case, which is often more effective than a one-size-fits-all, generic method.

What is Bulk Get?

A bulk get operation allows you to request multiple documents in a single operation, rather than making repeated individual GET calls. In traditional key-value stores, each request usually targets a specific node. However, in a distributed environment like Couchbase, spreading these operations across nodes can introduce overhead if managed manually.

SDK support for bulk operations

The Couchbase SDKs (including Java, .NET, and Go) offer built-in support for bulk get operations. These SDK methods are designed to accept a list of document keys and automatically manage the parallel execution of individual GET requests. They do this efficiently for three main reasons:

    • Parallelism: Rather than fetching each document sequentially, the SDKs initiate multiple requests concurrently.
    • Node targeting: The SDKs intelligently route each request to the correct node in the cluster where the data resides.
    • Asynchronous execution: Leveraging the asynchronous capabilities of each SDK, the operations are handled in a non-blocking fashion, ensuring higher throughput and better resource utilization.

Couchbase provides two main ways to achieve bulk get capability: reactive programming and asynchronous programming.

Reactive API

If you’re aiming to optimize bulk get operations in Couchbase, reactive programming provides an efficient and simpler approach. Couchbase’s binary protocol supports out-of-order execution and has strong support for async operations in KV. By efficiently managing asynchronous data flows, reactive programming enables high throughput and low latency, making it ideal for distributed systems. To fully leverage its capabilities, a fully reactive stack, where each layer from the database to the client supports reactive streams, is ideal. Couchbase’s ReactiveCollection integrates seamlessly with Project Reactor, enabling fully non-blocking access to Couchbase Key-Value (KV) operations. This integration aligns well with modern reactive architectures, allowing applications to handle high-throughput workloads more efficiently by avoiding unnecessary thread blocking.

That said, migrating an entire existing application to a reactive architecture can involve significant work. If it is a new project, adopting a reactive framework like Spring WebFlux is strongly recommended. However, even in non-reactive applications, introducing a reactive approach at the Couchbase CRUD layer alone can deliver meaningful gains. By doing so, you can minimize thread blocking and reduce CPU throttling, leading to better resource efficiency and improved scalability.

Below is an example of Java code that can maximize the performance of Couchbase using the Reactive API and can work with a non-reactive stack.

This reactive approach fetches documents by their IDs and returns a Map<String, V> where each key is a document ID and each value is the processed result. While it’s not wrong to collect the results into a List and reprocess them later, a better strategy (both in terms of performance and code clarity) is to collect the results into a ConcurrentHashMap indexed by document ID. This avoids repeated scanning and makes result lookups constant-time operations. Let’s break down how this works step-by-step.

    1. Creating a Reactive stream from document IDs
      We start by creating a Flux (reactive stream) from the list of document IDs. For each document ID, it calls collection.get(documentId) to fetch the document reactively.
    2. Wrapping results in SuccessOrFailure
      To ensure resilience, each async operation wraps its result in a SuccessOrFailure<GetResult> object. This wrapper captures both successful and failed fetches. By default, if collection.get(documentId) throws an error (e.g., network issue, missing doc), the whole Flux stream errors out and stops processing. This is not ideal for bulk operations, as we want to continue processing other documents even if one fails. So instead of propagating the error, the stream converts the failure into a SuccessOrFailure.failure(error) object. This way, the downstream still gets a valid value (SuccessOrFailure) for every document ID, whether the fetch succeeded or failed.
    3. Pairing document IDs with results using Mono.zip
      Using Mono.zip makes it explicit that you’re combining the document ID and the async get result into a tuple. This helps identify the association between document ID and result, especially when results arrive out of order due to concurrency.
    4. Controlling concurrency
      Concurrency controls how many document fetches run in parallel (that is, how many requests are in flight at once).
    5. Parallelism and scheduler handoff
      Reactive streams are non-blocking by default, but transformation logic (e.g., parsing JSON, converting data) can be CPU-intensive. Before we collect the resulting tuples, the stream switches to a caller-specified scheduler using publishOn(…). This offloads the transformation work from IO threads to a separate thread pool. That ensures IO threads aren’t blocked by transformation work due to heavy computation.
    6. Collecting into a map
      Once all results are in, the stream collects the tuples into a map. It uses mapSupplier to create the map. For each (documentId, result) pair, it applies mapValueTransformer to transform the raw result into a domain-specific type V and then puts the transformed value into the map.
    7. Blocking to retrieve final result
      Since everything here is asynchronous (non-blocking), block() is used to wait for the entire stream to finish and return the constructed map to the caller.
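
The wrapping and map-collection ideas from steps 2 and 6 can be sketched in plain Java, without Reactor or the Couchbase SDK. This is only an illustration under stated assumptions: the shape of SuccessOrFailure is a hypothetical stand-in for the wrapper described above, fetch is a hypothetical synchronous stand-in for the reactive collection.get(documentId) call, and the class and method names (WrapAndCollectSketch, fetchAll) are made up for this sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for the SuccessOrFailure wrapper described in step 2.
final class SuccessOrFailure<T> {
    private final T value;          // result on success
    private final Throwable error;  // cause on failure, null on success

    private SuccessOrFailure(T value, Throwable error) {
        this.value = value;
        this.error = error;
    }

    static <T> SuccessOrFailure<T> success(T value) {
        return new SuccessOrFailure<>(value, null);
    }

    static <T> SuccessOrFailure<T> failure(Throwable error) {
        return new SuccessOrFailure<>(null, Objects.requireNonNull(error));
    }

    boolean isSuccess() { return error == null; }
    T value() { return value; }
    Throwable error() { return error; }
}

final class WrapAndCollectSketch {
    // Fetches every ID, turns each outcome (value or exception) into a
    // SuccessOrFailure, and stores the transformed value under its document ID.
    // One failing ID does not stop the remaining IDs from being processed.
    static <R, V> Map<String, V> fetchAll(
            List<String> documentIds,
            Function<String, R> fetch,                              // assumed fetch function
            Function<SuccessOrFailure<R>, V> mapValueTransformer) {
        // ConcurrentHashMap rejects null values, so the transformer must not return null.
        Map<String, V> results = new ConcurrentHashMap<>();
        for (String id : documentIds) {
            SuccessOrFailure<R> outcome;
            try {
                outcome = SuccessOrFailure.success(fetch.apply(id));
            } catch (RuntimeException e) {
                outcome = SuccessOrFailure.failure(e);
            }
            results.put(id, mapValueTransformer.apply(outcome));
        }
        return results;
    }
}
```

Because every document ID ends up with an entry, the caller can look up any ID in constant time and decide per entry how to treat a failure, instead of losing the whole batch to a single error.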

Asynchronous API

While we recommend using the reactive APIs for their performance, flexibility, and built-in backpressure handling, Couchbase also offers a low-level Asynchronous API for scenarios where you need even more fine-grained control and performance tuning. However, writing efficient asynchronous code comes with its own challenges: it requires careful management of concurrency and backpressure to prevent resource exhaustion and avoid timeouts.
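
One possible way to manage concurrency by hand is to cap the number of in-flight requests with a semaphore. The sketch below uses only JDK types and is not taken from the Couchbase SDK: asyncGet is a hypothetical stand-in for a call that returns a CompletableFuture per key, and BoundedInFlightSketch, submitBounded, and maxInFlight are names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

final class BoundedInFlightSketch {
    // Starts one async fetch per key, but never lets more than maxInFlight
    // fetches be outstanding at the same time.
    static <T> List<CompletableFuture<T>> submitBounded(
            List<String> keys,
            int maxInFlight,
            Function<String, CompletableFuture<T>> asyncGet) {  // assumed async fetch
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<T>> futures = new ArrayList<>(keys.size());
        for (String key : keys) {
            // Blocks the submitting thread while maxInFlight fetches are pending.
            permits.acquireUninterruptibly();
            CompletableFuture<T> future;
            try {
                future = asyncGet.apply(key);
            } catch (RuntimeException e) {
                permits.release();  // the fetch never started, so hand the permit back
                throw e;
            }
            // Return the permit when the fetch finishes, successfully or not.
            future.whenComplete((value, error) -> permits.release());
            futures.add(future);
        }
        return futures;
    }
}
```

The trade-off in this sketch is that the submitting thread itself waits when the limit is reached, which is a simple form of backpressure. A smaller maxInFlight lowers the risk of timeouts under load at the cost of throughput.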

Below is an example demonstrating how to use the Async API to enhance bulk get performance in Couchbase:

Let’s break down how this works step-by-step.

    1. Fetch documents
      Here we iterate over the keys. For each key, we call collection.async().get(key, options), which returns a CompletableFuture<GetResult>, and we store all of those futures in a list.
    2. Wait for all fetches to finish
      CompletableFuture.allOf(…) creates a new future that completes when all futures in the array complete. Calling .join() on it blocks the current thread until all async fetches are done.
    3. Transform results
      Once all the fetches are done, we create another list to hold the final values as a plain List<JsonObject>. For each CompletableFuture<GetResult>, we retrieve and transform the result. Depending on the requirement, you can handle a failed fetch by adding either null or an error marker object to the list in place of the failed result.
      The transformation step assumes that all documents have been fetched before the results are transformed. However, if the goal is to continue chaining async operations, we can instead create a list of futures, List<CompletableFuture<JsonObject>>, and wrap the transformation in another async wrapper.
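
The three steps above can be sketched with JDK types only. This is a minimal illustration, not the SDK's own code: asyncGet is a hypothetical stand-in for a call such as collection.async().get(key, options), the result type is left generic instead of GetResult or JsonObject, and AsyncBulkGetSketch and bulkGet are names invented for this example. Failed fetches are represented by null, which is one of the two options mentioned in step 3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Function;

final class AsyncBulkGetSketch {
    // Returns one entry per key, in key order; a failed fetch yields null.
    static <T> List<T> bulkGet(
            List<String> keys,
            Function<String, CompletableFuture<T>> asyncGet) {  // assumed async fetch
        // 1. Fetch documents: start every request without waiting for any of them.
        List<CompletableFuture<T>> futures = new ArrayList<>(keys.size());
        for (String key : keys) {
            futures.add(asyncGet.apply(key));
        }

        // 2. Wait for all fetches to finish. allOf fails if any fetch failed,
        //    so the combined error is swallowed here and each future is
        //    inspected individually in the next step.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                .exceptionally(error -> null)
                .join();

        // 3. Transform results: every future is complete, so join() returns
        //    immediately with the value or throws for a failed fetch.
        List<T> results = new ArrayList<>(futures.size());
        for (CompletableFuture<T> future : futures) {
            try {
                results.add(future.join());
            } catch (CompletionException | CancellationException e) {
                results.add(null);  // placeholder for the failed fetch
            }
        }
        return results;
    }
}
```

Keeping the futures in the same order as the keys means the result at index i always belongs to the key at index i, even though the fetches themselves may complete in any order.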

We recommend using this API only if you are either writing integration code for higher level concurrency mechanisms or you really need the last drop of performance. In all other cases, the reactive API (for richness in operators) is likely the better choice.

Conclusion

Reactive programming offers one of the most efficient ways to achieve high performance for bulk get operations with Couchbase. Its true power is unlocked when applied across an entirely reactive stack, where non-blocking behavior and scalability are fully optimized.

That said, you don’t need a fully reactive architecture to start reaping the benefits. A practical and impactful first step is to migrate just the Couchbase CRUD layer to reactive. Doing so can dramatically reduce thread blocking and minimize CPU throttling, leading to better system responsiveness and resource utilization without requiring a complete architectural overhaul.

If performance and scalability are priorities, reactive programming is well worth the investment, even in a partial implementation.


The author acknowledges the Couchbase SDK team and their excellent explanation of how batching can be achieved efficiently without a generic bulk get function. Thank you.




Author

Posted by Rohit Kumar, Senior Solutions Engineer
