Did not see the option couchbase.batch.size.max in the sink configuration. Is it not supported for multiple batch insertions?

Checking whether a max batch size option is available in the sink configuration.
My requirement is to verify the throughput, as the producer sends 50k records at a time.
I am looking for a batch size option since it may increase throughput.

What SDK? What version?

Thanks mreiche. I am using the Kafka Couchbase sink connector, version 4.1.12.

So, regarding the parameter referenced here: Migrating from Version 3.x | Couchbase Docs
I’ll let @david.nault address that.

Hi Vasavi.

Short answer:

The Couchbase Kafka Sink Connector does not have a config option for tuning the batch size.

Long answer:

The Kafka Connect framework sends records to the Couchbase Kafka Sink Connector in batches. The size of these batches is not controlled by the Couchbase connector; if this batch size is configurable, it’s configurable at the Connect Framework level, and out of Couchbase’s control.
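
One caveat, and this is an assumption about the framework rather than anything in the Couchbase docs: a sink task's put() batch can contain at most what its consumer returned from the last poll, so the standard Kafka consumer property max.poll.records is one lever to experiment with. It can be overridden per connector as consumer.override.max.poll.records, provided the worker's connector.client.config.override.policy allows client overrides.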

Once the connector receives a batch from the framework, it splits it into smaller sub-batches so no sub-batch contains a duplicate document ID. Then for each sub-batch, it writes all the documents to Couchbase in parallel, with a concurrency limit of 256 (no more than that many write requests will be in-flight at a time).
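
To make the splitting idea concrete, here is an illustrative sketch of a greedy split on duplicate document IDs (my own sketch, not the connector's actual source):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class SubBatchSplitter {
    // Walk the batch in order and start a new sub-batch whenever a document ID
    // repeats, so no sub-batch contains the same ID twice and a later write for
    // an ID always lands in a later sub-batch than an earlier one.
    static <R> List<List<R>> splitOnDuplicateIds(List<R> batch, Function<R, String> idOf) {
        List<List<R>> subBatches = new ArrayList<>();
        List<R> current = new ArrayList<>();
        Set<String> seenIds = new HashSet<>();
        for (R record : batch) {
            String id = idOf.apply(record);
            if (!seenIds.add(id)) {      // duplicate ID: close the current sub-batch
                subBatches.add(current);
                current = new ArrayList<>();
                seenIds.clear();
                seenIds.add(id);
            }
            current.add(record);
        }
        if (!current.isEmpty()) {
            subBatches.add(current);
        }
        return subBatches;
    }
}
```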

This behavior is not officially configurable. That said, you could change the concurrency limit by setting the reactor.bufferSize.small Java System Property to a value other than 256 when launching the Kafka Connect worker process. However, this is neither documented nor supported.
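
For example (again, neither documented nor supported, and the value is purely illustrative): you could add -Dreactor.bufferSize.small=512 to KAFKA_OPTS before launching connect-distributed.sh, since Kafka's launch scripts pass KAFKA_OPTS through to the JVM.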

Thanks,
David

Thanks a lot, David. A very good explanation. It helped me a lot.

We have a use case where a lot of data comes in as JSON and has to be inserted into CB as KV.
For that, I am using this code snippet in a for loop:
Collection collection = cluster.bucket("dev").scope("test").collection(collectionName);
MutationResult upsertResult = collection.upsert(documentId, data);
I could not find anything in the library to insert a list of documents in one call.
Please help me with this use case: inserting the data via KV in bulk.

Because each request might need to go to a different node (the one where the document's shard resides), there is no bulk KV API in the Java SDK. To perform operations concurrently, use the reactive or async API.
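
For example, a minimal sketch with the reactive API, assuming the Java 3.x SDK (Reactor ships with it as a dependency). The connection details, bucket/scope/collection names, sample documents, and the concurrency of 128 are all placeholders:

```java
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.ReactiveCollection;
import com.couchbase.client.java.json.JsonObject;
import com.couchbase.client.java.kv.MutationResult;
import reactor.core.publisher.Flux;

import java.util.List;
import java.util.Map;

public class BulkUpsertExample {
    public static void main(String[] args) {
        // Placeholder connection details.
        Cluster cluster = Cluster.connect("localhost", "Administrator", "password");

        // Same lookup as your snippet, but using the reactive view of the collection.
        ReactiveCollection collection = cluster.bucket("dev")
                .scope("test")
                .collection("myCollection")
                .reactive();

        // Document ID -> content to upsert (stand-in for your 50k JSON inputs).
        Map<String, JsonObject> documents = Map.of(
                "doc-1", JsonObject.create().put("value", 1),
                "doc-2", JsonObject.create().put("value", 2));

        // flatMap's second argument caps how many upserts are in flight at once,
        // so requests run concurrently instead of one per loop iteration.
        List<MutationResult> results = Flux.fromIterable(documents.entrySet())
                .flatMap(e -> collection.upsert(e.getKey(), e.getValue()), 128)
                .collectList()
                .block(); // wait for every upsert to finish

        System.out.println("Upserted " + results.size() + " documents");
        cluster.disconnect();
    }
}
```

Per-document error handling (retry / onErrorResume) is omitted for brevity; in a real job you'd likely want it so one failed upsert doesn't abort the whole batch.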
Mike

Hi Vasavi, check out this similar discussion:

Thanks, David. It is helpful.
