Did not see the option couchbase.batch.size.max in the sink configuration. Is it not supported for multiple batch insertions?

Checking whether a max batch size option is available in the sink configuration.
My requirement is to verify the throughput, as the producer sends 50k records at a time.
I am looking for a batch size option since it may increase throughput.

What SDK? What version?

Thanks mreiche. I am using the Kafka Couchbase sink connector, version 4.1.12.

So, regarding the parameter referenced here: Migrating from Version 3.x | Couchbase Docs
I’ll let @david.nault address that.

Hi Vasavi.

Short answer:

The Couchbase Kafka Sink Connector does not have a config option for tuning the batch size.

Long answer:

The Kafka Connect framework sends records to the Couchbase Kafka Sink Connector in batches. The size of these batches is not controlled by the Couchbase connector; if this batch size is configurable, it’s configurable at the Connect Framework level, and out of Couchbase’s control.
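
One caveat, and this is an assumption about the framework rather than anything in the Couchbase docs: a sink task's put() batch can contain at most what its consumer returned from the last poll, so the standard Kafka consumer property max.poll.records is one lever to experiment with. It can be overridden per connector as consumer.override.max.poll.records, provided the worker's connector.client.config.override.policy allows client overrides.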

Once the connector receives a batch from the framework, it splits it into smaller sub-batches so no sub-batch contains a duplicate document ID. Then for each sub-batch, it writes all the documents to Couchbase in parallel, with a concurrency limit of 256 (no more than that many write requests will be in-flight at a time).
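
To make the splitting idea concrete, here is an illustrative sketch of a greedy split on duplicate document IDs (my own sketch, not the connector's actual source):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class SubBatchSplitter {
    // Walk the batch in order and start a new sub-batch whenever a document ID
    // repeats, so no sub-batch contains the same ID twice and a later write for
    // an ID always lands in a later sub-batch than an earlier one.
    static <R> List<List<R>> splitOnDuplicateIds(List<R> batch, Function<R, String> idOf) {
        List<List<R>> subBatches = new ArrayList<>();
        List<R> current = new ArrayList<>();
        Set<String> seenIds = new HashSet<>();
        for (R record : batch) {
            String id = idOf.apply(record);
            if (!seenIds.add(id)) {      // duplicate ID: close the current sub-batch
                subBatches.add(current);
                current = new ArrayList<>();
                seenIds.clear();
                seenIds.add(id);
            }
            current.add(record);
        }
        if (!current.isEmpty()) {
            subBatches.add(current);
        }
        return subBatches;
    }
}
```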

This behavior is not officially configurable. That said, you could change the concurrency limit by setting the reactor.bufferSize.small Java System Property to a value other than 256 when launching the Kafka Connect worker process. However, this is neither documented nor supported.
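
For example (again, neither documented nor supported, and the value is purely illustrative): you could add -Dreactor.bufferSize.small=512 to KAFKA_OPTS before launching connect-distributed.sh, since Kafka's launch scripts pass KAFKA_OPTS through to the JVM.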

Thanks,
David

Thanks a lot, David. A very good explanation. It helped me a lot.

We have a use case where a lot of data comes in as JSON and has to be inserted into CB as KV.
For that, I am using this code snippet in a for loop:
Collection collection = cluster.bucket("dev").scope("test").collection(collectionName);
MutationResult upsertResult = collection.upsert(documentId, data);
I could not find anything in the library to insert a list of documents in one call.
Please help me with this use case: inserting the data via KV in bulk.

Because each request might need to go to a different node (the one where the document's shard resides), there is no bulk KV API in the Java SDK. To perform operations concurrently, use the reactive or async API.
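
For example, a minimal sketch with the reactive API, assuming the Java 3.x SDK (Reactor ships with it as a dependency). The connection details, bucket/scope/collection names, sample documents, and the concurrency of 128 are all placeholders:

```java
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.ReactiveCollection;
import com.couchbase.client.java.json.JsonObject;
import com.couchbase.client.java.kv.MutationResult;
import reactor.core.publisher.Flux;

import java.util.List;
import java.util.Map;

public class BulkUpsertExample {
    public static void main(String[] args) {
        // Placeholder connection details.
        Cluster cluster = Cluster.connect("localhost", "Administrator", "password");

        // Same lookup as your snippet, but using the reactive view of the collection.
        ReactiveCollection collection = cluster.bucket("dev")
                .scope("test")
                .collection("myCollection")
                .reactive();

        // Document ID -> content to upsert (stand-in for your 50k JSON inputs).
        Map<String, JsonObject> documents = Map.of(
                "doc-1", JsonObject.create().put("value", 1),
                "doc-2", JsonObject.create().put("value", 2));

        // flatMap's second argument caps how many upserts are in flight at once,
        // so requests run concurrently instead of one per loop iteration.
        List<MutationResult> results = Flux.fromIterable(documents.entrySet())
                .flatMap(e -> collection.upsert(e.getKey(), e.getValue()), 128)
                .collectList()
                .block(); // wait for every upsert to finish

        System.out.println("Upserted " + results.size() + " documents");
        cluster.disconnect();
    }
}
```

Per-document error handling (retry / onErrorResume) is omitted for brevity; in a real job you'd likely want it so one failed upsert doesn't abort the whole batch.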
Mike

Hi Vasavi, check out this similar discussion:

Thanks, David. It is helpful.
