Slow write performance using Couchbase Spark Connector 'saveToCouchbase'

zoltan.zvara · November 17, 2017, 11:46am

We observe very slow write performance using the Couchbase Spark Connector.

Connector version 2.2.0 currently uses the async bucket and inserts documents one by one. We use the Connector with Spark Streaming, where 1,000-5,000 documents are supposed to be inserted per second. Documents pass through expensive models before insertion, but writing still dominates the run time of the whole micro-batch.
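
Little's law gives a rough way to see why one-at-a-time writes cap throughput. The number of writes that must be in flight equals the target rate times the round-trip latency of a single write. The sketch below assumes a 2 ms round trip purely for illustration; the thread does not report a measured latency. The helper names are hypothetical.

```python
import math

def max_rate_one_at_a_time(round_trip_ms: float) -> float:
    """Upper bound on docs/s if each write waits for the previous one to finish."""
    return 1000.0 / round_trip_ms

def required_in_flight(target_docs_per_s: float, round_trip_ms: float) -> int:
    """Little's law: concurrent requests = arrival rate x time each request takes."""
    return math.ceil(target_docs_per_s * round_trip_ms / 1000.0)

# Assumed 2 ms round trip per write (illustrative, not measured in this thread).
print(max_rate_one_at_a_time(2.0))    # 500.0 docs/s
print(required_in_flight(5000, 2.0))  # 10 concurrent writes
```

Under that assumption, reaching 5,000 docs/s needs roughly ten writes outstanding at any moment. A writer that effectively waits for each document before sending the next cannot get there, regardless of how much CPU is free.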

We have very few indexes on the documents: practically just one, on the document type (cardinality of 2).

Are there any tips for improving this? Should we rewrite the insertion to use bulk operations?
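
One way to move toward bulk operations is to group each Spark partition's rows into fixed-size batches inside a `foreachPartition` callback, then hand each batch to a single bulk write. The sketch below shows only the batching part. `write_batch` is a hypothetical placeholder for whatever bulk or pipelined call the SDK offers, and the batch size of 500 is an arbitrary assumption to tune.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def write_batch(docs: List[dict]) -> int:
    """Hypothetical stand-in for one bulk write to Couchbase; returns docs written."""
    return len(docs)

def write_partition(rows: Iterable[dict], batch_size: int = 500) -> int:
    """Possible body of a foreachPartition callback: group rows, write each group."""
    written = 0
    for chunk in batched(rows, batch_size):
        written += write_batch(chunk)
    return written

# Wiring in PySpark could look like (not executed here):
#   df.rdd.foreachPartition(lambda rows: write_partition(r.asDict() for r in rows))
print(list(batched(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

Batching per partition amortizes the per-call overhead across many documents. The connection setup stays per executor.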

Cheers,
Zoltán

zoltan.zvara · November 17, 2017, 12:35pm

Update:

We now use UPSERTs, which is better, but still not acceptable.
What are the suggested values for the number of executors, executor cores, or number of writer partitions, given the machines' resources? I see that there is only one CouchbaseConnection per executor. Is that right?
CPU and cluster I/O are not fully utilized.
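
When CPU and cluster I/O sit idle, a common lever is to keep more writes outstanding per task rather than adding executors. As a rough assumption, with Spark's default of one core per task, total outstanding writes are about executors × executor cores × the per-task limit. The sketch below bounds in-flight writes with an `asyncio.Semaphore`. `fake_upsert` is a hypothetical placeholder that only simulates latency and is not a Couchbase API. The limit of 64 is an arbitrary example value.

```python
import asyncio
from typing import Dict, List, Tuple

async def fake_upsert(doc_id: str, doc: dict, latency_s: float = 0.001) -> str:
    """Hypothetical stand-in for a non-blocking upsert; only simulates latency."""
    await asyncio.sleep(latency_s)
    return doc_id

async def upsert_all(docs: Dict[str, dict], max_in_flight: int) -> Tuple[List[str], int]:
    """Keep up to `max_in_flight` writes outstanding; return ids and observed peak."""
    sem = asyncio.Semaphore(max_in_flight)
    in_flight = 0
    peak = 0

    async def one(doc_id: str, doc: dict) -> str:
        nonlocal in_flight, peak
        async with sem:  # blocks only once max_in_flight writes are outstanding
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_upsert(doc_id, doc)
            finally:
                in_flight -= 1

    # gather returns results in the same order as the input documents
    results = await asyncio.gather(*(one(k, v) for k, v in docs.items()))
    return list(results), peak

docs = {f"doc::{i}": {"type": "event", "n": i} for i in range(2000)}
ids, peak = asyncio.run(upsert_all(docs, max_in_flight=64))
print(len(ids))  # 2000
```

The semaphore caps pressure on the cluster while still overlapping round trips. Raising the limit trades latency headroom for throughput.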

Thanks for any tips,
Cheers,
Zoltán

Related topics

Topic	Category	Replies	Views	Last activity
Leveraging pyspark to write to couchbase	Spark Connector	3	1134	April 21, 2022
When connect to more than one bucket the reads are slowing down (number of gets/sec)	Spark Connector	0	1576	January 23, 2018
Inserts are around 1k on Cluster on Dev testing	Couchbase Server	1	1347	September 29, 2017
How to improve perfromance of write in couchbase DB	Couchbase Server	7	2851	September 13, 2018
How to do bulk insert via couchbase for java	Couchbase Server	10	6422	March 10, 2022
