How to bulk read data from Couchbase in Spark?

I have a Couchbase bucket with 4 million records, and I have written a Spark program in IntelliJ to read the bucket data using a N1QL query:

import com.couchbase.client.java.query.N1qlQuery
import com.couchbase.spark._
import spark.implicits._
val querydset = N1qlQuery.simple("select * from dset")
val data_dset = spark.sparkContext.couchbaseQuery(querydset).map(_.value.toString())
val rd_dset = spark.read.json(data_dset.toDS())

But I get the following error:

ERROR connection.QueryAccessor: Couchbase N1QL Query List(SimpleN1qlQuery{statement=select * from dset}) failed with {"msg":"Error performing bulk get operation - cause: {7 errors, starting with read tcp 127.0.0.1:55885->127.0.0.1:11210: i/o timeout}","code":12008}

If I use the query N1qlQuery.simple("select * from dset limit 250000"), it works fine, but above that limit it throws the error shown.

If your actual use case is getting everything from a bucket, look at the streaming interface. That's a lot more efficient than a "select *" kind of query. See the Spark samples on github.com/couchbaselabs.