Spark Connector Returns Different Result Than cbq Query

Setting `--conf spark.executor.extraJavaOptions=-Dcom.couchbase.queryTimeout=360000` also solved my problem where the number of documents actually loaded differed from the count we expected. But this is more or less a temporary solution, since the number of docs can grow dramatically and we would then have to reconfigure this parameter. Any other suggestions?
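For what it's worth, here is a minimal sketch of setting the same system property programmatically when building the SparkSession, so it doesn't have to be repeated on every spark-submit command line. The app name is a placeholder, and the 360000 ms value is just the one from above; note that in cluster mode the driver-side option may still need to be passed at submit time:

```scala
import org.apache.spark.sql.SparkSession

// -Dcom.couchbase.queryTimeout sets the Couchbase java-client's N1QL query
// timeout in milliseconds. Applying it to both driver and executors here
// avoids per-submit reconfiguration, but the value still has to be tuned
// by hand as the data volume grows.
val spark = SparkSession.builder()
  .appName("couchbase-load") // placeholder app name
  .config("spark.executor.extraJavaOptions", "-Dcom.couchbase.queryTimeout=360000")
  .config("spark.driver.extraJavaOptions", "-Dcom.couchbase.queryTimeout=360000")
  .getOrCreate()
```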

Also, it would be nice if a timeout error were thrown when a timeout happens; otherwise we really don't know whether we have the right data or not. Any ideas?

To increase parallelism, I first load all the meta ids from Couchbase and repartition them. Then I load the full documents from Couchbase. This way I can fully use my cluster's resources. You can find the code here: spark.sparkContext.couchbaseQuery number of partitions
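For reference, here is a rough sketch of that pattern as I understand it, assuming the Scala connector's `couchbaseQuery`/`couchbaseGet` API; the bucket name `travel-sample` and the partition count 64 are made-up placeholders:

```scala
import com.couchbase.client.java.document.JsonDocument
import com.couchbase.client.java.query.N1qlQuery
import com.couchbase.spark._

// Step 1: fetch only META().id via N1QL -- a single, cheap query.
// `travel-sample` is a placeholder bucket name.
val ids = spark.sparkContext
  .couchbaseQuery(N1qlQuery.simple("SELECT META().id AS id FROM `travel-sample`"))
  .map(row => row.value.getString("id"))

// Step 2: repartition the ids so the follow-up fetch is spread across the
// whole cluster; 64 is a placeholder, pick something proportional to your
// total executor cores.
// Step 3: pull the full documents through the KV service in parallel.
val docs = ids
  .repartition(64)
  .couchbaseGet[JsonDocument]()

println(s"Loaded ${docs.count()} documents")
```

The appeal of this split is that the single N1QL query only ships ids, while the heavy document transfer goes through the KV service from many partitions at once, so the query-timeout setting matters much less for the bulk of the load.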