Create Spark dataset using N1QL Query in Java

neeleshkumar_mannur · January 3, 2017, 2:47pm

Can we create a Spark dataset using a N1QL query?

neeleshkumar_mannur · January 3, 2017, 3:18pm

It's a rather dirty way to do it in Java. I actually had to read through the Connector code to work out this hack. Also, a secondary index must exist on the key that the query filters on.

We specify options while loading the data from Couchbase into Spark as follows:

couchbaseReader(sparkSession.sqlContext().read()).couchbase(options);

where options is a Java Map. Add a key named schemaFilter whose value holds the conditions for the WHERE clause, as follows:

Map<String, String> options = new TreeMap<String, String>();
options.put("bucket", "basedata");
options.put("schemaFilter", "table_name = 'Employee' AND (Employee_ID in [1000004,1000030])");

With the above options, the connector internally fires the following query:

SELECT META(basedata).id as META_ID, basedata.* FROM basedata WHERE table_name = 'Employee' AND (Employee_ID in [1000004,1000030]) LIMIT 1000
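As a rough sketch of the steps above, the options map can be built programmatically. The class and method names here (SchemaFilterOptions, buildOptions, previewQuery) are hypothetical and not part of the connector. previewQuery only reproduces the query text quoted in this post so the filter can be inspected; it is an assumption about the shape of the query, not the connector's actual implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper class; nothing here comes from the connector itself.
final class SchemaFilterOptions {

    // Builds the reader options described in the post: the bucket name plus a
    // "schemaFilter" entry holding the N1QL WHERE-clause conditions.
    static Map<String, String> buildOptions(String bucket, String tableName, List<Long> employeeIds) {
        String idList = employeeIds.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
        Map<String, String> options = new TreeMap<>();
        options.put("bucket", bucket);
        options.put("schemaFilter",
                "table_name = '" + tableName + "' AND (Employee_ID in [" + idList + "])");
        return options;
    }

    // Assumption: mirrors the query text quoted in the post, for inspection only.
    static String previewQuery(Map<String, String> options) {
        String bucket = options.get("bucket");
        return "SELECT META(" + bucket + ").id as META_ID, " + bucket + ".* FROM " + bucket
                + " WHERE " + options.get("schemaFilter") + " LIMIT 1000";
    }

    public static void main(String[] args) {
        Map<String, String> options =
                buildOptions("basedata", "Employee", Arrays.asList(1000004L, 1000030L));
        System.out.println(previewQuery(options));
        // With the connector on the classpath, the map is passed on as in the post:
        // couchbaseReader(sparkSession.sqlContext().read()).couchbase(options);
    }
}
```

Note that the filter is built by plain string concatenation with no escaping, so this sketch is only suitable for trusted values.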

Column filters can also be applied in a similar way.

There seems to be very little help for the Java API.

geraldss · January 3, 2017, 4:27pm

@keshav_m, more Spark questions.

daschl · January 3, 2017, 5:35pm

@neeleshkumar_mannur the docs are quite extensive on this https://developer.couchbase.com/documentation/server/4.5/connectors/spark-2.0/spark-sql.html

for the java API: https://developer.couchbase.com/documentation/server/4.5/connectors/spark-2.0/java-api.html

neeleshkumar_mannur · January 4, 2017, 5:12am

@daschl Indeed the docs are quite extensive for Scala.

The Java API docs do not give a clear picture of what should be done. In fact, even how to specify the bucket name from Java was clarified in a blog post and not in the docs.

daschl · January 5, 2017, 1:34pm

@neeleshkumar_mannur can you please let me know exactly which parts you are missing from the Java-based docs? Then I’ll add them.
