Can we create a Spark dataset using a N1QL query?
It's a rather dirty way to do it in Java. I actually had to go through the connector code to come up with this hack. Also, a secondary index must exist on the key the query filters on.
We specify options when loading the data from Couchbase into Spark. options is a Java Map: add a key named schemaFilter whose value is the set of conditions for the WHERE clause, as follows:
Map<String, String> options = new TreeMap<String, String>();
options.put("bucket", "basedata");
options.put("schemaFilter", "table_name = 'Employee' AND (Employee_ID in [1000004,1000030])");
The above options evaluate to the following query, which the connector fires internally:
basedata WHERE table_name = 'Employee' AND (Employee_ID in [1000004,1000030]) LIMIT 1000
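To make the mapping from options to the WHERE clause concrete, here is a minimal sketch that builds the same options map from a list of IDs. The class name SchemaFilterOptions and the build helper are hypothetical names introduced for illustration, not part of the connector. The Spark call in the trailing comment is an assumption based on the connector's Spark SQL data source and is left commented out so the sketch runs without Spark on the classpath.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper class, introduced only for this illustration.
final class SchemaFilterOptions {

    // Builds the options map from the post: the bucket name plus a schemaFilter
    // holding the conditions that end up in the WHERE clause.
    // Note: tableName is concatenated as-is and is not escaped.
    static Map<String, String> build(String bucket, String tableName, List<Long> employeeIds) {
        String idList = employeeIds.stream()
                .map(id -> Long.toString(id))
                .collect(Collectors.joining(","));

        Map<String, String> options = new TreeMap<>();
        options.put("bucket", bucket);
        options.put("schemaFilter",
                "table_name = '" + tableName + "' AND (Employee_ID in [" + idList + "])");
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> options =
                build("basedata", "Employee", Arrays.asList(1000004L, 1000030L));
        System.out.println(options.get("schemaFilter"));

        // Assumption: with Spark and the Couchbase connector available, the map
        // would be passed to the reader roughly like this:
        // Dataset<Row> employees = spark.read()
        //         .format("com.couchbase.spark.sql.DefaultSource")
        //         .options(options)
        //         .load();
    }
}
```

Building the filter in one place keeps the quoting of the string literal and the bracketed ID list consistent; any value that comes from user input would need escaping before being concatenated into the filter.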
Column filters can also be applied in a similar way.
There seems to be very little help for the Java API.
@keshav_m, more Spark questions.
@neeleshkumar_mannur the docs are quite extensive on this: https://developer.couchbase.com/documentation/server/4.5/connectors/spark-2.0/spark-sql.html
For the Java API: https://developer.couchbase.com/documentation/server/4.5/connectors/spark-2.0/java-api.html
@daschl Indeed the docs are quite extensive for Scala.
The Java API docs do not give a clear picture of what should be done. In fact, how to specify the bucket name from Java was clarified in a blog post, not in the docs.
@neeleshkumar_mannur can you please let me know exactly which parts you are missing from the Java-based docs? Then I’ll add them.