Spark Connector: 2.1.0
Apache Spark: 2.2.0
I'm seeing strange behavior: counting documents in a DataFrame returns the wrong value after persisting it, and the same happens after caching it. The value 15563 is correct. Any ideas what could cause this? See the code below:
scala> val docsf = sql.read.format("com.couchbase.spark.sql.DefaultSource").option("schemaFilter", "type in ['test1'] AND status in ['test2']").load()
docsf: org.apache.spark.sql.DataFrame = [META_ID: string, department_id: string … 20 more fields]
scala> docsf.count
res7: Long = 15563
scala> val qqq = docsf.persist(StorageLevel.DISK_ONLY_2)
qqq: docsf.type = [META_ID: string, department_id: string … 20 more fields]
scala> qqq.count
res8: Long = 10214
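For comparison, here is a minimal sketch (my own illustration, not from the transcript above) showing that with a purely in-memory source, `persist(StorageLevel.DISK_ONLY_2)` does not change the count. Since `persist` is lazy, the first `count` after it re-reads the source to materialize the data; with a static source the numbers match, which suggests the discrepancy here comes from the connector re-executing the Couchbase read rather than from `persist` itself:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical standalone illustration with a fixed in-memory source.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("persist-count-check")
  .getOrCreate()

val df = spark.range(15563).toDF("id")
println(df.count())  // 15563

// persist is lazy: nothing is materialized until an action runs
val persisted = df.persist(StorageLevel.DISK_ONLY_2)

// This count triggers the first materialization to disk.
// With a static source, it matches the pre-persist count.
println(persisted.count())  // 15563
```

If the Couchbase bucket is being mutated, or the underlying query scan is not consistent across the partitioned re-read, the post-persist count could plausibly diverge the way shown in the transcript; that is an assumption on my part, not something the transcript proves.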