Hi, @daschl, @graham.pople,
Could you please help me with this? I’m trying to use the Spark Connector (spark-connector_2.11, version 2.1.0) to get a full dump of all the documents in a bucket and save them to an HDFS location. I used to do this with Sqoop, but that stopped working after we upgraded Couchbase Server from 4.0.1 to 5.0.
My Spark code is very simple:
import org.apache.spark.sql.SparkSession
import com.couchbase.client.java.query.N1qlQuery
import com.couchbase.spark._

object SparkCouchbaseConnectorTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("Spark Couchbase Connector Test")
      .config("spark.couchbase.nodes", "my.server.name.com")
      .config("spark.couchbase.bucket.metrics-metadata", "")
      .getOrCreate()

    val sc = spark.sparkContext
    val query = "SELECT * FROM `metrics-metadata`"

    sc
      .couchbaseQuery(N1qlQuery.simple(query))
      .map(_.value.toString)
      .saveAsTextFile("viewfs://cluster5/nameservices/beaconstore/allkeys")
  }
}
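For what it’s worth, I’d also be happy to go through the connector’s Spark SQL / DataFrame side instead, if that’s the recommended way to dump a whole bucket. This is only a rough sketch of what I had in mind, and I haven’t verified the exact overloads against 2.1.0 (the output path is just a variant of the one above):

// Rough sketch (unverified against 2.1.0): read the whole bucket as a DataFrame
// via the connector's Spark SQL integration and write it out as JSON.
import com.couchbase.spark.sql._

val df = spark.read.couchbase()   // schema inferred by sampling documents
df.write.json("viewfs://cluster5/nameservices/beaconstore/allkeys-json")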
The RDD job above runs without any error messages and completes successfully; however, nothing is read from Couchbase or written to HDFS, apart from a zero-length part file. Here’s the relevant part of the log:
18/05/27 11:25:11 INFO spark.SparkContext: Starting job: saveAsTextFile at SparkCouchbaseConnectorTest.scala:30
18/05/27 11:25:11 INFO scheduler.DAGScheduler: Got job 0 (saveAsTextFile at SparkCouchbaseConnectorTest.scala:30) with 1 output partitions
18/05/27 11:25:11 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (saveAsTextFile at SparkCouchbaseConnectorTest.scala:30)
18/05/27 11:25:11 INFO scheduler.DAGScheduler: Parents of final stage: List()
18/05/27 11:25:11 INFO scheduler.DAGScheduler: Missing parents: List()
18/05/27 11:25:11 INFO storage.BlockManagerMasterEndpoint: Registering block manager lashadoop-17c17.server.hulu.com:60366 with 4.5 GB RAM, BlockManagerId(19, lashadoop-17c17.server.hulu.com, 60366, None)
18/05/27 11:25:11 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at saveAsTextFile at SparkCouchbaseConnectorTest.scala:30), which has no missing parents
18/05/27 11:25:11 INFO storage.BlockManagerMasterEndpoint: Registering block manager lashadoop-17j15.server.hulu.com:35903 with 4.5 GB RAM, BlockManagerId(41, lashadoop-17j15.server.hulu.com, 35903, None)
18/05/27 11:25:11 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 93.1 KB, free 3.3 GB)
18/05/27 11:25:11 INFO storage.BlockManagerMasterEndpoint: Registering block manager lashadoop-16h36.server.hulu.com:37738 with 4.5 GB RAM, BlockManagerId(47, lashadoop-16h36.server.hulu.com, 37738, None)
18/05/27 11:25:12 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 32.6 KB, free 3.3 GB)
18/05/27 11:25:12 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.88.165.23:54014 (size: 32.6 KB, free: 3.3 GB)
18/05/27 11:25:12 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1037
18/05/27 11:25:12 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at saveAsTextFile at SparkCouchbaseConnectorTest.scala:30)
18/05/27 11:25:12 INFO cluster.YarnClusterScheduler: Adding task set 0.0 with 1 tasks
18/05/27 11:25:12 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, lashadoop-17d21.server.hulu.com, executor 9, partition 0, PROCESS_LOCAL, 5765 bytes)
18/05/27 11:25:13 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.88.179.11:47201) with ID 6
18/05/27 11:25:13 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Tag executor 6 as REGISTERED
18/05/27 11:25:13 INFO storage.BlockManagerMasterEndpoint: Registering block manager lashadoop-17g21.server.hulu.com:37516 with 4.5 GB RAM, BlockManagerId(6, lashadoop-17g21.server.hulu.com, 37516, None)
18/05/27 11:25:13 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.88.165.20:41899) with ID 17
18/05/27 11:25:13 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Tag executor 17 as REGISTERED
18/05/27 11:25:13 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.88.161.20:35778) with ID 11
18/05/27 11:25:13 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Tag executor 11 as REGISTERED
18/05/27 11:25:14 INFO storage.BlockManagerMasterEndpoint: Registering block manager lashadoop-17a12.server.hulu.com:55673 with 4.5 GB RAM, BlockManagerId(17, lashadoop-17a12.server.hulu.com, 55673, None)
18/05/27 11:25:14 INFO storage.BlockManagerMasterEndpoint: Registering block manager lashadoop-17c12.server.hulu.com:53518 with 4.5 GB RAM, BlockManagerId(11, lashadoop-17c12.server.hulu.com, 53518, None)
18/05/27 11:25:14 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on lashadoop-17d21.server.hulu.com:37260 (size: 32.6 KB, free: 4.5 GB)
18/05/27 11:25:21 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.88.203.21:43467) with ID 14
18/05/27 11:25:21 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Tag executor 14 as REGISTERED
18/05/27 11:25:21 INFO storage.BlockManagerMasterEndpoint: Registering block manager lashadoop-17j12.server.hulu.com:46691 with 4.5 GB RAM, BlockManagerId(14, lashadoop-17j12.server.hulu.com, 46691, None)
18/05/27 11:25:23 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 11871 ms on lashadoop-17d21.server.hulu.com (executor 9) (1/1)
18/05/27 11:25:23 INFO cluster.YarnClusterScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
18/05/27 11:25:23 INFO scheduler.DAGScheduler: ResultStage 0 (saveAsTextFile at SparkCouchbaseConnectorTest.scala:30) finished in 11.890 s
18/05/27 11:25:23 INFO scheduler.DAGScheduler: Job 0 finished: saveAsTextFile at SparkCouchbaseConnectorTest.scala:30, took 12.098759 s
18/05/27 11:25:23 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
18/05/27 11:25:23 INFO spark.SparkContext: Invoking stop() from shutdown hook
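In the meantime, the next thing I plan to try is a quick count before the save, just to see whether the N1QL query returns any rows at all. A sketch of what I mean, reusing sc and query from the code above:

// Sketch: count the rows the N1QL query returns before writing anything,
// to tell "query returns no rows" apart from "rows get lost on the write side".
val rows = sc.couchbaseQuery(N1qlQuery.simple(query))
println(s"couchbaseQuery returned ${rows.count()} rows")

But I wanted to ask here in parallel, in case this is a known issue with the connector against 5.0.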
Thanks!