Questions about the Couchbase connector for Apache Sqoop

vishveshmulay · May 12, 2015, 7:40pm

Hi,
I am interested in exporting Couchbase data into HDFS so that we can clean up the data and import it back into CB.

I have the following questions about this:

1. Is there any throttling available so we can control the load this process puts on CB? (It is a production cluster serving traffic, so we don’t want to screw that up.)
2. We have a huge CB cluster, but the connector seems to take only one server as input to the --connect parameter. How does it work?

ingenthr · May 12, 2015, 10:04pm

For #1, Couchbase applies some automatic throttling that prioritizes frontend traffic over this integration. It may or may not be suitable for your environment, but at the moment the only tunable for throttling is the number of splits you run, which limits the MapReduce processing in Hadoop.

For #2, though it takes only one server as a parameter, it’ll automatically discover the topology and then connect to all nodes of the Couchbase Server cluster.
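
As a rough illustration of both points, here is a minimal Python sketch that assembles a Sqoop import command line. It is not taken from the connector itself: the helper name `build_sqoop_import`, the host name, and the target directory are hypothetical, and it assumes the split count is driven by Sqoop's standard `--num-mappers` option and that the connector accepts a single node in the `http://<host>:8091/pools` form with the `DUMP` table name. Check the connector documentation for your version before relying on any of this.

```python
import shlex


def build_sqoop_import(node, target_dir, num_mappers=1, table="DUMP"):
    """Assemble a Sqoop import command as an argument list.

    node        -- any single Couchbase node; the rest of the cluster is
                   discovered from it (per the reply above)
    num_mappers -- fewer mappers means fewer parallel readers, which is the
                   only throttle mentioned in this thread (assumed to map to
                   Sqoop's --num-mappers option)
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    return [
        "sqoop", "import",
        "--connect", f"http://{node}:8091/pools",  # assumed URL form
        "--table", table,                          # assumed: DUMP exports all keys
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]


if __name__ == "__main__":
    # Hypothetical host and HDFS path, for illustration only.
    cmd = build_sqoop_import("cb-node1.example.com", "/data/cb_dump", num_mappers=2)
    print(shlex.join(cmd))
```

Starting with a small mapper count and raising it while watching cluster load is one cautious way to approach a production cluster.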

vishveshmulay · May 12, 2015, 10:36pm

Thanks so much for your reply. Things are clearer now. A few more questions may follow.
