I am interested in exporting Couchbase data into HDFS so that we can clean up the data and import it back into CB.
I have the following questions about this:
- Is there any throttling available so we can control the load this process puts on CB? (It is a production cluster serving traffic, so we don’t want to screw that up.)
- We have a huge CB cluster, but the connector seems to take only one server as input to the --connect parameter. How does it work?
For #1, there is some automatic throttling in Couchbase to prioritize frontend traffic over this integration. It may or may not be suitable for your environment, but at the moment the only tunable for throttling is the number of splits you run, which limits the MapReduce processing in Hadoop.
For #2, though it takes only one server as a parameter, it’ll automatically discover the topology and then connect to all nodes of the Couchbase Server cluster.
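To make the two points concrete, here is a rough sketch of how such an import might be launched with a small number of splits. It assumes the connector is run through Sqoop, that Sqoop's standard --num-mappers option is what sets the number of splits, and that the table name DUMP requests a full dump. The host name, HDFS path and helper function are made up for illustration, so check them against your connector version before relying on this.

```python
import shlex


def build_sqoop_import(node_url, target_dir, num_mappers):
    """Build a Sqoop import command line for a Couchbase dump (hypothetical helper)."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    return [
        "sqoop", "import",
        # Any single node works here; the rest of the cluster is discovered from it.
        "--connect", node_url,
        # Assumption: DUMP is the connector's table name for a full dump.
        "--table", "DUMP",
        "--target-dir", target_dir,
        # Fewer mappers means fewer parallel readers, so less load on the cluster.
        "--num-mappers", str(num_mappers),
    ]


# Hypothetical node URL and HDFS directory.
args = build_sqoop_import("http://cb-node1.example.com:8091/pools", "/data/cb_dump", 2)
print(shlex.join(args))
```

Starting with one or two mappers and raising the number while watching the cluster's frontend latency is probably the safest way to find a level the production traffic can tolerate.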
Thanks so much for your reply. Things are clearer now. A few more questions may follow.