Moving data from Couchbase to Hadoop

I have a Couchbase cluster running in the Amazon cloud with about 300 million documents (~150 GB).
I want to migrate the entire data set to a Hadoop cluster, i.e., I want to move the data from Couchbase to Hadoop, not just copy it.

I tried using Sqoop, but it does not seem to move the data, and it also does not seem to copy all of it.
It also gave me an NPE when I used it on a password-protected bucket.

1 Answer


Hello,

Sqoop will copy the data, not move it; that is its goal. I am surprised about the NPE.
Note that when I use Sqoop, I use a Cloudera distribution. Are you?

Maybe you can also test with the Talend ETL, which will let you copy or move the data (you have more options in the job you create):
- http://www.couchbase.com/couchbase-server/connectors/talend
- http://www.talend.com/resource/big-data-and-hadoop.html

(I have only used Talend with Couchbase and an RDBMS, but as you can see, all the connectors needed for Hadoop are available too.)

Regards
Tug
@tgrall

Hi Tug,

Thanks for the reply.

I am wondering whether Sqoop migrates the entire data set or only part of it, as it did not transfer all of the data on any of my attempts.

I am also using the Cloudera distribution for installing Hadoop and Sqoop.
I used the following command on a password-protected bucket:
sqoop-import --connect http://10.xxx.xxx.xx:8091/pools --table BACKFILL_0022 --username TaTest2 --password password

and got the following exception:

java.io.IOException: Server returned HTTP response code: 401 for URL: http://10.xxx.xxx.xx:8091/pools
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
at com.couchbase.client.vbucket.ConfigurationProviderHTTP.readToString(ConfigurationProviderHTTP.java:344)
at com.couchbase.client.vbucket.ConfigurationProviderHTTP.readPools(ConfigurationProviderHTTP.java:145)
at com.couchbase.client.vbucket.ConfigurationProviderHTTP.getBucketConfiguration(ConfigurationProviderHTTP.java:127)
at com.couchbase.client.CouchbaseConnectionFactory.getVBucketConfig(CouchbaseConnectionFactory.java:188)
at com.couchbase.client.CouchbaseClient.&lt;init&gt;(CouchbaseClient.java:156)
at com.couchbase.client.CouchbaseClient.&lt;init&gt;(CouchbaseClient.java:125)
at com.couchbase.client.CouchbaseClient.&lt;init&gt;(CouchbaseClient.java:77)
at com.couchbase.sqoop.mapreduce.db.CouchbaseInputFormat.getSplits(CouchbaseInputFormat.java:98)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1079)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1096)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:177)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:995)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:948)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:948)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:247)
at com.couchbase.sqoop.manager.CouchbaseManager.importTable(CouchbaseManager.java:145)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:413)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:222)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:231)
at org.apache.sqoop.Sqoop.main(Sqoop.java:240)
13/12/03 14:00:06 INFO mapred.JobClient: Cleaning up the staging area hdfs://ip-10-xxx-xxx-xx.ap-southeast-1.compute.internal:8020/user/hdfs/.staging/job_201311301454_0024
13/12/03 14:00:06 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
at com.couchbase.client.CouchbaseConnectionFactory.getVBucketConfig(CouchbaseConnectionFactory.java:188)
at com.couchbase.client.CouchbaseClient.&lt;init&gt;(CouchbaseClient.java:156)
at com.couchbase.client.CouchbaseClient.&lt;init&gt;(CouchbaseClient.java:125)
at com.couchbase.client.CouchbaseClient.&lt;init&gt;(CouchbaseClient.java:77)
at com.couchbase.sqoop.mapreduce.db.CouchbaseInputFormat.getSplits(CouchbaseInputFormat.java:98)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1079)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1096)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:177)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:995)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:948)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:948)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:247)
at com.couchbase.sqoop.manager.CouchbaseManager.importTable(CouchbaseManager.java:145)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:413)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:222)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:231)
at org.apache.sqoop.Sqoop.main(Sqoop.java:240)

One more piece of info:
Sqoop is able to dump from a bucket without a password.
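Since the 401 on /pools is logged just before the NullPointerException, it may be worth checking, outside of Sqoop, whether the cluster accepts those bucket credentials at all. A minimal sketch using curl (the host is the elided address from the thread, and the bucket name and password are the ones from the command above; this is a debugging aid, not a fix):

```shell
# Probe the Couchbase REST endpoint with HTTP Basic auth and print only the
# status code: 200 means the credentials are accepted at /pools, 401 means
# they are rejected there too (run against your own cluster address):
# curl -s -o /dev/null -w '%{http_code}\n' \
#   -u TaTest2:password \
#   http://10.xxx.xxx.xx:8091/pools

# For reference, the Basic auth header such a request carries is just the
# base64 of "username:password":
AUTH=$(printf 'TaTest2:password' | base64)
echo "Authorization: Basic $AUTH"
```

If curl also gets a 401 with the same credentials, the problem is on the cluster/credentials side rather than in Sqoop itself; if curl succeeds, the Sqoop connector's credential handling is the more likely culprit.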