Sqoop Plugin for Hadoop
Hello all,
As I read in the Sqoop documentation, the plugin can import from Couchbase data buckets, but is there documentation on how to install it and run queries over the data? Couchbase has no tables like MSSQL or MySQL, and Couchbase 1.8 keeps data as serialized objects, so is there an example of how to use it?
As I understand it, the data is in a raw format and we cannot analyze or run queries on the data pulled via the plugin. Will that be available in the next release, 2.0? Am I right?
You should be able to run analysis on the data once it is in Hadoop. Right now you can only import all of the data on your cluster. In a future release (maybe a 2.0 release of the plugin) you will be able to create a view in Couchbase and then pull in data based on the view and query parameters passed in that view.
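As a rough illustration of analysis on the imported data, here is a minimal Python sketch. It assumes the import lands as text lines of the form key,value and that keys carry a prefix such as "user:"; both are assumptions for illustration, so check the files the plugin actually writes on your cluster. The function names and the sample records are hypothetical.

```python
from collections import Counter


def parse_record(line, sep=","):
    """Split one imported line into (key, value) at the first separator.

    Assumes a 'key,value' text layout; the value may itself contain commas.
    """
    key, _, value = line.rstrip("\n").partition(sep)
    return key, value


def count_by_prefix(lines, delimiter=":"):
    """Count records per key prefix, e.g. 'user' for the key 'user:42'."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        key, _ = parse_record(line)
        counts[key.split(delimiter, 1)[0]] += 1
    return counts


# Made-up sample records standing in for lines read from an imported file.
sample = ["user:1,alice\n", "user:2,bob\n", "order:9,widget,blue\n"]
print(count_by_prefix(sample))
```

The same per-line logic could be moved into a Hadoop streaming mapper if the data set is too large to process on one machine.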
Hello Mike,
Is there an example of how the import works?
This is my system configuration, on Red Hat Enterprise Linux 5.1:
Couchbase 1.8 community edition
Cloudera CDH3
JDK 1.7
When I try even to list the tables on a server, it throws the error below.
13/01/21 09:39:08 ERROR tool.BaseSqoopTool: Got error creating database manager: java.io.IOException: No manager for connect string: http://ipaddress/pools
    at com.cloudera.sqoop.ConnFactory.getManager(ConnFactory.java:121)
    at com.cloudera.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:204)
    at com.cloudera.sqoop.tool.ListTablesTool.run(ListTablesTool.java:46)
    at com.cloudera.sqoop.Sqoop.run(Sqoop.java:146)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:182)
    at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:221)
    at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:230)
    at com.cloudera.sqoop.Sqoop.main(Sqoop.java:239)
Can you shed some light on this issue?
Thanks,
Krishna
Right now you can only pull all of the data in your Couchbase cluster into Hadoop, since the plugin was targeted at Couchbase 1.8. In the future we will add APIs that allow you to run queries on the views you create, so that you have more control over what gets pulled into Hadoop.
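On the "No manager for connect string" error quoted above: the trace shows it is raised in ConnFactory.getManager, which suggests that no registered connection manager accepted the http://.../pools connect string. One plausible cause (an assumption, not confirmed in this thread) is that the plugin's manager factory is not registered with this Sqoop install, so it may be worth re-checking the plugin installation step. The Python sketch below is only a simplified model of that kind of lookup, not Sqoop's code; all class names and return values in it are hypothetical.

```python
class JdbcFactory:
    """Hypothetical stand-in for a built-in factory that handles jdbc: strings."""

    def accept(self, connect_string):
        return "jdbc-manager" if connect_string.startswith("jdbc:") else None


class HttpPoolsFactory:
    """Hypothetical stand-in for a plugin-provided factory."""

    def accept(self, connect_string):
        if connect_string.startswith("http://") and connect_string.endswith("/pools"):
            return "couchbase-manager"
        return None


def get_manager(connect_string, factories):
    """Return the first manager a factory offers, else fail like the log above."""
    for factory in factories:
        manager = factory.accept(connect_string)
        if manager is not None:
            return manager
    raise IOError("No manager for connect string: " + connect_string)


url = "http://ipaddress/pools"

# With only the JDBC factory registered, the lookup fails.
try:
    get_manager(url, [JdbcFactory()])
except IOError as err:
    print(err)

# Once a factory that understands the string is registered, it succeeds.
print(get_manager(url, [JdbcFactory(), HttpPoolsFactory()]))
```

If this model matches your situation, the fix would be on the installation side rather than in the command line you ran.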