How do I bulk insert a question?

jopnick · January 30, 2012, 12:14am

Hello!
We have about 50M records already in JSON that we want to load into Couchbase in batches of 350k.
Because we are transforming logfiles => JSON, we are using curl for the bulk operations. I’ve tried googling and following examples that seemed to work for other folks, but I haven’t been able to get a bulk operation to succeed using _bulk_docs.
Here is an example:


curl -X POST 'http://localhost:8092/default/_bulk_docs' -H 'Content-Type: application/json' -d '{"docs":[{"name":"tom"},{"name":"bob"}]}'

to which I get back

 
{"error":"doc_validation","reason":"User data must be in the `json` field, please nest `name`"}

I see the same thing when I copy-paste the example from the API docs, changing only the port. When I use the port from the docs, I get no response.
curl -X POST 'http://localhost:8092/default/_bulk_docs' -H 'Content-Type: application/json' -d '{"docs" : [{"_id" : "FishStew","servings" : 4,"subtitle" : "Delicious with fresh bread","title" : "Fish Stew"},{"_id" : "LambStew","servings" : 6,"subtitle" : "Delicious with scone topping","title" : "Lamb Stew"},{"servings" : 8,"subtitle" : "Delicious with suet dumplings","title" : "Beef Stew"}]}'
{"error":"doc_validation","reason":"User data must be in the `json` field, please nest `_id`"}

What should I change in my docs array to get the bulk action going? Any tips are welcome.

Thanks in advance!
Jop

jopnick · January 31, 2013, 12:00am

Thanks for the reply anil!

I’ve not tried the php script just yet, about to give that a go.

I tried the 2nd example referred to in the blog and downloaded a JSON dataset from the site. When I try to load that zip file I get errors. It seems to connect and find my bucket, then it gets angry in Python and quits with this output:
{'username': 'jopsUserName', 'node': '192.168.231.58:8091', 'password': 'jopsPassword', 'bucket': 'default', 'ram_quota': 1000}
['test.zip']
[2013-01-31 21:35:44,412] - [rest_client] [139716557670144] - INFO - existing buckets : [u'default', u'logging']
[2013-01-31 21:35:44,412] - [rest_client] [139716557670144] - INFO - found bucket default
Traceback (most recent call last):
  File "/opt/couchbase/lib/python/cbdocloader", line 237, in <module>
    main()
  File "/opt/couchbase/lib/python/cbdocloader", line 229, in main
    docloader.populate_docs()
  File "/opt/couchbase/lib/python/cbdocloader", line 179, in populate_docs
    self.bucket = cb[self.options.bucket]
  File "/opt/couchbase/lib/python/couchbase/client.py", line 161, in __getitem__
    return self.bucket(key)
  File "/opt/couchbase/lib/python/couchbase/client.py", line 118, in bucket
    return Bucket(bucket_name, self)
  File "/opt/couchbase/lib/python/couchbase/client.py", line 217, in __init__
    self.bucket_password)
  File "/opt/couchbase/lib/python/couchbase/couchbaseclient.py", line 686, in __init__
    self.init_vbucket_connections()
  File "/opt/couchbase/lib/python/couchbase/couchbaseclient.py", line 781, in init_vbucket_connections
    self.start_vbucket_connection(i)
  File "/opt/couchbase/lib/python/couchbase/couchbaseclient.py", line 791, in start_vbucket_connection
    serverPort, self.bucket)
  File "/opt/couchbase/lib/python/couchbase/couchbaseclient.py", line 1250, in direct_client
    .encode('ascii'))
  File "/opt/couchbase/lib/python/couchbase/couchbaseclient.py", line 431, in sasl_auth_plain
    password]))
  File "/opt/couchbase/lib/python/couchbase/couchbaseclient.py", line 426, in sasl_auth_start
    return self._doCmd(MemcachedConstants.CMD_SASL_AUTH, mech, data)
  File "/opt/couchbase/lib/python/couchbase/couchbaseclient.py", line 315, in _doCmd
    self._sendCmd(cmd, key, val, opaque, extraHeader, cas)
  File "/opt/couchbase/lib/python/couchbase/couchbaseclient.py", line 261, in _sendCmd
    vbucketId=self.vbucketId)
  File "/opt/couchbase/lib/python/couchbase/couchbaseclient.py", line 270, in _sendMsg
    self.s.send(msg + extraHeader + key + val)
socket.error: [Errno 32] Broken pipe
I’ll let you know how the php script goes, although I’d be elated to get this docloader working as well. It looks like just what we need.

Thanks
Jop

anil · January 31, 2013, 12:02am

Hello,

For bulk operations we provide two options. If you choose to use an SDK, the Couchbase SDK APIs support ‘Performing a Bulk Set’; here is the documentation (http://www.couchbase.com/docs/couchbase-devguide-2.0/populating-cb.html). We also provide the ‘CBDocLoader’ tool for bulk loading JSON documents; here is the blog post on that (http://blog.couchbase.com/loading-json-data-couchbase), and there are some examples here (https://github.com/couchbase/couchbase-examples).

Hope that helps…
Anil

anil · January 31, 2013, 12:03am

Sure, we would definitely like to get CBDocLoader working for you. Can you send us a sample snippet of the JSON document? We are interested in seeing the ‘format’ of the document.

Anil

jopnick · January 31, 2013, 12:08am

totally, one might find said file Here

I basically just borrowed one of the .json files from the trees data and put it into a directory named ‘test’, zipping it all up.

I also tried feeding in the directory itself and the JSON file itself, but neither worked. I dug around in the beer examples and they seem to have only 1 JSON object per file. Is that the proper format?

anil · February 1, 2013, 12:06am

Yes, that’s correct: one JSON object per file is the proper format. If you check the blog post I mentioned (http://blog.couchbase.com/loading-json-data-couchbase), we use a simple Python script to split the JSON objects into separate files, one object per file. We then loaded the data into Couchbase using the cbdocloader tool.
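
A minimal sketch of that kind of split, assuming the input file holds a single top-level JSON array. This is not the script from the blog post; the function name `split_json_array` and the `doc_NNNNNNN.json` file naming are made up for illustration.

```python
import json
from pathlib import Path


def split_json_array(source: Path, out_dir: Path) -> int:
    """Write each element of a top-level JSON array to its own file.

    Files are named doc_0000000.json, doc_0000001.json, ... (a hypothetical
    naming scheme). Returns the number of files written.
    """
    records = json.loads(source.read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")

    out_dir.mkdir(parents=True, exist_ok=True)
    for index, record in enumerate(records):
        # One JSON object per file, named by its position in the array.
        target = out_dir / f"doc_{index:07d}.json"
        target.write_text(json.dumps(record), encoding="utf-8")
    return len(records)
```

Calling `split_json_array(Path("batch.json"), Path("test"))` would fill a `test` directory with one file per record. The whole array is read into memory at once, so this suits one batch at a time rather than the full data set.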

Hope that helps…
Anil

jopnick · February 1, 2013, 12:09am

That totally worked. Thanks a bunch for your help.

As an aside, the issue now is that we have 350k JSON files in a directory which, when zipped, creates a 174MB zip from a 101MB log file. This is okay, and our ops/sec are now at ~300 vs. ~50, which is a fantastic improvement. We’ve found that pointing it to a directory works but pointing it to a .zip doesn’t… which is weird… no errors, it just says ‘done’ and we have no docs. I assumed it might be the async write, so I waited a bit, but the docs never showed up. I’ll dig into that though, it’s probably something with the zip.
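
One way to start digging is to look at how the archive is laid out. The sketch below assumes nothing about the layout cbdocloader expects; it only counts the .json entries per directory inside the zip, so the result can be compared with an archive that is known to load. The helper name `summarize_zip` is hypothetical.

```python
import zipfile
from collections import Counter


def summarize_zip(path: str) -> Counter:
    """Count the .json entries in a zip archive, grouped by parent directory.

    Entries at the top level of the archive are counted under the key "".
    """
    counts = Counter()
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            # Skip directory entries and anything that is not a .json file.
            if name.endswith("/") or not name.lower().endswith(".json"):
                continue
            parent = name.rsplit("/", 1)[0] if "/" in name else ""
            counts[parent] += 1
    return counts
```

Running `summarize_zip("test.zip")` shows whether the documents sit where expected or ended up nested under an extra directory level.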

Do you know if the CouchDB-style _bulk_docs action will eventually be supported?

Thanks again for all your help and the 6x improvement in throughput!
Jop
