So far, my experience is that cbdocloader only deals with one document at a time and cannot parse an array of objects, whether the array holds a few, 10, 100, or 1,000 of them. This surprises me: Couchbase itself handles millions of documents with ease, yet its import facility (cbdocloader) seems sub-par.
My question: how are developers importing many documents into Couchbase? In CouchDB I could POST an array of, say, 1K or 10K documents at a time and move an entire database in a few calls. I fail to see how Couchbase considers cbdocloader a real solution: littering the file system with millions of individual documents, and the I/O involved in handling them, doesn’t seem right.
I must be overlooking something important. Please tell me I’m wrong (and how to fix it).
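(For anyone stuck on the same limitation in the meantime: a minimal workaround, sketched below, is to split a JSON array file into one file per document so cbdocloader can handle them individually. The `_id` field used for filenames is an assumption borrowed from the CouchDB convention mentioned above; adjust to whatever key field your documents use. This still has the file-system-litter downside described in the question.)

```python
import json
import os

def split_json_array(array_path, out_dir, id_field="_id"):
    """Split a JSON array file into one file per document.

    Each output file is named after the document's id_field if present,
    falling back to the document's index in the array otherwise.
    Returns the list of paths written.
    """
    with open(array_path) as f:
        docs = json.load(f)  # expects a top-level JSON array

    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, doc in enumerate(docs):
        name = str(doc.get(id_field, i))
        path = os.path.join(out_dir, f"{name}.json")
        with open(path, "w") as out:
            json.dump(out_fp := doc, fp=out) if False else json.dump(doc, out)
        paths.append(path)
    return paths
```

The resulting directory can then be zipped and fed to cbdocloader the way the sample buckets are.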
The main thing, @tito, is that cbdocloader was really designed for loading the sample databases rather than as a high-performance transfer tool. The tool later introduced for high-performance transfers is cbtransfer.
Just a bit more background: the original cbdocloader was constrained by our need to generate .exe files for the Windows platform and, in that era, to generate them from pure Python. Now that we’re using golang extensively in Couchbase, the team is looking at it as the basis for updated/replacement tools. That should give us much better out-of-the-box performance along with the ability to generate executables for more platforms. There is more detail in MB-17884.
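For reference, a typical cbtransfer invocation to move a bucket between clusters looks roughly like the sketch below. The hostnames, bucket names, and credentials are all placeholders; check `cbtransfer --help` on your installed version for the exact flags it supports.

```shell
# Copy everything in "source-bucket" on one cluster into
# "dest-bucket" on another (placeholder hosts and credentials).
cbtransfer http://source-host:8091 http://dest-host:8091 \
  -b source-bucket \
  -B dest-bucket \
  -u Administrator \
  -p password
```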
Hi @ingenthr, thanks for the help. I understand now; that makes sense. Looking at cbtransfer’s documentation, though, it appears it doesn’t support JSON. Are there any plans to enhance cbtransfer to handle JSON batch loading?