Load bucket data into JSON files

I have a requirement to load bucket data into multiple JSON files, with 100 documents in each file.

I have tried cbtransfer, but it copies the data into CSV files on a per-server basis, and the output contains duplicate records.

Can someone suggest how I can achieve this?

If you have a Query node, you could use the Query REST API to run an ordered SELECT query with OFFSET and LIMIT clauses, which segments your data into fixed-size pages.

If you are on Linux/macOS and you have a tool such as 'jq' installed to trim the additional output, a script like this would achieve your aim:

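$ cat t.sh
#!/bin/bash
# Count the documents so we know how many 100-document pages to fetch
S="SELECT count(1) c FROM \`travel-sample\` t"
C=$(curl -su Administrator:password -d "metrics=false" --data-urlencode "statement=${S}" http://localhost:8093/query/service | jq '.results[0].c')
# Fetch one page per iteration; ORDER BY meta().id keeps the pages from overlapping
for ((i=0; i<C; i+=100)); do
    curl -su Administrator:password -d "metrics=false" --data-urlencode "statement=SELECT t.* FROM \`travel-sample\` t ORDER BY meta().id OFFSET $i LIMIT 100" http://localhost:8093/query/service | jq '.results' > "export_$((i/100)).out"
done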

(I'm not saying this is the best way, just a way to achieve your aim: each export file contains an anonymous array of documents.)
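To see how the loop in the script carves up the result set, here is a minimal dry-run sketch of the same OFFSET/LIMIT arithmetic. `plan_pages` is a hypothetical helper, not part of Couchbase; it only prints the file name and clause each iteration would use (mirroring the `export_N.out` names above), so it runs without a Query node.

```shell
#!/bin/bash
# Hypothetical helper: print the paging plan the export loop would follow
# for a given document count, without contacting the Query service.
# Arguments: total document count, page size (default 100).
plan_pages() {
    local count=$1 size=${2:-100} i n
    for ((i = 0; i < count; i += size)); do
        # The last page may hold fewer than $size documents
        n=$(( count - i < size ? count - i : size ))
        printf 'export_%d.out OFFSET %d LIMIT %d (%d docs)\n' "$((i / size))" "$i" "$size" "$n"
    done
}

plan_pages 250
# export_0.out OFFSET 0 LIMIT 100 (100 docs)
# export_1.out OFFSET 100 LIMIT 100 (100 docs)
# export_2.out OFFSET 200 LIMIT 100 (50 docs)
```

The ORDER BY is what makes OFFSET paging deterministic; if documents are added or removed while the export runs, pages can still shift, so it is safest to run it against a quiet bucket.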


You could try the following two commands:

cbexport json -c couchbase:// -u $CB_USERNAME -p $CB_PASSWORD -f lines -b source_bucket -o all_output.json

split -d -a 10 -l 100 all_output.json
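With `-f lines`, cbexport writes one document per line, so each chunk from split (named x0000000000, x0000000001, and so on, given `-d -a 10` and the default prefix) is JSON Lines rather than a single JSON array. If you need each file to be one array, a minimal sketch like this can wrap the chunks; `lines_to_array` is a hypothetical helper, and it assumes every non-blank line is one complete JSON document.

```shell
#!/bin/bash
# Hypothetical helper: read JSON Lines on stdin, write one JSON array on stdout.
# Assumes each non-blank input line is a complete JSON document.
lines_to_array() {
    awk 'BEGIN { printf "[" }
         NF == 0 { next }                               # skip blank lines
         { printf "%s%s", (n++ ? ",\n" : "\n"), $0 }    # comma before every doc except the first
         END { printf "\n]\n" }'
}

# Wrap every chunk produced by split into its own export_<suffix>.json file.
for f in x[0-9]*; do
    [ -e "$f" ] || continue                             # glob matched nothing: no chunks yet
    lines_to_array < "$f" > "export_${f#x}.json"
done
```

Writing to `export_*.json` instead of `x*.json` keeps the output files from matching the `x[0-9]*` glob if the loop is run again.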

I think you can avoid the intermediate file all_output.json by writing to /dev/stdout (if your system supports it) and piping the export straight into split:

cbexport json -c couchbase:// -u $CB_USERNAME -p $CB_PASSWORD -f lines -b source_bucket -o /dev/stdout | grep -v '^$' | split -d -a 10 -l 100

If you have lots of CPU cores, you can also speed things up by adding -t 16 (--threads) to the cbexport command.