Loading json into buckets

ogrdsnielsen · November 8, 2016, 12:59pm

Hi,

I have one file that contains multiple JSON documents separated by commas.

Now I want to load the first JSON document (json1) from the file into bucket x as document y, and the second one (json2) from the same file into the same bucket x as a different document z.

Using cbdocloader, we can load a file that contains a single JSON document into a bucket, but not a file that contains multiple JSON documents.

Can anyone please help me find a way to load multiple JSON documents from the same file into a bucket as separate documents?

geraldss · November 8, 2016, 3:31pm

You can probably use a script with N1QL INSERT.

daschl · November 8, 2016, 3:51pm

@ogrdsnielsen as Gerald said, I think you won't get around writing a simple script, or you can split the file into multiple docs in bash first. That said, if you use a language for which we have an official SDK, you'd be better off using KV directly, since it gives you better performance on this kind of operation (an insert where you know the key and the value).

geraldss · November 8, 2016, 4:48pm

The performance should be similar with N1QL bulk INSERT.
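
As a rough sketch of what such a script could look like, the Ruby snippet below builds one bulk INSERT statement from a batch of parsed documents. The bucket name `x`, the `doc-N` key scheme, and the helper name `bulk_insert_statement` are illustrative assumptions, not something from this thread; the statement would still have to be sent to the query service by whatever client you use.

```ruby
require "json"

# Builds a single N1QL bulk INSERT statement for a batch of documents.
# bucket, key_prefix and offset are hypothetical parameters chosen for
# illustration; pick a key scheme that suits your data.
def bulk_insert_statement(bucket, docs, key_prefix: "doc", offset: 0)
  values = docs.each_with_index.map do |doc, i|
    key = "#{key_prefix}-#{offset + i}"
    # to_json produces a double-quoted, escaped literal for the key
    # and a JSON object literal for the document body.
    "VALUES (#{key.to_json}, #{doc.to_json})"
  end
  "INSERT INTO `#{bucket}` (KEY, VALUE) #{values.join(', ')};"
end

# Same comma-separated input as in the question, wrapped in [ ] to
# make it a valid JSON array before parsing.
raw = '{"foo":"bar"},{"baz":"quux"}'
docs = JSON.parse("[" + raw + "]")
statement = bulk_insert_statement("x", docs)
puts statement
```

For large inputs you would probably generate one statement per batch of a few hundred documents rather than one statement per file.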

ogrdsnielsen · November 9, 2016, 10:00am

But we are planning to load millions of JSON documents (approximately 4 million).

For example, we want to load 400 files, each containing at most 10,000 JSON documents.

If I use N1QL, it will be difficult to insert that many JSON documents manually.

avsej · November 9, 2016, 2:22pm

Why not just write a converter to the cbdocloader format? cbdocloader cannot support every possible way of formatting your documents.

Let's say you have tmp.json with the following content:

$ cat tmp.json
{"foo":"bar"},{"baz":"quux"}

With a simple one-liner you can split it into multiple documents:

$ ruby -rjson -e 'file="tmp.json"; docs = JSON.load("[" + File.read(file) + "]"); docs.each_with_index{|d,i| File.write(file.sub(".json", "-#{i}.json"), JSON.dump(d))}'
$ cat tmp-0.json 
{"foo":"bar"}
$ cat tmp-1.json 
{"baz":"quux"}

The script is really trivial:

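require "json"

file = "tmp.json"
# Wrap the comma-separated objects in [ ] so they parse as one JSON array
docs = JSON.load("[" + File.read(file) + "]")
# Write each document to its own file: tmp-0.json, tmp-1.json, ...
docs.each_with_index do |d, i|
  File.write(file.sub(".json", "-#{i}.json"), JSON.dump(d))
end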

You only need to put the results into a docs/ directory and zip it all up. For example, take a look at how the travel-sample dataset was created:

github.com

couchbase/couchbase-examples/blob/e06375b622ad04ffc8d4939f75c5683519df6f03/generate-travel-sample.rb#L561


File.write("#{dir}/design_docs/indexes.json", n1ql_indexes.to_json)
end


%w(travel travel-sample).each do |dir|
rm("#{dir}.zip") if File.exist?("#{dir}.zip")
puts("set mtime of all files to #{GLOBAL_MTIME}...")
Dir['**/*']
  .sort_by { |f| [File.directory?(f) ? 1 : 0, f] }
  .map { |f| FileUtils.touch(f, mtime: GLOBAL_MTIME) }
puts("archiving to #{dir}.zip...")
system("zip -9rqX #{dir}.zip #{dir}")
rm_rf(dir)
end
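
Putting the split step and the docs/ layout together, a minimal sketch could look like the following. The helper name `split_into_docs_dir` and the output directory name are assumptions made for illustration; only the docs/ subdirectory and the final zip step come from the description above.

```ruby
require "json"
require "fileutils"

# Splits a file of comma-separated JSON objects into one file per
# document under <out_dir>/docs/ and returns the paths written.
# The helper name and out_dir are hypothetical, chosen for this sketch.
def split_into_docs_dir(source, out_dir)
  docs = JSON.parse("[" + File.read(source) + "]")
  docs_dir = File.join(out_dir, "docs")
  FileUtils.mkdir_p(docs_dir)
  base = File.basename(source, ".json")
  docs.each_with_index.map do |doc, i|
    path = File.join(docs_dir, "#{base}-#{i}.json")
    File.write(path, JSON.generate(doc))
    path
  end
end
```

Calling `split_into_docs_dir("tmp.json", "mydata")` would leave mydata/docs/tmp-0.json and mydata/docs/tmp-1.json on disk; the directory can then be archived with something like `zip -r mydata.zip mydata`, as the travel-sample script does with its own options.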

ogrdsnielsen · November 10, 2016, 6:45am

Thank you @avsej, but we are already using a script like that to split the JSON documents into multiple files.

However, we ran into a problem on the Unix box that seems to limit the number of files.

For example, our table has about 20 million records, but when we run the script to generate the JSON files, only a little over 4 million are created on the Unix box. Is there a threshold limit in Unix?

Is there a way to overcome that and generate 20 million JSON files on the Unix box?

avsej · November 10, 2016, 12:00pm

In this case, why don't you use a regular SDK to load the documents from that huge file? You could use a streaming JSON parser (which does not load the full file into memory to parse it) and then upsert all the docs.
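
To illustrate the idea without depending on a particular parser gem, here is a small stdlib-only sketch that reads the file in chunks and yields one parsed document at a time. It is a hand-rolled splitter, not a real streaming parser library, and it assumes the input is a sequence of top-level JSON objects separated by commas or whitespace. The name `each_json_object` is made up for this example, and the upsert itself is left to the block because the exact call depends on the SDK and version you use.

```ruby
require "json"

# Yields each top-level JSON object found in io, reading fixed-size
# chunks so the whole file is never held in memory at once.
# Assumption: top-level values are objects ({...}) separated by
# commas and/or whitespace.
def each_json_object(io, chunk_size: 64 * 1024)
  buffer = String.new  # bytes of the object currently being collected
  depth = 0            # current nesting depth of { }
  in_string = false    # true while inside a JSON string literal
  escaped = false      # true right after a backslash inside a string

  while (chunk = io.read(chunk_size))
    # read(length) returns binary data, so each_char walks byte by byte
    chunk.each_char do |ch|
      if depth.zero?
        next unless ch == "{"  # skip separators between objects
        depth = 1
        buffer << ch
        next
      end
      buffer << ch
      if in_string
        if escaped
          escaped = false
        elsif ch == "\\"
          escaped = true
        elsif ch == '"'
          in_string = false
        end
      elsif ch == '"'
        in_string = true
      elsif ch == "{"
        depth += 1
      elsif ch == "}"
        depth -= 1
        if depth.zero?
          yield JSON.parse(buffer.force_encoding(Encoding::UTF_8))
          buffer = String.new
        end
      end
    end
  end
end
```

A loader would then open the big file with `File.open(path, "rb")`, call `each_json_object` on it, and inside the block store each document under a key of your choice with your SDK's upsert operation, so no intermediate files are needed at all.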
