We have a file of approximately 270 MB containing around 1600 `INSERT` statements, each inserting a single JSON document. Each document is relatively large and includes arrays. Executing this file through `cbq` takes about 45 minutes, which seems excessively slow.
Here’s what we’ve tried so far:
- Parallel Execution: Splitting the file and running multiple `cbq` sessions in parallel reduced execution time, but productionizing this approach would require additional time and effort.
- Batch Inserts: We attempted batching by inserting 10 documents in a single `INSERT` statement (see the sketch after this list), but this didn't improve performance.
- Using `cbimport`: While `cbimport` is significantly faster, it requires the file to be in JSON format, which involves additional preprocessing (a sketch of that preprocessing is at the end of this post).
- Ruling Out Network and Hardware Constraints:
- To eliminate network overhead, we ran the N1QL queries directly on the machine hosting Couchbase Server.
- We placed the file on a RAM disk to minimize disk I/O overhead.
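
For reference, the batched variant we tried looked roughly like the following; `mybucket`, the keys, and the document bodies are simplified placeholders for our actual data:

```sql
INSERT INTO `mybucket` (KEY, VALUE)
VALUES ("doc::0001", {"type": "order", "items": [1, 2, 3]}),
       ("doc::0002", {"type": "order", "items": [4, 5, 6]}),
       /* ... eight more documents elided ... */
       ("doc::0010", {"type": "order", "items": [7, 8, 9]});
```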
Despite these efforts, the performance improvement has been minimal. We are keen to understand why processing takes so long. The documents are large, but not so large that delays of this magnitude seem justified.
Is there any parameter or option in `cbq` that could help speed up the process? We've already tried disabling logging to stdout as well as the other logs, but that didn't help. Any insights would be greatly appreciated!
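
For completeness, here is a minimal sketch of the kind of preprocessing the `cbimport` route would need from us. It assumes one statement per line, shaped like `INSERT INTO \`mybucket\` (KEY, VALUE) VALUES ("<key>", {<json>});`; the regex and the `docid` field are our own assumptions to adapt, not anything `cbimport` mandates:

```python
#!/usr/bin/env python3
"""Convert a file of single-document N1QL INSERT statements into
line-delimited JSON suitable for `cbimport json -f lines`.

Assumes one statement per line, shaped like:
  INSERT INTO `mybucket` (KEY, VALUE) VALUES ("<key>", {<json>});
Adjust the regex if the real file differs.
"""
import json
import re
import sys

# Hypothetical statement shape -- tune this to the actual file.
STMT = re.compile(
    r'VALUES\s*\(\s*"(?P<key>[^"]+)"\s*,\s*(?P<doc>\{.*\})\s*\)\s*;?\s*$'
)

def convert(src_path: str, dst_path: str) -> None:
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            m = STMT.search(line)
            if not m:
                continue  # skip blank lines and anything that isn't an INSERT
            doc = json.loads(m.group("doc"))  # also validates the JSON payload
            doc["docid"] = m.group("key")     # carry the key for -g %docid%
            dst.write(json.dumps(doc) + "\n")

if __name__ == "__main__":
    convert(sys.argv[1], sys.argv[2])
```

The output could then be loaded with something along the lines of `cbimport json -c couchbase://127.0.0.1 -u Administrator -p password -b mybucket -d file:///tmp/docs.jsonl -f lines -g %docid%` (credentials, bucket, and paths are placeholders).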