Using formats other than JSON
Hi,
Would it be easy to replace the JSON decoding/encoding of documents before they reach the view engine?
I've been profiling my application and found that it spends about 20-40% of its CPU cycles just parsing JSON in my memcached driver (for Node.js). There is also a lot of traffic between the application and Couchbase.
I've done some measurements serializing my data with msgpack instead, and the result is about 55% of the size of the serialized JSON. My documents contain a lot of timestamps and arrays, which suit msgpack better.
I don't know how fast the msgpack parser is, but the reduction in size alone would be worth the effort.
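For anyone wanting to repeat this kind of measurement, here is a minimal Node.js sketch of how encoded size and parse time could be compared. The sample document shape, the `measure` helper and the `codec` object (anything with `encode` and `decode`) are assumptions made for illustration, not part of any driver. Only a JSON codec is included; a msgpack codec from a library of your choice would have to be plugged in the same way.

```javascript
// Hypothetical sample document with timestamps and arrays; the field names are illustrative.
function makeDoc(i) {
  return {
    id: "doc-" + i,
    created: 1356998400000 + i,
    samples: Array.from({ length: 50 }, (_, k) => 1356998400000 + i + k),
  };
}

// Measure total encoded size and total decode time for a codec.
// A codec is any object with encode(value) -> Buffer and decode(Buffer) -> value.
function measure(codec, docs, rounds) {
  const encoded = docs.map((d) => codec.encode(d));
  const bytes = encoded.reduce((sum, buf) => sum + buf.length, 0);
  const start = process.hrtime.bigint();
  for (let r = 0; r < rounds; r++) {
    for (const buf of encoded) codec.decode(buf);
  }
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { bytes, elapsedMs };
}

// JSON codec built only on Node.js built-ins.
const jsonCodec = {
  encode: (value) => Buffer.from(JSON.stringify(value), "utf8"),
  decode: (buf) => JSON.parse(buf.toString("utf8")),
};

const docs = Array.from({ length: 100 }, (_, i) => makeDoc(i));
const result = measure(jsonCodec, docs, 20);
console.log("JSON bytes:", result.bytes, "decode ms:", result.elapsedMs.toFixed(2));
```

Running `measure` with a second codec on the same `docs` gives numbers that can be compared directly, though a micro-benchmark like this only approximates what a profiler shows in a real application.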
Where is the JSON parsing done? Is it done before reaching the JavaScript engine, or would it be possible to intercept the parsing at the JavaScript engine and parse other formats instead of JSON?
What are the files /opt/couchbase/share/couchdb/server/main.js and /opt/couchbase/share/couchdb/server/main-coffee.js used for? Would it be enough to replace main.js with another parser?
I would really like to try experimenting using more efficient serializers.
Peter
This is actually not true.
If a document is not valid JSON (some binary format, for example), the indexer still processes it.
In this case meta.type is set to "base64" in the map function, and the doc argument passed to the map function is a string containing the base64-encoded version of the document.
The only case where documents are skipped (and an explicit message is logged to mention this) is when they have an ID which is not UTF-8.
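To make the behavior described above concrete, here is a rough sketch of a map function that branches on meta.type, plus a small Node.js harness so it can be run outside the server. The `emit` stand-in, the `index` helper and the sample documents are hypothetical and only imitate the description above; they are not the actual indexer code.

```javascript
// Stand-in for the view engine's emit(); collects rows so they can be inspected.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function in the (doc, meta) form used by Couchbase views.
function map(doc, meta) {
  if (meta.type === "json") {
    // doc is a parsed JSON object.
    emit(meta.id, doc.created);
  } else if (meta.type === "base64") {
    // doc is a string holding the base64 encoding of the stored bytes.
    emit(meta.id, doc.length);
  }
}

// Hypothetical harness: hand valid JSON over as an object, anything else as base64 text.
function index(id, storedBytes) {
  let doc;
  let type;
  try {
    doc = JSON.parse(storedBytes.toString("utf8"));
    type = "json";
  } catch (err) {
    doc = storedBytes.toString("base64");
    type = "base64";
  }
  map(doc, { id: id, type: type });
}

index("a", Buffer.from('{"created": 1}'));          // valid JSON document
index("b", Buffer.from([0x82, 0xa1, 0x61, 0x01]));  // arbitrary binary bytes
console.log(rows);
```

Note that `Buffer` exists only in Node.js, so it is used in the harness and not inside the map function; decoding the base64 string within a real view would need whatever the view engine itself provides.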
I'm not totally sure I understand exactly what you're trying to do. Are you saying that the Couchbase Server memcached process is spending a lot of CPU time in JSON checking functions? If that's the issue, then you can change the source code, and the line of code you need to look at is here:
https://github.com/couchbase/ep-engine/blob/master/src/couch-kvstore/cou...
What we do in the memcached process is check whether the value in the kv-pair is actually JSON. If it is, we store it on disk with a flag that marks it as valid JSON. Later, when the view engine needs to update the indexes, it reads all of the latest entries on disk. If the flag for an entry marks the value as non-JSON, the indexer will skip it.
If you're trying to do something else, let me know and I can try to help. Other than that, let me know if you have any other questions.