Data Transfer out of Couchbase Server really slow?
I'm having a strange problem with the speed of data transfer out of Couchbase.
I've tried various different configurations but here is my current setup:
- 5-node cluster on EC2
- All nodes running Ubuntu 13.10 with 8 GB RAM each
- Approx. 4 million documents in the bucket (not the default bucket)
- A View (map/reduce) which returns approx. 800,000 records
When I query the View with reduce=true I get the 'count' which is approx 800,000 - this is returned instantly. So this tells me that the View itself is executing extremely quickly.
When I query the View with reduce=false I get each of the 800,000 rows. I've done the calculations and the data transfer amounts to around 60 MB.
This is taking between 20 and 35 seconds.
I've installed iperf on all machines and measured the network transfer rate at anywhere from 100 MB/s to 1000 MB/s depending on which servers I'm using (megabytes per second, not megabits). So transferring 60 MB of data should not be a problem, but it is taking about 30 seconds.
This is completely killing my use case.
I've independently (not using iperf) confirmed that transferring a 60 MB file takes less than a second, but when I query the View, which returns 60 MB worth of data, it takes 30 seconds, and CPU usage on the node in question stays at about 60% for the full 30 seconds.
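To put the numbers from the question side by side, here is a quick back-of-the-envelope calculation. It assumes 1 MB = 10^6 bytes and takes the reported figures (60 MB, 800,000 rows, about 30 s, 100 to 1000 MB/s) at face value; the variable and function names are just for illustration.

```python
# Figures reported in the question, taken at face value (1 MB = 10**6 bytes).
PAYLOAD_MB = 60              # size of the reduce=false result
ROWS = 800_000               # rows returned by the View
OBSERVED_S = 30              # typical observed query time
LINK_MB_PER_S = (100, 1000)  # iperf range reported (megabytes per second)


def expected_seconds(payload_mb: float, link_mb_per_s: float) -> float:
    """Time to move the payload if the network were the only cost."""
    return payload_mb / link_mb_per_s


def effective_mb_per_s(payload_mb: float, seconds: float) -> float:
    """Throughput the View query actually achieved end to end."""
    return payload_mb / seconds


bytes_per_row = PAYLOAD_MB * 1_000_000 / ROWS
slowest_link = expected_seconds(PAYLOAD_MB, LINK_MB_PER_S[0])
fastest_link = expected_seconds(PAYLOAD_MB, LINK_MB_PER_S[1])
achieved = effective_mb_per_s(PAYLOAD_MB, OBSERVED_S)

print(f"~{bytes_per_row:.0f} bytes per row")
print(f"network-only estimate: {fastest_link:.2f}-{slowest_link:.2f} s")
print(f"achieved: {achieved:.1f} MB/s")
```

At roughly 2 MB/s, the query reaches only a small fraction of even the slowest measured link, which is consistent with the bottleneck being somewhere other than the network.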
What's going on?
Your help would be greatly appreciated - I'm completely stuck with this.
Let me answer in two parts:
1- Generic view processing
I do not necessarily see this as a "product issue"; it is like running a large query on a relational database. Query performance in Couchbase depends on many things:
- how the data is accessed, the query itself, loading data from disk, caching, ...
- the network is a SMALL part of this
When you return a large dataset from a Couchbase view, the index and results are read from disk and cached in the OS page cache, and it is not possible to cache the whole thing. Then the system has to merge the results from all nodes and sort them.
All of these operations take a lot of time.
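The merge step can be pictured as a k-way merge: each node returns its part of the result already sorted, and whichever node coordinates the query has to interleave those partial results before streaming rows to the client. The sketch below is a generic illustration built on Python's `heapq.merge`, not Couchbase's actual code; the `(key, doc_id, value)` tuple layout and the name `merge_node_results` are assumptions, and Python's default string ordering is a simplification of the real view collation rules.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# A view row modelled as (key, doc_id, value). Real Couchbase rows are JSON
# objects with "key", "id" and "value" fields; this tuple is a simplification.
Row = Tuple[str, str, object]


def merge_node_results(per_node: Iterable[List[Row]]) -> Iterator[Row]:
    """Lazily merge per-node row lists that are each sorted by (key, doc_id).

    Generic k-way merge for illustration only; not Couchbase's implementation.
    """
    return heapq.merge(*per_node, key=lambda row: (row[0], row[1]))


# Hypothetical partial results from three nodes, each already sorted.
node_a = [("apple", "doc1", None), ("pear", "doc7", None)]
node_b = [("banana", "doc3", None), ("plum", "doc2", None)]
node_c = [("apple", "doc0", None)]

merged = list(merge_node_results([node_a, node_b, node_c]))
print([row[1] for row in merged])
```

Each comparison is cheap, but it is repeated for every row of the result, on top of reading those rows from disk and serializing them as JSON.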
2- Your use case
What do you emit in your view/index?
Make sure you do not emit a lot of data: emit only what your query needs in order to return the data you really need in the application.
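A rough way to see how much the emitted value matters is to compare the size of one view row when the whole document is emitted as the value against a row whose value is null. Rows in a view response are JSON objects with `id`, `key` and `value` fields; the sample document, its field names and the `row_bytes` helper are hypothetical, and compact JSON only approximates what the server actually sends.

```python
import json


def row_bytes(doc_id: str, key, value) -> int:
    """Approximate size in bytes of one view row serialized as compact JSON."""
    row = {"id": doc_id, "key": key, "value": value}
    return len(json.dumps(row, separators=(",", ":")).encode("utf-8"))


# Hypothetical document; field names are made up for illustration.
doc = {"type": "order", "customer": "c-1042", "status": "shipped",
       "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
       "notes": "leave at the back door"}

full = row_bytes("order::1", "c-1042", doc)   # like emit(doc.customer, doc)
lean = row_bytes("order::1", "c-1042", None)  # like emit(doc.customer, null)

rows = 800_000  # row count from the question
print(f"full value: {full} B/row, ~{full * rows / 1e6:.0f} MB total")
print(f"null value: {lean} B/row, ~{lean * rows / 1e6:.0f} MB total")
```

Multiplied over hundreds of thousands of rows, the per-row difference is what the server has to read, merge and send for every query.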
In other words, if you want to return the documents themselves, it is better not to emit them as the view value: emit as little as possible (the document id is already part of every index row) and then do a get (or multi-get) from your application. In this case the documents can be served from the Couchbase cache (memcached).
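A sketch of that pattern, assuming the View emits a null value: collect the ids from the response rows, then fetch the documents in batches. `multi_get` is a hypothetical callable standing in for whatever bulk get your client SDK provides, the dictionary-backed store is only a test double, and the default batch size of 1000 is an arbitrary assumption to tune.

```python
import json
from typing import Callable, Dict, Iterator, List

# Hypothetical signature for your SDK's bulk get: list of ids -> {id: document}.
MultiGet = Callable[[List[str]], Dict[str, dict]]


def ids_from_view_response(body: str) -> List[str]:
    """Extract document ids from a view response body ({"rows": [{"id": ...}, ...]})."""
    return [row["id"] for row in json.loads(body)["rows"]]


def chunks(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_documents(ids: List[str], multi_get: MultiGet,
                    batch_size: int = 1000) -> Dict[str, dict]:
    """Fetch full documents by id, one bulk get per batch (batch_size is an assumption)."""
    docs: Dict[str, dict] = {}
    for batch in chunks(ids, batch_size):
        docs.update(multi_get(batch))
    return docs


# Test double standing in for the key-value store (not a real Couchbase client).
store = {f"doc{i}": {"n": i} for i in range(5)}
calls: List[List[str]] = []


def fake_multi_get(keys: List[str]) -> Dict[str, dict]:
    calls.append(keys)
    return {k: store[k] for k in keys}


# A view response as it might look when the map function emits a null value.
body = json.dumps({"total_rows": 5, "rows": [
    {"id": f"doc{i}", "key": i, "value": None} for i in range(5)]})

docs = fetch_documents(ids_from_view_response(body), fake_multi_get, batch_size=2)
print(len(calls), sorted(docs))
```

Batching keeps each request bounded while still avoiding one round trip per document.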