Couchbase performance - real tests - it's horrible - or am I doing something wrong?
Hi,
I was tempted by your performance and speed claims about Couchbase. So I ran tests. And I'm confused - what's wrong? The speed is horrible. Why do you claim such high speed when I can't observe it in my environment?
Here's what I did:
- created one bucket on a 2-node cluster: two high-performance Windows 2008 production servers.
- inserted 2.7M documents - very simple JSON. One JSON value is about 100-300 bytes total.
example of one document (key=f_5691_3110):
{
"_id": "f_5691_3110",
"_rev": "1-d51662ecd11912f04cdf1d9778f4512d",
"$flags": 0,
"$expiration": 0,
"i": 5691,
"m": "20111126095304",
"o": 3110,
"u": 0,
"t": "f",
"v": "1999_5_20_8419_51"
}
- added one view to one design document. The view is executed/created/initialized before the tests, of course:
function (doc) { emit([doc.t,doc.i,doc.o], null); }
- used a Python 2.6 client on a separate machine for the tests. I am using a profiling environment, so I can say exactly which line of code is slow and by how much.
- long story short - I got very low speed. I decided to use only one node to better isolate the problem. So from now on it's a one-node cluster with 2.7M simple documents. Allocated total memory 500MB, in use 335MB, unused 164MB. Disk in use 4.15GB. Bucket "icor_dev" - RAM usage: 335 MB, RAM Quota: 400MB, item count 2753333.
- at first I thought it was a problem with your Python client code. Profiling showed that the readline method in socket.py is very slow. So I used urllib.urlopen and direct URLs, which should be a little faster. Here's example test code and results:
def TestSpeed():
    t1=time.clock()
    l2=[]
    atext=urllib.urlopen('http://10.10.250.137:8091/couchBase/icor_dev/_design/by_id/_view/by_id?limit=1&startkey=["c",258]').read()
    d=json.loads(atext)
    aoid=d['rows'][0]['key'][2]
    while aoid>=0:
        l2.append(aoid)
        atext=urllib.urlopen('http://10.10.250.137:8091/couchBase/icor_dev/_design/by_id/_view/by_id?limit=1&startkey=["c",258,%d]&endkey=["c",259]&skip=1'%(aoid,)).read()
        d=json.loads(atext)
        if d['rows']:
            aoid=d['rows'][0]['key'][2]
        else:
            aoid=-1
    t2=time.clock()
    print len(l2),t2-t1,(t2-t1)/len(l2)*1.0
639 20.0586746862 0.0313907272085
So, on a fast machine, fast server and fast network it took 20 seconds to serve 640 simple requests. That's 0.03 sec per request. Awful.
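The view URLs in the test above put raw JSON (brackets, quotes, commas) straight into the query string. Below is a minimal sketch of building the same queries with the keys JSON-encoded and percent-escaped. It assumes Python 3's standard library rather than the 2.6 used in the tests, and `view_url` is a hypothetical helper; the host, bucket and view names are simply the ones from the test code. Whether escaping changes the timings at all is untested.

```python
import json
from urllib.parse import urlencode

# Host, bucket and view name copied from the test code above.
BASE = "http://10.10.250.137:8091/couchBase/icor_dev/_design/by_id/_view/by_id"


def view_url(startkey, endkey=None, limit=1, skip=0, base=BASE):
    """Build a view query URL (hypothetical helper, not part of any client).

    Keys are serialized as compact JSON and then percent-escaped by urlencode.
    """
    params = [
        ("limit", limit),
        ("startkey", json.dumps(startkey, separators=(",", ":"))),
    ]
    if endkey is not None:
        params.append(("endkey", json.dumps(endkey, separators=(",", ":"))))
    if skip:
        params.append(("skip", skip))
    return base + "?" + urlencode(params)


# First query of the loop, then a follow-up query for the next row.
first = view_url(["c", 258])
following = view_url(["c", 258, 7], endkey=["c", 259], skip=1)
```

The `7` in the second call stands in for whatever `aoid` the previous response returned.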
- I thought it might still be a problem with network communication, so I used a plain socket connection, just to get results:
def GetText(ahost,aport,afilename):
    c=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    c.connect((ahost,aport))
    fileobj=c.makefile('r',0)
    fileobj.write("GET "+afilename+" HTTP/1.0\n\n")
    atext=fileobj.read()
    apos=atext.find('\r\n\r\n')
    c.close()
    return atext[apos+4:]

def TestSpeed5():
    t1=time.clock()
    l2=[]
    atext=GetText('10.10.250.137',8091,'/couchBase/icor_dev/_design/by_id/_view/by_id?limit=1&startkey=["c",258]')
    d=json.loads(atext)
    aoid=d['rows'][0]['key'][2]
    while aoid>=0:
        l2.append(aoid)
        atext=GetText('10.10.250.137',8091,'/couchBase/icor_dev/_design/by_id/_view/by_id?limit=1&startkey=["c",258,%d]&endkey=["c",259]&skip=1'%(aoid,))
        d=json.loads(atext)
        if d['rows']:
            aoid=d['rows'][0]['key'][2]
        else:
            aoid=-1
    t2=time.clock()
    print len(l2),t2-t1,(t2-t1)/len(l2)*1.0
639 18.9517606327 0.0296584673438
640 requests to Couchbase and "only" 18 seconds. Wow.
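Note that GetText opens a new TCP connection for every request and speaks HTTP/1.0. A minimal sketch of the alternative, sending all requests over one persistent connection, is below. It assumes Python 3's `http.client` and assumes the server on port 8091 honours HTTP/1.1 keep-alive, which has not been verified here; `fetch_many` and `fetch_from` are hypothetical helper names.

```python
import http.client


def fetch_many(conn, paths):
    """Issue one GET per path over a single, already-open connection."""
    bodies = []
    for path in paths:
        conn.request("GET", path)
        resp = conn.getresponse()
        # The body has to be read completely before the next request
        # can be sent on the same connection.
        bodies.append(resp.read())
    return bodies


def fetch_from(host, port, paths, timeout=10):
    """Open one HTTP/1.1 connection, fetch every path, then close it."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        return fetch_many(conn, paths)
    finally:
        conn.close()
```

Something like `fetch_from('10.10.250.137', 8091, ['/couchBase/icor_dev/f_5691_3110'] * 640)` would repeat the key-retrieval test without paying for 640 connection setups, which helps tell connection overhead apart from server-side time.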
- I decided to test other kinds of requests: retrieval by key, a simpler query, and, as a baseline, a plain request to the IIS web server on port 80. Here's the code and results:
def TestSpeed6():
    t1=time.clock()
    amax=640
    for i in range(amax):
        #atext=GetText('10.10.250.137',80,'/')
        #atext=GetText('10.10.250.137',8091,'/couchBase/icor_dev/f_5691_3110')
        atext=GetText('10.10.250.137',8091,'/couchBase/icor_dev/_design/by_id/_view/by_id?limit=1&startkey=["c",258]')
        #atext=urllib.urlopen('http://10.10.250.137:8091/couchBase/icor_dev/f_5691_3110').read()
    t2=time.clock()
    print amax,t2-t1,(t2-t1)/amax*1.0
for: atext=GetText('10.10.250.137',8091,'/couchBase/icor_dev/_design/by_id/_view/by_id?limit=1&startkey=["c",258]')
it is:
640 17.8243509825 0.0278505484102
just as before
for: atext=GetText('10.10.250.137',8091,'/couchBase/icor_dev/f_5691_3110')
it is:
640 4.19659155819 0.00655717430968
So key retrieval is faster, but still painfully slow. Is it possible that your views are just slow?
for: atext=urllib.urlopen('http://10.10.250.137:8091/couchBase/icor_dev/f_5691_3110').read()
it is:
640 5.7671038684 0.00901109979437
So yeah, urllib and httplib in the Python library, even with the slow readline in socket.py, are not that slow after all.
And finally - the connection to IIS on port 80:
for: atext=GetText('10.10.250.137',80,'/')
640 0.962828127135 0.00150441894865
I WANT THAT SPEED FROM COUCHBASE! At least :-)
- but that's not all. I ran those tests on the server console over the local interface to measure your internal timings. Here are the results:
atext=GetText('127.0.0.1',8091,'/couchBase/icor_dev/_design/by_id/_view/by_id?limit=1&startkey=["c",258]')
640 18.9538473699 0.0296153865155
18 seconds on the local interface?!? Almost no difference from communicating with the remote machine. It can't be a slow network problem.
atext=GetText('127.0.0.1',8091,'/couchBase/icor_dev/f_5691_3110')
640 4.23714760273 0.00662054312927
atext=urllib.urlopen('http://127.0.0.1:8091/couchBase/icor_dev/f_5691_3110').read()
640 6.12888052745 0.00957637582415
atext=GetText('127.0.0.1',80,'/')
640 0.555627680149 0.000868168250233
YEAH! I WANT *THAT* SPEED :-)
Well, I tested other products in a similar environment with similar data: MS SQL, MySQL, Postgres, MongoDB and Redis. Couchbase is the slowest. On the other hand, I like Couchbase the most, for its design and architecture. If I were writing something similar, I imagine I would make the same choices. So, how can I help you test this product? For now I can say that there may be a problem with your network code (maybe not reusing connections or something), or maybe serialization to text JSON is slow, but the most suspicious areas are internal data and memory management and index querying. Can we do some timings on that?
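For such timings, per-request latencies say more than one total divided by the request count, since a few slow outliers and a uniformly slow server look the same in an average. A minimal sketch of a harness follows, assuming Python 3 (`time.perf_counter` stands in for `time.clock`, which was removed in Python 3.8). `benchmark` is a hypothetical helper; `fetch` is any zero-argument callable that performs one request.

```python
import statistics
import time


def benchmark(fetch, n=640):
    """Call fetch() n times and return latency statistics in seconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fetch()
        samples.append(time.perf_counter() - start)
    samples.sort()
    total = sum(samples)
    return {
        "count": n,
        "total": total,
        "mean": total / n,
        "median": statistics.median(samples),
        # Nearest-rank style 95th percentile on the sorted samples.
        "p95": samples[min(n - 1, int(0.95 * n))],
        "max": samples[-1],
    }


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with a real request function.
    stats = benchmark(lambda: sum(range(1000)), n=100)
    print(stats)
```

Running the same harness against the view URL, the key URL and the IIS baseline would give directly comparable distributions.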
What constitutes a "large" cluster?
Note that we released Couchbase Server 2.0 Developer Preview 3 on December 13. There were a number of performance enhancements in there.
Having the same issue on Couchbase 1.8
The original post was about poor performance over HTTP, it'd be impossible to have exactly the same problem on 1.8. What are you observing? I'd like to help.
I already found the problem. It was not in Couchbase (at least not directly). Each server in my cluster has two network interfaces: one for the local network and one for the internet connection (we have a few servers working in a different DC). The local network has a 1Tbit channel and the internet link is 100Mbit. The problem was in the Couchbase cluster server list: Couchbase handles connections on only one interface and can't tell when a client is on the local network. So when the Moxi proxy tries to connect to a Couchbase server through the local network, Couchbase returns the servers' internet IPs, and because of the small channel we get poor response times.
I think Couchbase needs a configuration option for working with several network interfaces.
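A quick way to spot this kind of misrouting is to compare the addresses the cluster hands out with the local subnet. Below is a minimal sketch using Python 3's `ipaddress` module. The function name and all addresses are made up for illustration, and the input is assumed to be a plain list of IP strings already extracted from the server list.

```python
import ipaddress


def nodes_outside_network(advertised, local_network):
    """Return the advertised node IPs that are not inside local_network."""
    net = ipaddress.ip_network(local_network)
    return [addr for addr in advertised if ipaddress.ip_address(addr) not in net]


# Hypothetical example: one node advertises a LAN address, one a public one.
advertised_nodes = ["10.10.250.137", "203.0.113.7"]
off_lan = nodes_outside_network(advertised_nodes, "10.10.250.0/24")
```

Any address in `off_lan` is one that a client on the local network would reach over the slower link.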
Has there been progress on this ("Expect to hear more about these data structures in blog posts etc")? With 2.0.0dp4 I see similar numbers per document, but I have larger documents, roughly 1-4k. Similar to above, a simple web request for a block of documents from the same view is an order of magnitude faster, so my guess is that whatever the problem is, it is unrelated to optimized views on large clusters. One thing I have noticed is that the Java client consumes nearly all CPU as soon as it is created, which might indicate a tight loop. Setting nice to the lowest priority for the process seems to help with inserts but not with gets.
This idiom may also have something to do with improved writes. Without getting the future, the bucket memory fills and inserts seem to be ignored. With the get, inserts happen with minimal delays, and there is no delay from calling client.get:
future = client.set(key, 0, json)
future.get(60, TimeUnit.SECONDS)
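The effect of that idiom is that the producer never has more than one write in flight. A rough Python analogue is sketched below using `concurrent.futures`. It is not the Java client API: `write_all` and the `store` callable are hypothetical stand-ins for the client's asynchronous set.

```python
from concurrent.futures import ThreadPoolExecutor


def write_all(store, items, timeout=60):
    """Submit each write asynchronously, then wait for it before continuing.

    store is any callable taking (key, value); waiting on every future keeps
    at most one write outstanding, so the producer cannot outrun the store.
    """
    written = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        for key, value in items:
            future = pool.submit(store, key, value)
            # Blocks until the write finishes, or raises TimeoutError.
            future.result(timeout=timeout)
            written += 1
    return written
```

Dropping the `future.result(...)` call turns this into fire-and-forget, where queued writes can pile up faster than they are applied.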
I tried to take a closer look at the client, but failed to build it. Is there a guide to building the Java client? I attempted to build 1.1dp from git with NetBeans, but spymemcached-test 2.8.1 seems to be missing or not where expected.
Thanks for the detailed message. I'm sure we'll be looking closely at it.
One known issue with the current developer preview of Couchbase is that it slows down on smaller clusters. The views are optimal on very large clusters.
A big part of the technical effort that has gone into the upcoming release is in avoiding the slowdown on small clusters. To do this we merge data structures on disk to cut down on the number of disk seeks required for reading views.
Expect to hear more about these data structures in blog posts etc. The upcoming release should be out soon so you can test view performance on smaller clusters.