Hi,
We would like to based our future product on couchbase - but no matter what we are doing the query perfomance is very slow…
Im running a single node on a server with 40 logical processors and 128 GB with fast HDDs.
I have a single bucket with only 300K documents (our target is 10B per customer). Each document contains 20 integers and 10kb image file (as byte array).
I setup primary and secondary indexes:
- CREATE PRIMARY INDEX
#primary
ONsuspectentity
USING GSI - CREATE INDEX
IndexBasic
ONsuspectentity
(time
,channelId
) USING GSI
I setup 20GB RAM for bucket and 100GB RAM for indexes.
I query only by time range and channelId with max results of 10k documents. for example:
SELECT * from suspectentity where time >= ‘2016-01-31T09:00’ and time <= ‘2016-01-31T13:00’ and channelId=12 LIMIT 10000;
The EXPLAIN shows that CB uses the correct index scan:
cbq.exe> EXPLAIN SELECT * from suspectentity where time >= ‘2016-01-31T09:00’ and time <= ‘2016-01-31T13:00’ and channelId=12 LIMIT 10000;
{
“requestID”: “550a42a7-1691-40af-9956-de92d0fd746e”,
“signature”: “json”,
“results”: [
{
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “IndexScan”,
“index”: “IndexBasic”,
“keyspace”: “suspectentity”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
"“2016-01-31T13:00"”,
“12”
],
“Inclusion”: 3,
“Low”: [
"“2016-01-31T09:00"”,
“12”
]
}
}
],
“using”: “gsi”
},
{
“#operator”: “Parallel”,
“~child”: {
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “Fetch”,
“keyspace”: “suspectentity”,
“namespace”: “default”
},
{
“#operator”: “Filter”,
“condition”: “(((“2016-01-31T09:00” \u003c= (suspectentity
.time
)) and ((suspectentity
.time
) \u003c= “2016-01-31T13:00”)) and ((suspectentity
.channelId
) = 12))”
},
{
“#operator”: “InitialProject”,
“result_terms”: [
{
“expr”: “self”,
“star”: true
}
]
},
{
“#operator”: “FinalProject”
}
]
}
}
]
},
{
“#operator”: “Limit”,
“expr”: “10000”
}
]
}
],
“status”: “success”,
“metrics”: {
“elapsedTime”: “17.0171ms”,
“executionTime”: “17.0171ms”,
“resultCount”: 1,
“resultSize”: 2898
}
}
The query time is more then 60 seconds (for 10K results set), while on MS SQL Server we are getting results very fast (1-3 seconds). Please note that if I TAKE only 10 raws (and not 10K) then the results are 1-2 seconds.
Can anyone please give us support in order to understand the problem or maybe tell us that this is the expected perfomance.