- We observed that the N1QL query takes over a minute to retrieve approximately 200,000 documents.
- To improve performance, we created a covering index, which significantly reduced the execution time.
- Since reducing the number of columns in the query is not an option, are there any optimization recommendations you suggest we try before creating a covering index, considering the associated overhead?
- The stream phase time remains high even with the covering index in place, although it is slightly reduced compared to the non-covering index scenario.
Can you show the query and the execution metrics? Can you show the client code that calls the query and consumes the stream?
Could it be the client that is slowing the streaming? Could it be the network? The server cannot produce the stream faster than the client consumes it.
Do you have a ticket open with support? If so, what is the ticket number?
The Community Edition query service is limited to 4 cores.
The best approach: create an index that covers the predicates, use a covering query to project only the document keys, and then get the documents from KV (asynchronously, in parallel). If you only need a few fields, use the sub-document (SUBDOC) API.
A high Stream phase time means the client might not be reading fast enough; also check the upstream operators' times for the bottleneck, or post the profile timings.
https://www.couchbase.com/blog/optimize-n1ql-performance-using-request-profiling/
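Per that blog post, profiling can be requested per statement. As an illustration, a request body along these lines (the profile parameter accepts off, phases, or timings) asks the query service to include operator-level timings in the response:

```json
{
  "statement": "SELECT RAW 1",
  "profile": "timings"
}
```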
Hi @mereiche,
Thank you for the help.
Query_CI_WithoutCI.zip (9.1 KB)
Can you show the query and the execution metrics? Can you show the client code that calls the query and consumes the stream?
Attached is a sheet with the query's metrics with a CI (covering index) and without a CI, where the stream takes time. Refer to column AE for the CI case and column AB for the non-CI case.
Could it be the client that is slowing the streaming? Could it be the network? The server cannot produce the stream faster than the client consumes it.
We need to understand why the stream phase takes so long, then optimize the stream time.
Do you have a ticket open with support? If so, what is the ticket number?
We raised a ticket with Couchbase support; ticket number #70847.
Hi @vsr1 ,
Thanks for getting back so quickly.
We are verifying whether this is currently enabled, as mentioned in the blog.
Stream: high execution time
The request is struggling to convert the result to JSON form and send it to the client. The pretty REST parameter may have been set to true in error.
Looked at the support case.
The client appears to be libcouchbase, and the SDKs set pretty to false by default, so that is not the issue.
Some quick observations (this may be a different request than the one you are looking at):
- resultSize values are in MBs.
- The Stream execTime is high. There are two possibilities: either the client is not reading fast enough (resultSize is very high), or an upstream operator, in this case Order, is the issue. The Order operator's execTime is low and its kernTime is high, so look further upstream: InitialProject's execTime is high. That tells us InitialProject is taking time, which is due to the length of the projection list and the high number of qualifying items.
"#operator": "InitialProject",
"#stats": {
    "#itemsIn": 207794,
    "#itemsOut": 207794,
    "#phaseSwitches": 831180,
    "execTime": "40.064461391s",
    "kernTime": "1.709503988s"
},

"#operator": "Order",
"#stats": {
    "#itemsIn": 207794,
    "#itemsOut": 193215,
    "#phaseSwitches": 608808,
    "execTime": "2.637306576s",
    "kernTime": "41.692093248s"
},

"#operator": "Stream",
"#stats": {
    "#itemsIn": 193215,
    "#itemsOut": 193215,
    "#phaseSwitches": 193218,
    "execTime": "32.312172445s",
    "kernTime": "2.364µs"
}
- Some requests timed out because they are not optimized. If you look at the IndexScan, its item count is very high, and after the Filter the count is very low; that causes a lot of wasted work.
- For some requests I have seen a very high FetchTime; early projection in the Fetch helps with this: https://jira.issues.couchbase.com/browse/MB-66614
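As a rough illustration of the heuristic used above (high kernTime means an operator is waiting on its upstream; high execTime means the operator itself is busy), the quoted timings can be checked mechanically. This is only a sketch: the profile dict is hand-copied from the stats quoted in this thread, and to_seconds is a helper written for this example, not a Couchbase API.

```python
# Find the busiest operator in a (hand-copied) query profile excerpt.
# High execTime = the operator itself is doing the work;
# high kernTime = the operator is mostly waiting on its upstream.
import re

def to_seconds(d: str) -> float:
    """Parse Go-style durations like '40.064461391s' or '2.364µs'."""
    m = re.fullmatch(r"([\d.]+)(µs|ms|s)", d)
    value, unit = float(m.group(1)), m.group(2)
    return value * {"µs": 1e-6, "ms": 1e-3, "s": 1.0}[unit]

profile = {
    "InitialProject": {"execTime": "40.064461391s", "kernTime": "1.709503988s"},
    "Order":          {"execTime": "2.637306576s",  "kernTime": "41.692093248s"},
    "Stream":         {"execTime": "32.312172445s", "kernTime": "2.364µs"},
}

busiest = max(profile, key=lambda op: to_seconds(profile[op]["execTime"]))
print(busiest)  # InitialProject
```

Order's 41.7s kernTime is almost exactly InitialProject's 40.1s execTime plus Order's own work, which is what points the finger upstream.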
Hi @vsr1
Thanks for the update.
- resultSize values are in MBs.
- The Stream execTime is high. There are two possibilities: either the client is not reading fast enough (resultSize is very high), or an upstream operator, in this case Order, is the issue. The Order operator's execTime is low and its kernTime is high, so look further upstream: InitialProject's execTime is high. That tells us InitialProject is taking time, which is due to the length of the projection list and the high number of qualifying items.
For the first two points, we cannot alter the projection list or the result size.
- Some requests timed out because they are not optimized. If you look at the IndexScan, its item count is very high, and after the Filter the count is very low; that causes a lot of wasted work.
Is this observation from before the covering index was created? This is the area where we are looking to optimize.
- For some requests I have seen a very high FetchTime; early projection in the Fetch helps with this: https://jira.issues.couchbase.com/browse/MB-66614
Is this also from before the covering index was created?
If the query's result count and result size are high and you want it faster, the best option for non-aggregate/non-join queries (no subqueries, etc.) will be:
- Change the query to project only the document key, and make that query covered.
- Once you have the document keys, get the documents from KV through the SDK, asynchronously and in parallel. This avoids the indexer and avoids the Fetch's two hops (KV to N1QL to client). I know this requires an application change and may not be convenient.
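A sketch of that two-step pattern, in Python for brevity: the keys come from a covering query (e.g. SELECT RAW META().id FROM ... WHERE ...), and the documents are then fetched in parallel with bounded concurrency. Note that kv_get below is a hypothetical stub standing in for the SDK's KV get (lcb_get in libcouchbase, or the equivalent in other SDKs); it is not a real Couchbase API call.

```python
# Step 2 of the pattern: parallel, bounded-concurrency KV gets for a list of
# document keys obtained from a covering query.
import asyncio

async def kv_get(key: str) -> dict:
    """Stand-in for the SDK's KV get; a real client would hit the server here."""
    await asyncio.sleep(0)           # simulate network I/O
    return {"id": key, "doc": {}}    # pretend payload

async def fetch_all(keys, concurrency: int = 64):
    """Fetch documents for `keys`, capping the number of in-flight gets."""
    sem = asyncio.Semaphore(concurrency)

    async def one(key):
        async with sem:              # bound parallelism so KV is not flooded
            return await kv_get(key)

    # gather preserves input order, so results line up with `keys`
    return await asyncio.gather(*(one(k) for k in keys))

keys = [f"doc::{i}" for i in range(1000)]   # would come from the covering query
docs = asyncio.run(fetch_all(keys))
print(len(docs))  # 1000
```

The same shape works in the C SDK by scheduling many lcb_get operations before a single lcb_wait, which is what makes the KV path faster than streaming full documents through the query service.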
Thanks @vsr1.
This is indeed a simple query (no joins, subqueries, or aggregates), but the number of documents returned is quite large (~210k), and each document includes several fields. There's an ORDER BY clause, and we did test the query without it, but that didn't significantly reduce the time; the Stream phase continues to be where most of the time is spent.
The workaround you suggested is definitely worth exploring, though it comes with some complexity. However, before heading down that path, we’re trying to confirm whether the time spent in the Stream phase is justified — i.e., whether there’s any network bottleneck or client-side slowness (SDK or app logic). That’s been difficult to isolate - any suggestions here would be helpful.
In parallel, we’re looking into tuning options to help the Stream phase complete faster. We came across a few parameters in the documentation (https://docs.couchbase.com/server/current/n1ql/n1ql-manage/query-settings.html):
pipeline-cap
scan-cap
pipeline-batch
Would tweaking these help in this scenario, or are they unlikely to have much impact when streaming large result sets?
Appreciate any insights you might have.
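For completeness, the settings page above suggests these can also be supplied per request rather than node-wide. Assuming the request-level spellings (pipeline_batch, pipeline_cap, scan_cap; values here are arbitrary), a request body would look something like:

```json
{
  "statement": "SELECT RAW t FROM default AS t",
  "pipeline_batch": 1024,
  "pipeline_cap": 1024,
  "scan_cap": 1024
}
```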
Can you show the client code that calls the query and consumes the stream?
Could it be the client that is slowing the streaming? Could it be the network? The server cannot produce the stream faster than the client consumes it.
Still waiting for the client code.
The measured network bandwidth and the size of the documents would be useful (e.g. if you try to stream 100 GB over a 1 Mb/s network, it’s going to be slow).
From the ticket, the result size is 219 MB. If you can get 100 Mbit/s sustained from the server to the client, that is going to take about 20 seconds.
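The arithmetic behind that estimate, as a quick sanity check (219 MB is the result size from the ticket; 100 Mbit/s is the assumed sustained throughput; protocol overhead is ignored):

```python
# Transfer-time estimate: bytes -> bits, divided by link throughput.
result_mb = 219          # result size from the ticket, in megabytes
link_mbit_per_s = 100    # assumed sustained server-to-client throughput

seconds = result_mb * 8 / link_mbit_per_s
print(round(seconds, 1))  # 17.5 -> "about 20 seconds" once overhead is added
```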
If you want to eliminate the client, you can execute the query with curl using the REST API…
curl -v http://10.143.210.101:8093/query/service -u Administrator:password -d 'statement=SELECT meta().id FROM my_collection WHERE type = "hotel" and meta().id LIKE $1 &args=["hotel_1002%25"]&query_context=default:travel-sample.my_scope' -o result.out
Those are unlikely to improve the Stream phase.
This is not 100% proof, but here are some techniques I use to avoid transferring huge data to the client while still letting the query process the rest of the pipeline:
SELECT COUNT(1) FROM (actual query) AS t;
SELECT 1 FROM (actual query) AS t WHERE t.xxx IS NOT NULL;
Thanks @mreiche. On our side it would be a 1 Gb/s network (but let us get that re-verified), so even after considering protocol overheads, it should not have exceeded 5 seconds.
We will get the client code. Thanks.
FYI - when one has a “1Gb/s” network, the actual achievable throughput is much less. Maybe 50%, maybe 10% of that “1Gb/s”.
Idle system with only the query running:
UPSERT INTO default (KEY d.k, VALUE d)
SELECT {"k":TO_STR(t), "c1": LPAD("a",1024,"b")} AS d FROM ARRAY_RANGE(0,200000) AS t;
SELECT RAW t FROM default AS t;
time curl -u Administrator:password http://localhost:8093/query/service -H "Content-Type: application/json" -d '{"statement":"SELECT RAW t FROM default AS t", "pretty":false}' > /dev/null
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 199M 0 199M 100 62 21.1M 6 0:00:10 0:00:09 0:00:01 21.5M
real 0m9.433s
user 0m0.200s
sys 0m0.285s
Hi @mreiche, We have asked the developer for the code and here it is.
int CCouchbaseDB::ExecuteQuery(KeyValueDBQuery *qry, CCallBackDetailParams *userdata, SKeyValueDBParams *parameters)
{
    if (!m_connected)
        return KVDB_CONNECTION_FAILED;

    Query *query = (Query *)qry;
    if (!query->m_queryHandle)
        return KVDB_INVALID_PARAMS;

    userdata->cmd = query->m_queryHandle;

    lcb_STATUS rc;
    if ((rc = lcb_cmdquery_callback(query->m_queryHandle, query_callback_param)) != LCB_SUCCESS)
    {
        CLogger::Error(__FILE__, __LINE__, "Failed to make n1ql command. Error[%s]", GetErrorDesc(rc));
        return rc;
    }

    if ((rc = lcb_query(m_instance, userdata, query->m_queryHandle)) != LCB_SUCCESS)
    {
        CLogger::Error(__FILE__, __LINE__, "Failed to schedule n1ql query. Error[%s]", GetErrorDesc(rc));
        return rc;
    }

    lcb_wait(m_instance, LCB_WAIT_DEFAULT);
    return userdata->rc;
}
Please provide the callback
Hi @mreiche,
Thank you for your patience as we share the callback code.
static void query_callback_param(lcb_INSTANCE *instance, int type, const lcb_RESPQUERY *resp)
{
    CCallBackDetailParams *result = NULL;
    CSTR row = NULL;
    size_t row_len = 0;

    lcb_STATUS rc = lcb_respquery_status(resp);
    lcb_respquery_cookie(resp, (void **)(&result));
    lcb_respquery_row(resp, &row, &row_len);

    if (lcb_respquery_is_final(resp))
    {
        // In the final success response, only the Couchbase status is received in the format below, so HandleDocument is not called.
        /* In the timeout scenario, the status will be closed/timeout and a partial record may be returned, which we do not pass to the application.
         * [{
         *   "requestID": "58bdd66c-bdee-45cf-b226-8cc4beb08f52",
         *   "clientContextID": "6a14c164e6fa3c7d",
         *   "signature": {"*":"*"},
         *   "results": [
         *   ],
         *   "status": "success",
         *   "metrics": {"elapsedTime": "2.287467ms","executionTime": "2.134303ms","resultCount": 4,"resultSize": 18557}
         * }]
         */
        if (rc != LCB_SUCCESS)
        {
            // Log the error when the return code is anything other than success. The same is sent to the application.
            CLogger::Error(__FILE__, __LINE__, "Error returned by Couchbase [0x%x], row_len[%d]", rc, (int)row_len);
            if ((int)row_len > 0)
            {
                char buf[5000] = {0};
                int len = (row_len > 5000) ? 5000 : (int)row_len;
                STRNCPY(buf, row, len);
                CLogger::Error(__FILE__, __LINE__, "Error returned by Couchbase while executing query[%s]", buf);
            }
        }
        // When no document is present, CB returns success; use an internal code so the application can handle the no-document case.
        if (result->nRec == 0 && rc == LCB_SUCCESS)
            result->rc = KVDB_LCB_KEY_ENOENT;
        else
        {
            result->rc = rc;
        }
    }
    else
    {
        BOOL res = result->HandleDocument(row, (int)row_len, (int)rc, result->userArg, result); /* Sends error description to the application */
        result->nRec++;
        if (res != TRUE)
        {
            char buf[5000] = {0};
            int len = (row_len > 5000) ? 5000 : (int)row_len;
            STRNCPY(buf, row, len);
            CLogger::Info(__FILE__, __LINE__, "Error found while parsing document [%s]", buf);
            lcb_QUERY_HANDLE *handle;
            lcb_respquery_handle(resp, &handle);
            lcb_query_cancel(instance, handle);
            result->rc = LCB_ERR_GENERIC;
        }
    }
}
@mreiche, Do you happen to see anything off with the callback or the part of the code that invokes the query?
What’s puzzling is that the exact same query, when run via cbq on the same machine, completes in about 11–12 seconds — but through the app using the embedded C SDK, it takes around 35–37 seconds. So at least for now, we’ve ruled out any network-related issues. It seems more likely that the delay is within the SDK or the app logic itself.
Would appreciate your thoughts.
@pccb by any chance is it a query that you provide parameters for through the SDK, but hardcode those parameters into the statement in cbq?
This has been a cause of ‘SDK is slower than UI/cbq’ issues in the past, e.g. see Query is fast in couchbase query window but slow in C# code - #9 by btburnett3
Thanks @graham.pople, we verified that in both cases the same index is used.