Merging from:
I don’t know if \set -max_parallelism 8;
is “saved” on the server or for the season, but it seemed after setting it, two different cbq instances “shared” it. Could you tell me which is the case?
Also, these settings fail: (First time I’ve seen so much red on a result!)
cbq> \set -pipeline_batch 256k;
cbq> \set -pipeline_cap 256k;
"errors": [
{
"code": 1065,
"msg": "Unrecognized parameter in request: pipeline_batch"
}
],
4x CPU server, both with \set -max_parallelism 8;
and \set -max_parallelism 4;
seems well-saturated when querying (with htop reporting minor io-wait CPU time [<3%])
Query format didn’t seem to affect running times too much;
SELECT did, ARRAY_AGG(DISTINCT SUBSTR(t, 0, 10)) AS dates, {
"full" : COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)),
"null" : COUNT(*) - COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)),
"sums" : COUNT(*)
} AS entries
FROM sync_gateway
WHERE type="bcn_scan" AND SUBSTR(t, 0, 10) BETWEEN "2016-11-01" AND "2016-12-15"
GROUP BY did
ORDER BY did ASC;
"metrics": {
"elapsedTime": "56.951166373s",
"executionTime": "56.951118663s",
"resultCount": 8,
"resultSize": 8317,
"sortCount": 8
}
and
SELECT RAW
{
date :
ARRAY_SORT( ARRAY_AGG( counts ) )
}
FROM (
SELECT
SUBSTR(t, 0, 10) AS date,
{
did : {
"foull_entries" : COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)),
"total_entries" : COUNT(*),
"nulld_entries" : COUNT(*) - COUNT(NULLIF(ARRAY_LENGTH(bcn), 0))
}
} AS counts
FROM sync_gateway
WHERE
type="bcn_scan"
AND SUBSTR(t, 0, 10) BETWEEN "2016-11-01" AND "2016-12-15"
AND did IS NOT NULL
GROUP BY SUBSTR(t, 0, 10), did
) AS s
GROUP BY date
ORDER BY date;
"metrics": {
"elapsedTime": "58.590618023s",
"executionTime": "58.590573467s",
"resultCount": 45,
"resultSize": 56924,
"sortCount": 45
}
Seem to yield the same results more or less (~1.5 sec difference). Unless you meant something different with:
to which, I am open to suggestions if you have to propose something even faster (within the N1QL confines of course).
However, COUNT(*) functions seem to perform differently. I am just writing them for the people that may visit the topic.
N.B.!: You have to \set max_parallelism #
to see the ‘small’ differences. Default max_parallelism (I think 1) gives major differences between the 1st and the 2nd/3rd places (whereas in-between 2nd/3rd place, unoptimized, there is no difference).
"full" : COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)),
"null" : COUNT(*) - COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)),
"sums" : COUNT(*)
"metrics": {
"elapsedTime": "56.951166373s",
"executionTime": "56.951118663s",
"resultCount": 8,
"resultSize": 8317,
"sortCount": 8
}
and
"full" : COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)),
"null" : COUNT(CASE WHEN ARRAY_LENGTH(bcn)==0 THEN 1 ELSE NULL END),
"sums" : COUNT(*)
"metrics": {
"elapsedTime": "1m7.820849181s",
"executionTime": "1m7.820799484s",
"resultCount": 8,
"resultSize": 8317,
"sortCount": 8
}
and
"full" : ARRAY_SUM(ARRAY_AGG(TONUMBER(ARRAY_LENGTH(bcn)<>0))),
"null" : ARRAY_SUM(ARRAY_AGG(TONUMBER(ARRAY_LENGTH(bcn)==0))),
"sums" : COUNT(*)
"metrics": {
"elapsedTime": "1m8.286570522s",
"executionTime": "1m8.286494713s",
"resultCount": 8,
"resultSize": 8317,
"sortCount": 8
}