COUNT condition: INDEX Speed-up improvements

Merging from:

I don’t know if \set -max_parallelism 8; is “saved” on the server or for the season, but it seemed after setting it, two different cbq instances “shared” it. Could you tell me which is the case?

Also, these settings fail: (First time I’ve seen so much red on a result!)

cbq> \set -pipeline_batch 256k;
cbq> \set -pipeline_cap 256k;

    "errors": [
        {
            "code": 1065,
            "msg": "Unrecognized parameter in request: pipeline_batch"
        }
    ],

4x CPU server, both with \set -max_parallelism 8; and \set -max_parallelism 4; seems well-saturated when querying (with htop reporting minor io-wait CPU time [<3%])

Query format didn’t seem to affect running times too much;

SELECT did, ARRAY_AGG(DISTINCT SUBSTR(t, 0, 10)) AS dates, {
	"full" : COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)),
	"null" : COUNT(*) - COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)),
	"sums" : COUNT(*)
} AS entries
FROM sync_gateway
WHERE type="bcn_scan" AND SUBSTR(t, 0, 10) BETWEEN "2016-11-01" AND "2016-12-15"
GROUP BY did
ORDER BY did ASC;

"metrics": {
    "elapsedTime": "56.951166373s",
    "executionTime": "56.951118663s",
    "resultCount": 8,
    "resultSize": 8317,
    "sortCount": 8
}

and

SELECT RAW
    {
        date :
        ARRAY_SORT( ARRAY_AGG( counts ) )
    }
FROM (
    SELECT
        SUBSTR(t, 0, 10) AS date,
        {
            did : {
                "foull_entries" : COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)),
                "total_entries" : COUNT(*),
                "nulld_entries" : COUNT(*) - COUNT(NULLIF(ARRAY_LENGTH(bcn), 0))
            }
        } AS counts
    FROM sync_gateway
    WHERE
        type="bcn_scan"
        AND SUBSTR(t, 0, 10) BETWEEN "2016-11-01" AND "2016-12-15"
        AND did IS NOT NULL
    GROUP BY SUBSTR(t, 0, 10), did
) AS s
GROUP BY date
ORDER BY date;


"metrics": {
    "elapsedTime": "58.590618023s",
    "executionTime": "58.590573467s",
    "resultCount": 45,
    "resultSize": 56924,
    "sortCount": 45
}

Seem to yield the same results more or less (~1.5 sec difference). Unless you meant something different with:

to which, I am open to suggestions if you have to propose something even faster (within the N1QL confines of course).


However, COUNT(*) functions seem to perform differently. I am just writing them for the people that may visit the topic.

N.B.!: You have to \set max_parallelism # to see the ‘small’ differences. Default max_parallelism (I think 1) gives major differences between the 1st and the 2nd/3rd places (whereas in-between 2nd/3rd place, unoptimized, there is no difference).

"full" : COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)),
"null" : COUNT(*) - COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)),
"sums" : COUNT(*)

"metrics": {
    "elapsedTime": "56.951166373s",
    "executionTime": "56.951118663s",
    "resultCount": 8,
    "resultSize": 8317,
    "sortCount": 8
}

and

"full" : COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)),
"null" : COUNT(CASE WHEN ARRAY_LENGTH(bcn)==0 THEN 1 ELSE NULL END),
"sums" : COUNT(*)

"metrics": {
    "elapsedTime": "1m7.820849181s",
    "executionTime": "1m7.820799484s",
    "resultCount": 8,
    "resultSize": 8317,
    "sortCount": 8
}

and

"full" : ARRAY_SUM(ARRAY_AGG(TONUMBER(ARRAY_LENGTH(bcn)<>0))), 
"null" : ARRAY_SUM(ARRAY_AGG(TONUMBER(ARRAY_LENGTH(bcn)==0))),
"sums" : COUNT(*)

"metrics": {
    "elapsedTime": "1m8.286570522s",
    "executionTime": "1m8.286494713s",
    "resultCount": 8,
    "resultSize": 8317,
    "sortCount": 8
}