4.5 beta. n1ql group by. too slow

when using group by, query takes about a minute.
i set max pararellism to 12, it takes about ten seconds.
what is tunning point?

@donovan.kim can you please share some sample data from your bucket, the exact query, the result you get and the result you want?

when small amount of data, it’s ok.
i have 100k documents. and three nodes.
i wanna know tunning points.
i think there is no meaning ‘share sample data’.

cbq> SELECT h.tableId, p.sessionId
   > FROM HandHistory AS h USE INDEX (`HANDHISTORY_GPID_IDX`)
   > UNNEST h.players AS p
   > WHERE
   > (any t in h.players satisfies t.userName = "HyPJr6VlyGXCNi3rivWohEmx" end  )and p.userName = "HyPJr6VlyGXCNi3rivWohEmx"
   > AND h.timestamp >= "1462723200000" AND h.timestamp <= "1462809599999"
   > AND h.playType = "1" AND h.gameBuyInType = "2"
   > LIMIT 5;
{
    "requestID": "eebb48fd-d6de-4258-9103-a24b30e3bce7",
    "signature": {
        "sessionId": "json",
        "tableId": "json"
    },
    "results": [
        {
            "sessionId": "3067253",
            "tableId": "0000000a-816f-572a-ca00-00005e080000"
        },
        {
            "sessionId": "3067871",
            "tableId": "0000000a-816f-572a-ca00-00004a080000"
        },
        {
            "sessionId": "3069017",
            "tableId": "0000000a-816f-572a-ca00-0000c6080000"
        },
        {
            "sessionId": "3067580",
            "tableId": "0000000a-816f-572a-ca00-000028070000"
        },
        {
            "sessionId": "3067253",
            "tableId": "0000000a-816f-572a-ca00-00005e080000"
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "57.778556ms",
        "executionTime": "57.702162ms",
        "resultCount": 5,
        "resultSize": 545
    }
}
cbq> SELECT h.tableId, p.sessionId, max(h.timestamp)
   > FROM HandHistory AS h USE INDEX (`HANDHISTORY_GPID_IDX`)
   > UNNEST h.players AS p
   > WHERE (any t in h.players satisfies t.userName = "HyPJr6VlyGXCNi3rivWohEmx" end  )
   > and p.userName = "HyPJr6VlyGXCNi3rivWohEmx"
   > AND h.timestamp >= "1462723200000" AND h.timestamp <= "1462809599999"
   > AND h.playType = "1" AND h.gameBuyInType = "2"
   > group by h.tableId, p.sessionId
   > LIMIT 5;
{
    "requestID": "464f997f-6e13-4668-927f-67023478ca01",
    "signature": {
        "$1": "json",
        "sessionId": "json",
        "tableId": "json"
    },
    "results": [
        {
            "$1": "1462759673977",
            "sessionId": "3074366",
            "tableId": "0000000a-d5e9-572f-c900-00000a000000"
        },
        {
            "$1": "1462757043772",
            "sessionId": "3074180",
            "tableId": "0000000a-d5e8-572f-ca00-00001b000000"
        },
        {
            "$1": "1462752158520",
            "sessionId": "3073215",
            "tableId": "0000000a-8170-572a-c900-0000920a0000"
        },
        {
            "$1": "1462751991353",
            "sessionId": "3073091",
            "tableId": "0000000a-816f-572a-ca00-0000ec080000"
        },
        {
            "$1": "1462724509560",
            "sessionId": "3067580",
            "tableId": "0000000a-816f-572a-ca00-000028070000"
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "11.25714192s",
        "executionTime": "11.257070102s",
        "resultCount": 5,
        "resultSize": 720
    }
}
cbq>

i use this option
SimpleN1qlQuery simple = N1qlQuery.simple(stmt, N1qlParams.build().maxParallelism(12));
but it is not so affected.
is this grammar right?

Group by needs to process all the qualified documents even though query has limit clause. If the index is not covered, it requires KV fetch of the documents.

  1. Use covered index and check how it performs.
    ex: CREATE INDEX HANDHISTORY_GROUP_IDX ON HandHistory( distinct (array (t.userName) for t in players end), playType, gameBuyInType, timestamp, players, tableId, sessionId);
  2. If covered index is not possible use index such that it results less keys from indexer and will reduce KV fetch.
  3. max pararellism value depends on query and work load. Try with different maxParallellism values (up to number of CPUs).

@simonbasle Could you please answer SDK question.

@vsr1 I don’t think there is any actual SDK question in this thread, as tuning knobs are more of a query service thing. Of course in the case of query parameters like maxParallelism, the SDK would expose them, as @donovan.kim already found out.
The covering index can be created from cbq/the webconsole (or even as a one shot N1QL query from the SDK…).

I think the question is whether or not the syntax is correct. Yes, @donovan.kim, it is.

You can always verify what’s happening at the lowest levels by turning on TRACE logging.

ok. i will try covering indexing.
btw is it possible composite array indexing?
for example.
if my document’s structure is below.

{ b: ... , inner:[{a: ... , c: ...},...]}

i wanna create index with a,b,c ordered.

create index test_idx on test (distinct array t.a for t in inner end, b, distinct array t.c for t in inner end)

is it ok?

we limit double array expressions in CREATE INDEX but you would do that with something like this

create index test_idx on test (distinct array [t.a, t.c] for t in inner end, b)

thanks
-cihan

i tried to use covering index.

 CREATE INDEX `SESSION_LIST_IDX` ON `HandHistory`((distinct (array (`t`.`userName`) for `t` in `players` end)),`timestamp`,`playType`,`gameType`,`gameBuyInType`,`tableId`,`tableName`,`maxBuyIn`,`minBuyIn`,`tourneyName`,`defaultBuyIn`,`bigBlind`,`smallBlind`)

but covering is not occured. what is problem?

cbq> explain SELECT h.tableId,  h.tableName, h.timestamp, h.playType, h.maxBuyIn,
   > h.minBuyIn, h.tourneyName, h.defaultBuyIn, h.bigBlind, h.smallBlind
   > FROM HandHistory AS h USE INDEX (`SESSION_LIST_IDX`)
   > WHERE ANY t IN h.players SATISFIES t.userName = "htFu39YYTU41iLkMKbKJJtH4" END
   > AND h.timestamp >= "1462723200000" AND h.timestamp <= "1462809599999"
   > AND h.playType = "1" AND h.gameType = "0" AND  ( h.gameBuyInType = "0"  or h.gameBuyInType="1");
{
    "requestID": "6a21c2d1-3f33-4c19-9420-89ee8b0514e5",
    "signature": "json",
    "results": [
        {
            "plan": {
                "#operator": "Sequence",
                "~children": [
                    {
                        "#operator": "DistinctScan",
                        "scan": {
                            "#operator": "IndexScan",
                            "index": "SESSION_LIST_IDX",
                            "index_id": "a155029c740efcc1",
                            "keyspace": "HandHistory",
                            "namespace": "default",
                            "spans": [
                                {
                                    "Range": {
                                        "High": [
                                            "\"htFu39YYTU41iLkMKbKJJtH4\""
                                        ],
                                        "Inclusion": 3,
                                        "Low": [
                                            "\"htFu39YYTU41iLkMKbKJJtH4\""
                                        ]
                                    }
                                },
                                {
                                    "Range": {
                                        "High": [
                                            "\"htFu39YYTU41iLkMKbKJJtH4\""
                                        ],
                                        "Inclusion": 3,
                                        "Low": [
                                            "\"htFu39YYTU41iLkMKbKJJtH4\""
                                        ]
                                    }
                                }
                            ],
                            "using": "gsi"
                        }
                    },
                    {
                        "#operator": "Parallel",
                        "~child": {
                            "#operator": "Sequence",
                            "~children": [
                                {
                                    "#operator": "Fetch",
                                    "as": "h",
                                    "keyspace": "HandHistory",
                                    "namespace": "default"
                                },
                                {
                                    "#operator": "Filter",
                                    "condition": "(((((any `t` in (`h`.`players`) satisfies ((`t`.`userName`) = \"htFu39YYTU41iLkMKbKJJtH4\") end and (\"1462723200000\" \u003c= (`h`.`timestamp`))) and ((`h`.`timestamp`) \u003c= \"1462809599999\")) and ((`h`.`playType`) = \"1\")) and ((`h`.`gameType`) = \"0\")) and (((`h`.`gameBuyInType`) = \"0\") or ((`h`.`gameBuyInType`) = \"1\")))"
                                },
                                {
                                    "#operator": "InitialProject",
                                    "result_terms": [
                                        {
                                            "expr": "(`h`.`tableId`)"
                                        },
                                        {
                                            "expr": "(`h`.`tableName`)"
                                        },
                                        {
                                            "expr": "(`h`.`timestamp`)"
                                        },
                                        {
                                            "expr": "(`h`.`playType`)"
                                        },
                                        {
                                            "expr": "(`h`.`maxBuyIn`)"
                                        },
                                        {
                                            "expr": "(`h`.`minBuyIn`)"
                                        },
                                        {
                                            "expr": "(`h`.`tourneyName`)"
                                        },
                                        {
                                            "expr": "(`h`.`defaultBuyIn`)"
                                        },
                                        {
                                            "expr": "(`h`.`bigBlind`)"
                                        },
                                        {
                                            "expr": "(`h`.`smallBlind`)"
                                        }
                                    ]
                                },
                                {
                                    "#operator": "FinalProject"
                                }
                            ]
                        }
                    }
                ]
            },
            "text": "SELECT h.tableId,  h.tableName, h.timestamp, h.playType, h.maxBuyIn, h.minBuyIn, h.tourneyName, h.defaultBuyIn, h.bigBlind, h.smallBlind FROM HandHistory AS h USE INDEX (`SESSION_LIST_IDX`) WHERE ANY t IN h.players SATISFIES t.userName = \"htFu39YYTU41iLkMKbKJJtH4\" END AND h.timestamp \u003e= \"1462723200000\" AND h.timestamp \u003c= \"1462809599999\" AND h.playType = \"1\" AND h.gameType = \"0\" AND  ( h.gameBuyInType = \"0\"  or h.gameBuyInType=\"1\")"
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "19.232565ms",
        "executionTime": "19.15695ms",
        "resultCount": 1,
        "resultSize": 5276
    }
}
cbq>

For covering index array element `players` needs to part of the index key

The following index may do better (Due to playType,gameType,gameBuyInType are equal predicates, moved to leading keys in index).

ex : CREATE INDEX SESSION_LIST_IDX ON default((distinct (array (t.userName) for t in players end)),playType,gameType,gameBuyInType,timestamp,tableId,tableName,maxBuyIn,minBuyIn,tourneyName,defaultBuyIn,bigBlind,smallBlind,players);

Other option is create separate array index and if both indexes qualifies for predicate query engine will do intersect scan.

is there a guide about intersect scan?
is there a risk like cartesian prouct?
i have trouble to create index.
it seems “players” is too long.

panic: Secondary key is too long (> 4096)

goroutine 261 [running]:
panic(0xe9c1e0, 0xc8201219e0)
        /home/couchbase/.cbdepscache/exploded/x86_64/go-1.6/go/src/runtime/panic.go:464 +0x3e6 fp=0xc823531890 sp=0xc823531810
github.com/couchbase/indexing/secondary/common.CrashOnError(0x7f558343c028, 0xc8201219e0)
        /home/couchbase/jenkins/workspace/watson-unix/goproj/src/github.com/couchbase/indexing/secondary/common/util.go:367 +0x4b fp=0xc8235318b8 sp=0xc823531890
github.com/couchbase/indexing/secondary/indexer.(*fdbSlice).checkFatalDbError(0xc82013e780, 0x7f558343c028, 0xc8201219e0)
        /home/couchbase/jenkins/workspace/watson-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/forestdb_slice_writer.go:810 +0x2f fp=0xc823531908 sp=0xc8235318b8
github.com/couchbase/indexing/secondary/indexer.(*fdbSlice).insertSecArrayIndex(0xc82013e780, 0xc8254b5500, 0x13ed, 0x1500, 0xc823449500, 0x13d8, 0x13d8, 0xc82352ae00, 0x13, 0x13, ...)
        /home/couchbase/jenkins/workspace/watson-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/forestdb_slice_writer.go:567 +0x1add fp=0xc823531cd0 sp=0xc823531908
github.com/couchbase/indexing/secondary/indexer.(*fdbSlice).insert(0xc82013e780, 0xc8254b5500, 0x13ed, 0x1500, 0xc823449500, 0x13d8, 0x13d8, 0xc82352ae00, 0x13, 0x13, ...)
        /home/couchbase/jenkins/workspace/watson-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/forestdb_slice_writer.go:355 +0x14a fp=0xc823531d40 sp=0xc823531cd0
github.com/couchbase/indexing/secondary/indexer.(*fdbSlice).handleCommandsWorker(0xc82013e780, 0x0)
        /home/couchbase/jenkins/workspace/watson-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/forestdb_slice_writer.go:314 +0x4c6 fp=0xc823531f90 sp=0xc823531d40
runtime.goexit()
        /home/couchbase/.cbdepscache/exploded/x86_64/go-1.6/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc823531f98 sp=0xc823531f90
created by github.com/couchbase/indexing/secondary/indexer.NewForestDBSlice
        /home/couchbase/jenkins/workspace/watson-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/forestdb_slice_writer.go:158 +0x190d

goroutine 1 [select]:
....


Slides 46, 47 have few details on IntersectScan.

Index key size is issue, try non covered index with
CREATE INDEX SESSION_LIST_IDX ON HandHistory((distinct (array (t.userName) fort in players end)),playType,gameType,gameBuyInType,timestamp);

1 Like

we decide to create new bucket for list.
thanks.

In addition to the slides form @vsr1, I just want to answer your question–IntersectScan has no risk of a cartesian product. It performs a set intersection.