Optimize query with JOIN

How many docs does this N1QL

SELECT 
SUM(CASE WHEN submission.status = 'request' OR submission.offer IS NOT MISSING THEN 1 ELSE 0 END) AS requested,
SUM(CASE WHEN submission.status = 'pending' AND (submission.offer IS MISSING OR (submission.offer IS NOT MISSING AND (submission.highValueTarget = true OR submission.offer.status='approved'))) THEN 1 ELSE 0 END) AS pending
FROM brandbassador_dev submission
WHERE submission.docType="Submission" 
AND submission.status IN ["approved", "rejected", "started", "ongoing", "pending"]

return, and how long does it take?

Since it is a SUM with no GROUP BY, it will return only a single document. Here’s the info:

Elapsed: 2.13s	Execution: 2.13s	Result Count: 1	Result Size: 66

I’ve also run a count on the documents targeted by this query:

SELECT COUNT(1)
FROM brandbassador_dev submission
WHERE submission.docType="Submission" 
AND submission.status IN ["approved", "rejected", "started", "ongoing", "pending"]

which returned 17,225 documents.

How about removing all the existing indexes and adding an index with:

CREATE INDEX `idx_missionId_status_offer_highValueTarget_4_submission_dev` ON `brandbassador_dev`(`status`,'Mission::' || `missionId`,IFMISSINGORNULL(`offer`,999),highValueTarget) WHERE (`docType` = "Submission") ;
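
If you want to remove the existing indexes first, here is a minimal sketch (the index name in the DROP is hypothetical; the first statement lists the real ones from system:indexes):

SELECT name FROM system:indexes WHERE keyspace_id = "brandbassador_dev";
DROP INDEX `brandbassador_dev`.`idx_submission_old` /* hypothetical name; repeat per index */ ;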

And run the following N1QL

SELECT SUM(CASE WHEN submission.status = 'request' OR IFMISSINGORNULL(submission.offer,999) != 999 THEN 1 ELSE 0 END) AS requested,
       SUM(CASE WHEN submission.status = 'pending' AND (IFMISSINGORNULL(submission.offer,999) == 999 OR (IFMISSINGORNULL(submission.offer,999) != 999 AND (submission.highValueTarget = true OR IFMISSINGORNULL(submission.offer,999).status='approved'))) THEN 1 ELSE 0 END) AS pending,
       mission.docId,mission.*
  FROM brandbassador_dev submission
  JOIN brandbassador_dev mission ON KEYS ('Mission::' || submission.missionId)
 WHERE submission.docType="Submission" 
   AND submission.status IN ["approved", "rejected", "started", "ongoing", "pending"]
   AND mission.status IN ["published", "ended"] 
 GROUP BY mission 
 ORDER BY mission.created_at DESC 
 LIMIT 10

It should use a covering index scan.

It’s still around 8 seconds… Here’s the EXPLAIN:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan",
              "covers": [
                "cover ((`submission`.`status`))",
                "cover ((\"Mission::\" || (`submission`.`missionId`)))",
                "cover (ifmissingornull((`submission`.`offer`), 999))",
                "cover ((`submission`.`highValueTarget`))",
                "cover ((meta(`submission`).`id`))"
              ],
              "filter_covers": {
                "cover ((`submission`.`docType`))": "Submission"
              },
              "index": "idx_missionId_status_offer_highValueTarget_4_submission_dev",
              "index_id": "e92c9f62597c48ca",
              "keyspace": "brandbassador_dev",
              "namespace": "default",
              "spans": [
                {
                  "Range": {
                    "High": [
                      "successor(\"approved\")"
                    ],
                    "Inclusion": 1,
                    "Low": [
                      "\"approved\""
                    ]
                  }
                },
                {
                  "Range": {
                    "High": [
                      "successor(\"ongoing\")"
                    ],
                    "Inclusion": 1,
                    "Low": [
                      "\"ongoing\""
                    ]
                  }
                },
                {
                  "Range": {
                    "High": [
                      "successor(\"pending\")"
                    ],
                    "Inclusion": 1,
                    "Low": [
                      "\"pending\""
                    ]
                  }
                },
                {
                  "Range": {
                    "High": [
                      "successor(\"rejected\")"
                    ],
                    "Inclusion": 1,
                    "Low": [
                      "\"rejected\""
                    ]
                  }
                },
                {
                  "Range": {
                    "High": [
                      "successor(\"started\")"
                    ],
                    "Inclusion": 1,
                    "Low": [
                      "\"started\""
                    ]
                  }
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "Parallel",
              "~child": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "Join",
                    "as": "mission",
                    "keyspace": "brandbassador_dev",
                    "namespace": "default",
                    "on_keys": "cover ((\"Mission::\" || (`submission`.`missionId`)))"
                  },
                  {
                    "#operator": "Filter",
                    "condition": "(((cover ((`submission`.`docType`)) = \"Submission\") and (cover ((`submission`.`status`)) in [\"approved\", \"rejected\", \"started\", \"ongoing\", \"pending\"])) and ((`mission`.`status`) in [\"published\", \"ended\"]))"
                  },
                  {
                    "#operator": "InitialGroup",
                    "aggregates": [
                      "sum(case when ((cover ((`submission`.`status`)) = \"pending\") and ((cover (ifmissingornull((`submission`.`offer`), 999)) = 999) or ((not (cover (ifmissingornull((`submission`.`offer`), 999)) = 999)) and ((cover ((`submission`.`highValueTarget`)) = true) or ((cover (ifmissingornull((`submission`.`offer`), 999)).`status`) = \"approved\"))))) then 1 else 0 end)",
                      "sum(case when ((cover ((`submission`.`status`)) = \"request\") or (not (cover (ifmissingornull((`submission`.`offer`), 999)) = 999))) then 1 else 0 end)"
                    ],
                    "group_keys": [
                      "`mission`"
                    ]
                  }
                ]
              }
            },
            {
              "#operator": "IntermediateGroup",
              "aggregates": [
                "sum(case when ((cover ((`submission`.`status`)) = \"pending\") and ((cover (ifmissingornull((`submission`.`offer`), 999)) = 999) or ((not (cover (ifmissingornull((`submission`.`offer`), 999)) = 999)) and ((cover ((`submission`.`highValueTarget`)) = true) or ((cover (ifmissingornull((`submission`.`offer`), 999)).`status`) = \"approved\"))))) then 1 else 0 end)",
                "sum(case when ((cover ((`submission`.`status`)) = \"request\") or (not (cover (ifmissingornull((`submission`.`offer`), 999)) = 999))) then 1 else 0 end)"
              ],
              "group_keys": [
                "`mission`"
              ]
            },
            {
              "#operator": "FinalGroup",
              "aggregates": [
                "sum(case when ((cover ((`submission`.`status`)) = \"pending\") and ((cover (ifmissingornull((`submission`.`offer`), 999)) = 999) or ((not (cover (ifmissingornull((`submission`.`offer`), 999)) = 999)) and ((cover ((`submission`.`highValueTarget`)) = true) or ((cover (ifmissingornull((`submission`.`offer`), 999)).`status`) = \"approved\"))))) then 1 else 0 end)",
                "sum(case when ((cover ((`submission`.`status`)) = \"request\") or (not (cover (ifmissingornull((`submission`.`offer`), 999)) = 999))) then 1 else 0 end)"
              ],
              "group_keys": [
                "`mission`"
              ]
            },
            {
              "#operator": "Parallel",
              "~child": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "InitialProject",
                    "result_terms": [
                      {
                        "as": "requested",
                        "expr": "sum(case when ((cover ((`submission`.`status`)) = \"request\") or (not (cover (ifmissingornull((`submission`.`offer`), 999)) = 999))) then 1 else 0 end)"
                      },
                      {
                        "as": "pending",
                        "expr": "sum(case when ((cover ((`submission`.`status`)) = \"pending\") and ((cover (ifmissingornull((`submission`.`offer`), 999)) = 999) or ((not (cover (ifmissingornull((`submission`.`offer`), 999)) = 999)) and ((cover ((`submission`.`highValueTarget`)) = true) or ((cover (ifmissingornull((`submission`.`offer`), 999)).`status`) = \"approved\"))))) then 1 else 0 end)"
                      },
                      {
                        "expr": "(`mission`.`docId`)"
                      },
                      {
                        "expr": "`mission`",
                        "star": true
                      }
                    ]
                  }
                ]
              }
            }
          ]
        },
        {
          "#operator": "Order",
          "limit": "10",
          "sort_terms": [
            {
              "desc": true,
              "expr": "(`mission`.`created_at`)"
            }
          ]
        },
        {
          "#operator": "Limit",
          "expr": "10"
        },
        {
          "#operator": "FinalProject"
        }
      ]
    },
    "text": "SELECT SUM(CASE WHEN submission.status = 'request' OR IFMISSINGORNULL(submission.offer,999) != 999 THEN 1 ELSE 0 END) AS requested,\n       SUM(CASE WHEN submission.status = 'pending' AND (IFMISSINGORNULL(submission.offer,999) == 999 OR (IFMISSINGORNULL(submission.offer,999) != 999 AND (submission.highValueTarget = true OR IFMISSINGORNULL(submission.offer,999).status='approved'))) THEN 1 ELSE 0 END) AS pending,\n       mission.docId,mission.*\n  FROM brandbassador_dev submission\n  JOIN brandbassador_dev mission ON KEYS ('Mission::' || submission.missionId)\n WHERE submission.docType=\"Submission\" \n   AND submission.status IN [\"approved\", \"rejected\", \"started\", \"ongoing\", \"pending\"]\n   AND mission.status IN [\"published\", \"ended\"] \n GROUP BY mission \n ORDER BY mission.created_at DESC \n LIMIT 10"
  }
]

How long does this N1QL take?

SELECT SUM(CASE WHEN submission.status = 'request' OR IFMISSINGORNULL(submission.offer,999) != 999 THEN 1 ELSE 0 END) AS requested,
       SUM(CASE WHEN submission.status = 'pending' AND (IFMISSINGORNULL(submission.offer,999) == 999 OR (IFMISSINGORNULL(submission.offer,999) != 999 AND (submission.highValueTarget = true OR IFMISSINGORNULL(submission.offer,999).status='approved'))) THEN 1 ELSE 0 END) AS pending
  FROM brandbassador_dev submission
 WHERE submission.docType="Submission" 
   AND submission.status IN ["approved", "rejected", "started", "ongoing", "pending"]

Does it still take 2.13s?
Maybe you can enable the debug log to find which part of the query takes a long time.

Running that query takes around 1.5s now.

I’ve never used the debug log; can you help me find it, please?

By following these steps:

  1. Get the config info:
curl -u Administrator:<password> http://127.0.0.1:8093/admin/settings
  2. Enable the debug info by changing
"debug":true "loglevel":"DEBUG"
  3. Optimize the query by changing
"max-parallelism":2  "pipeline-batch":256
  4. For example, change the config with:
curl -XPOST -u Administrator:<password> http://127.0.0.1:8093/admin/settings -d '{"completed-limit":4000,"completed-threshold":1000,"cpuprofile":"","debug":true,"keep-alive-length":16384,"loglevel":"DEBUG","max-parallelism":2,"memprofile":"","pipeline-batch":256,"pipeline-cap":512,"request-size-cap":67108864,"scan-cap":0,"servicers":8,"timeout":0}'

Then you can check the log under

<couchbase_server_install_dir>/var/lib/couchbase/logs/query.log

to find which part of the query took a long time.
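
Once you are done investigating, it is probably worth turning the debug logging back off; a sketch, assuming the settings endpoint accepts a partial body like the full example above:

curl -XPOST -u Administrator:<password> http://127.0.0.1:8093/admin/settings -d '{"debug":false,"loglevel":"INFO"}'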

Sorry for the delay @atom_yang.

Attached you’ll find the log you requested: query.zip (1.4 MB)

I am not really sure what to look for in there… Let me know if you need more info, please!

You should check the Phase aggregates of your N1QL; for example, line 6895 in your log:

_time=2017-03-13T20:16:00.564+00:00 _level=TRACE _msg=Phase aggregates parse=6.242079ms instantiate=141.264µs authorize=107.977µs fetch=2.180234797s _id=28e8b7fb-afd1-4148-bff0-ef0bf4790f42 scan=1.210844273s run=1.456280452s plan=22.197795ms 

Fetch took more than 2s and scan took more than 1s, so you need to optimize by decreasing the fetch time and scan time.
Also check the EXPLAIN to ensure all your indexes are being used correctly.
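
For reference, prefixing a statement with EXPLAIN returns the plan without executing the query, so you can confirm which index is chosen and whether the scan is covered; for example, with the count query from earlier in this thread:

EXPLAIN SELECT COUNT(1)
  FROM brandbassador_dev submission
 WHERE submission.docType = "Submission"
   AND submission.status IN ["approved", "rejected", "started", "ongoing", "pending"];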

And maybe you can post your query service config, obtained with

curl  -u Administrator:<password> http://127.0.0.1:8093/admin/settings

to check if there are some parameters that can be optimized.

Maybe I am wrong, but it seems that there isn’t much that indices can do to help in this scenario… If I have, for instance, 15k documents indexed by docType and status - these being the fields I use in the WHERE clause - and it still takes ~2 seconds to retrieve them, how can we further optimize this use case with indices?

You mention that I need to decrease the fetch and scan time - if the index is already the best it can be, how can I achieve this? Do I need to increase the resources on my bucket/nodes? The EXPLAIN output seems to show that the index is being used correctly, as shown in other posts in this conversation.

This is the current configuration for the query service:

{
   "completed-limit":4000,
   "completed-threshold":1000,
   "cpuprofile":"",
   "debug":true,
   "keep-alive-length":16384,
   "loglevel":"DEBUG",
   "max-parallelism":2,
   "memprofile":"",
   "pipeline-batch":256,
   "pipeline-cap":512,
   "request-size-cap":67108864,
   "scan-cap":0,
   "servicers":8,
   "timeout":0
}

Let me know if there’s any more information I can provide to solve this problem. Thank you for your help!

Can you post your Couchbase server deployment architecture / server configuration / document numbers / server resources for the index service?
cc @geraldss @prasad

Hello,

Jumping in on this conversation.

We have 3 Virtual Machines for this cluster, running on Google Cloud Platform.

Each machine runs on centos-release-6-8.el6.centos.12.3.x86_64 and has:
  • 4 vCPUs
  • 8GB RAM
  • 60GB HDD (system/Couchbase install location)
  • 100GB HDD (Couchbase data)

HDD throughput is shown in the linked folder’s CB_05.jpg.

This cluster runs 3 “equal” buckets: (see linked folder’s CB_03.jpg for item count)

  • 1 for production,
  • 1 for staging,
  • 1 for development

The production bucket has 9GB RAM allotted to it. _dev and _stg have 3GB each.

— EDIT

Quick note: I just realized that when I took the screenshot the _stg bucket was being rebuilt. The normal number of documents is around 15k, like the other buckets.
— END EDIT

For cluster configuration, see linked folder’s CB_04.jpg.
For cluster nodes configuration, see linked folder’s CB_02.jpg

Current cluster usage stats: see linked folder’s CB_01.jpg

Please note that the tests @manusyone mentioned were done on the _stg bucket.

For said linked folder:

  1. As your linked CB_04.jpg shows, you are using GSI, and you have enough memory, so maybe you can use MOI to speed things up.
  2. You can create the index with WITH {"nodes": ["10.132.0.10:8091"]} to deploy index and query on 10.132.0.15 (see the sketch below).
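
For point 2, a minimal sketch of the placement clause (the index name and keys below are hypothetical; the WITH {"nodes": [...]} part is what matters):

CREATE INDEX idx_submission_status_dev ON brandbassador_dev(docType, status)
WHERE docType = "Submission"
WITH {"nodes": ["10.132.0.10:8091"]};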

Hi @atom_yang, thanks for the quick reply.

About point 1, I believe that MOI is only available for the Enterprise edition - has something changed regarding this?

About point 2, do you mean "create index by WITH {"nodes": ["10.132.0.10:8091"]} to deploy index and query on 10.132.0.10"? Could you explain how deploying the query and index services together could improve things in this scenario, please?

GSI is disk- and memory-intensive, with average CPU utilization.
And deploying the Index service with the Query service is better compared to deploying it with the Data service.
The aim is to keep the index data in memory.

FYI, see 06_WS_Index_Manageability.

Hey @atom_yang. I have created the following index as you proposed (different use case):

CREATE INDEX idx_brandbassador_test_submission ON brandbassador(docType,status,userId) 
WHERE (docType = "Submission") 
WITH {"nodes": "10.132.0.10:8091"} ;

I ran the following query:

SELECT COUNT(1) as totalPosts, users.docId
FROM brandbassador submissions
JOIN brandbassador users ON KEYS 'User::' || submissions.userId
WHERE submissions.docType = 'Submission'
AND users.docType = 'User'
AND submissions.status IN ["approved", "started", "pending", "ongoing"]
GROUP BY users.docId

which takes between 3-6 seconds.

EXPLAIN statement shows the following:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan",
          "covers": [
            "cover ((`submissions`.`docType`))",
            "cover ((`submissions`.`status`))",
            "cover ((`submissions`.`userId`))",
            "cover ((meta(`submissions`).`id`))"
          ],
          "filter_covers": {
            "cover ((`submissions`.`docType`))": "Submission"
          },
          "index": "idx_brandbassador_test_submission",
          "index_id": "61aa39cc50942713",
          "keyspace": "brandbassador",
          "namespace": "default",
          "spans": [
            {
              "Range": {
                "High": [
                  "\"Submission\"",
                  "successor(\"approved\")"
                ],
                "Inclusion": 1,
                "Low": [
                  "\"Submission\"",
                  "\"approved\""
                ]
              }
            },
            {
              "Range": {
                "High": [
                  "\"Submission\"",
                  "successor(\"ongoing\")"
                ],
                "Inclusion": 1,
                "Low": [
                  "\"Submission\"",
                  "\"ongoing\""
                ]
              }
            },
            {
              "Range": {
                "High": [
                  "\"Submission\"",
                  "successor(\"pending\")"
                ],
                "Inclusion": 1,
                "Low": [
                  "\"Submission\"",
                  "\"pending\""
                ]
              }
            },
            {
              "Range": {
                "High": [
                  "\"Submission\"",
                  "successor(\"started\")"
                ],
                "Inclusion": 1,
                "Low": [
                  "\"Submission\"",
                  "\"started\""
                ]
              }
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Join",
                "as": "users",
                "keyspace": "brandbassador",
                "namespace": "default",
                "on_keys": "(\"User::\" || cover ((`submissions`.`userId`)))"
              },
              {
                "#operator": "Filter",
                "condition": "(((cover ((`submissions`.`docType`)) = \"Submission\") and ((`users`.`docType`) = \"User\")) and (cover ((`submissions`.`status`)) in [\"approved\", \"started\", \"pending\", \"ongoing\"]))"
              },
              {
                "#operator": "InitialGroup",
                "aggregates": [
                  "count(1)"
                ],
                "group_keys": [
                  "(`users`.`docId`)"
                ]
              }
            ]
          }
        },
        {
          "#operator": "IntermediateGroup",
          "aggregates": [
            "count(1)"
          ],
          "group_keys": [
            "(`users`.`docId`)"
          ]
        },
        {
          "#operator": "FinalGroup",
          "aggregates": [
            "count(1)"
          ],
          "group_keys": [
            "(`users`.`docId`)"
          ]
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "as": "totalPosts",
                    "expr": "count(1)"
                  },
                  {
                    "expr": "(`users`.`docId`)"
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT COUNT(1) as totalPosts, users.docId\nFROM brandbassador submissions\nJOIN brandbassador users ON KEYS 'User::' || submissions.userId\nWHERE submissions.docType = 'Submission'\nAND users.docType = 'User'\nAND submissions.status IN [\"approved\", \"started\", \"pending\", \"ongoing\"]\nGROUP BY users.docId"
  }
] 

We did not notice any major improvements with this change. Any suggestions? Thank you for your help!

Try the following query.

SELECT COUNT(1) as totalPosts, users.docId FROM
          (SELECT RAW 'User::' || userId FROM  brandbassador WHERE docType = 'Submission'
                       AND status IN ["approved", "started", "pending", "ongoing"]) AS q1
JOIN brandbassador users ON KEYS q1
WHERE users.docType = 'User'
GROUP BY users.docId;

As for queries using JOINs, try the original queries on 4.6.0.

Hello @vsr1! The query you provided has similar execution times to the first one, so no noticeable improvements (yesterday the system was under high load, now both queries run between 1.5-3s).

I guess there must be something we’re not doing correctly. With a fairly low cardinality of documents (in the tens of thousands), should we expect a query to take this long with proper indices?

As for your last comment, I am not sure what you mean by "original queries", could you elaborate? Also, 4.6.0 is a Developer Preview (Community Edition), meaning it may not be possible to upgrade at this time. Thank you for your help, looking forward to your feedback!

Hello @manusyone. The left side of the JOIN produces 17,225 items (from post 5). The left side uses a covering index, but the right side of the JOIN does not; it needs to fetch almost 17,225 documents, and that takes time due to the nested-loop join.

By "original queries" I meant the query in post 19. In 4.6.x we have a block nested-loop join.

I see. We’ll be trying 4.6.x to check if we can improve query performance. Thank you for your help!