Performance of a basic UPDATE query

Hi,

I'm failing to understand the performance output (v6.0.2).

Basic Scenario:

  • 1M documents, just key + value, in one bucket
  • 12-core machine (24 vCPUs with HT) running DATA, QUERY, and INDEX services (256 GB RAM, 100% resident)

Config
One primary index, since it will be a full scan.
(Adding an index on the value doesn't change the timing, as expected.)

Query:
UPDATE MyTestBucket SET myValue = 50

Results
Parallelism 1: 60s
Parallelism 4: 40s
Parallelism 24: 20s

Questions
I cannot understand why the kernTime for PrimaryScan is 41s. What is it doing? And how come Fetch has another 30s of kernTime? How is this kernTime measured? I cannot see how the sum of all these times can be five times bigger than the total time of the query.
I am also clueless as to why such a simple query takes this much time. It is only 1M simple documents.

Plan

{
  "#operator": "Sequence",
  "#stats": {
    "#phaseSwitches": 1,
    "execTime": "4.178µs"
  },
  "~children": [
    {
      "#operator": "Authorize",
      "#stats": {
        "#phaseSwitches": 3,
        "execTime": "11.988µs",
        "servTime": "967.205µs"
      },
      "privileges": {
        "List": [
          {
            "Target": "default:MyTestBucket",
            "Priv": 8
          }
        ]
      },
      "~child": {
        "#operator": "Sequence",
        "#stats": {
          "#phaseSwitches": 1,
          "execTime": "8.013µs"
        },
        "~children": [
          {
            "#operator": "PrimaryScan3",
            "#stats": {
              "#itemsOut": 422917,
              "#phaseSwitches": 1691671,
              "execTime": "1.01515339s",
              "kernTime": "41.834858434s",
              "servTime": "293.844824ms"
            },
            "index": "MyTestBucket-idx",
            "index_projection": {
              "primary_key": true
            },
            "keyspace": "MyTestBucket",
            "namespace": "default",
            "using": "gsi",
            "#time_normal": "00:01.3089",
            "#time_absolute": 1.308998214
          },
          {
            "#operator": "Sequence",
            "#stats": {
              "#phaseSwitches": 1,
              "execTime": "19.388µs"
            },
            "~children": [
              {
                "#operator": "Fetch",
                "#stats": {
                  "#itemsIn": 422917,
                  "#itemsOut": 422917,
                  "#phaseSwitches": 1744537,
                  "execTime": "1.451948471s",
                  "kernTime": "30.536174622s",
                  "servTime": "11.209811943s"
                },
                "keyspace": "MyTestBucket",
                "namespace": "default",
                "#time_normal": "00:12.6617",
                "#time_absolute": 12.661760414
              },
              {
                "#operator": "Clone",
                "#stats": {
                  "#itemsIn": 422917,
                  "#itemsOut": 422917,
                  "#phaseSwitches": 1691671,
                  "execTime": "3.646958513s",
                  "kernTime": "39.599537304s"
                },
                "#time_normal": "00:03.6469",
                "#time_absolute": 3.646958513
              },
              {
                "#operator": "Set",
                "#stats": {
                  "#itemsIn": 422917,
                  "#itemsOut": 422917,
                  "#phaseSwitches": 1691671,
                  "execTime": "705.135974ms",
                  "kernTime": "42.58877295s"
                },
                "set_terms": [
                  {
                    "path": "(`MyTestBucket`.`numberOfHits`)",
                    "value": "50"
                  }
                ],
                "#time_normal": "00:00.7051",
                "#time_absolute": 0.705135974
              },
              {
                "#operator": "SendUpdate",
                "#stats": {
                  "#itemsIn": 422917,
                  "#itemsOut": 422917,
                  "#phaseSwitches": 1744537,
                  "execTime": "1.087421138s",
                  "kernTime": "310.070779ms",
                  "servTime": "41.945761193s"
                },
                "alias": "MyTestBucket",
                "keyspace": "MyTestBucket",
                "namespace": "default",
                "#time_normal": "00:43.0331",
                "#time_absolute": 43.033182331000006
              }
            ],
            "#time_normal": "00:00.0000",
            "#time_absolute": 0.000019388
          },
          {
            "#operator": "Discard",
            "#stats": {
              "#itemsIn": 422917,
              "#phaseSwitches": 845837,
              "execTime": "47.909714ms",
              "kernTime": "43.295380593s"
            },
            "#time_normal": "00:00.0479",
            "#time_absolute": 0.047909714
          }
        ],
        "#time_normal": "00:00.0000",
        "#time_absolute": 0.000008013
      },
      "#time_normal": "00:00.0009",
      "#time_absolute": 0.0009791930000000002
    },
    {
      "#operator": "Stream",
      "#stats": {
        "#phaseSwitches": 3,
        "execTime": "4.162µs",
        "kernTime": "43.344287065s"
      },
      "#time_normal": "00:00.0000",
      "#time_absolute": 0.000004162
    }
  ],
  "~versions": [
    "2.0.0-N1QL",
    "6.0.2-2413-enterprise"
  ],
  "#time_normal": "00:00.0000",
  "#time_absolute": 0.000004178
}

You should avoid a primary index. https://blog.couchbase.com/primary-uses-for-couchbase-primary-index/
A N1QL UPDATE always fetches the whole document from the Data node and writes the whole document back.

Even though your machine has 24 CPUs, the CE version of the query service will use only 4 of them. As you fetch ~0.5M documents, it takes time.
All operators work in parallel; that is why adding up the individual operator times doesn't match the total query time.
When you set parallelism, some operators run that many times in parallel, each working on different documents.
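
If you want to experiment with this, max_parallelism can be passed per request to the query service REST endpoint. A minimal sketch in Python (host, port, and credentials are placeholders for your environment):

import requests

resp = requests.post(
    "http://localhost:8093/query/service",
    data={
        "statement": "UPDATE MyTestBucket SET numberOfHits = 50",
        "max_parallelism": "24",  # capped at 4 per node on CE
    },
    auth=("Administrator", "password"),
)
print(resp.json()["metrics"])  # elapsedTime, executionTime, mutationCount, ...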

Sorry for hijacking this thread. Could you link to a resource/documentation about this, please?

https://docs.couchbase.com/server/4.0/architecture/services-archi-multi-dimensional-scaling.html
Check the end of that page.

https://docs.couchbase.com/server/5.1/introduction/editions.html#ee-vs-ce (see "High Performance Concurrent Query Execution")

Thanks! I found the info, but it is from CB Server 4.0. I wasn't able to find it for CB Server 6.0, so I assumed there was no longer a limit on how many cores the query service uses. Is the 4-cores-per-node limit still in place in CB Server 6.0? I do understand, however, that there is no Multi-Dimensional Scaling in CE.

The 4-core limit per query service node still applies in all CE releases. In CE, each node must run all services.

Hi,

Thank you for the quick response.
We are using EE, so we can see all 24 cores used (95% CPU usage).

Got it about adding up the times (parallelism). That might explain why the kernTime of one phase was about 10 times the total time of the query.
For a full scan, what would be the benefit of using anything other than the primary index? We don't need a WHERE clause here (based on our business logic).

We were just surprised that the same operation in NDB with one data node (same machine) takes 13s to update 1M records.

A N1QL UPDATE needs to fetch the whole document and write the whole document back. If the documents are big, that can take time. The mutations themselves also take time.

One option would be to run a SELECT query to get the document keys and then use the SDK's sub-document API to update the field. This avoids the Fetch. Check out Dotnet bulk subdocument API - #5 by lukeb
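
A minimal sketch of the same idea with the Python 2.x SDK (connection details are placeholders; the field name is the one from the plan's set_terms above):

from couchbase.cluster import Cluster, PasswordAuthenticator
import couchbase.subdocument as SD

cluster = Cluster('couchbase://localhost')
cluster.authenticate(PasswordAuthenticator('Administrator', 'password'))
bucket = cluster.open_bucket('MyTestBucket')

# Update a single field in place; neither the client nor the query
# service ever handles the full document, unlike a N1QL UPDATE.
bucket.mutate_in('my-doc-key', SD.upsert('numberOfHits', 50))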

Thanks for the info.

It is still difficult to believe it could be faster to get the keys on the backend, update the one field there, and send each document back to the server, instead of requesting a single SQL UPDATE operation (no network traffic from the backend).
Also, if we only do a "get document keys", how would we update a document without fetching the whole of it and writing it back (when only one field changes)? It seems I need to spend some more time with the docs.

Our objects have around 10 fields, by the way.

Found https://docs.couchbase.com/dotnet-sdk/2.7/subdocument-operations.html

We will give it a go.

It is my understanding that we can thread the writes instead of using N1QL, and that it may be faster.

We believe we should move to the SDK, grab the data manually, and do the calculations on the backend instead of waiting for the Query Service.
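
Roughly what we have in mind, assuming we already have the list of keys (a sketch only; connection details are placeholders, and each thread opens its own Bucket since SDK 2.x Bucket objects should not be shared across threads):

from concurrent.futures import ThreadPoolExecutor

import couchbase.subdocument as SD
from couchbase.bucket import Bucket

def update_chunk(keys):
    # One connection per thread.
    bucket = Bucket('couchbase://localhost/MyTestBucket',
                    username='Administrator', password='password')
    for key in keys:
        # Rewrite one field without fetching the whole document.
        bucket.mutate_in(key, SD.upsert('numberOfHits', 50))

def update_all(all_keys, n_threads=10):
    # Round-robin split of the keys across the threads.
    chunks = [all_keys[i::n_threads] for i in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(update_chunk, chunks))  # list() surfaces exceptions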

We have given it a try using the Python SDK with views. So far we can get all the keys with the iterator, but it takes too much time, so our next approach is to thread the process; however, we couldn't find out how to properly paginate the query.

We define a view:
function (doc, meta) {
  if (doc.docType && doc.docType == "AAAA") {
    emit(meta.id, null);
  }
}

and do a query:
rows = bucket.query('stats', 'StatsView')

But how could we query for batches of keys and process them in different threads?
We found out about skip+limit, but according to the docs, and some users, skip causes massive performance degradation.

As you might notice, we don't know the keys, so we need to get them first before doing any operation on the documents (updating one field based on another).
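
One idea we want to try, since the view key is meta.id itself: page by key range instead of skip. A rough sketch (page size is arbitrary; design doc and view names are the ones above):

def iter_key_pages(bucket, page_size=1000):
    last_key = None
    while True:
        if last_key is None:
            rows = bucket.query('stats', 'StatsView', limit=page_size)
        else:
            # Restart at the last key we saw; skip=1 drops that single
            # already-processed row. Keys are unique (meta.id), so no
            # unseen rows are skipped.
            rows = bucket.query('stats', 'StatsView', limit=page_size,
                                startkey=last_key, skip=1)
        page = [row.key for row in rows]
        if not page:
            return
        yield page
        last_key = page[-1]

Each page could then be handed off to a worker thread.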

Thanks

PS: We were thinking of using the example master/examples/connection-pool.py once we figure out how to split the work.

I did a quick test: reading 1M records with the view iterator from bucket.query takes 28s.

I am sure that if I can parallelize this read, it will get a nice boost in performance.
I couldn't find out how the RowProcessor handler requests new rows, e.g. whether it could ask for the next page in advance (and thread it).

I read about skip, but without knowing the number of records I cannot even try to measure that "performance impact of skip".

I could do a SELECT COUNT… but…

I ran a N1QL query to get the size of the bucket, then split the bucket.query view across several threads, just reading and storing the results in memory, using skip+limit. And as expected, with 10 threads it is slower than with just one thread reading the whole bucket (1M records).