CPU Usage Too High

Previously I ran with Couchbase Server 4.5 and Sync Gateway 1.4.1, and it worked well.

I recently updated everything to Couchbase Server 5.0 and Sync Gateway 1.5.1, and I ran into some trouble.

The two Couchbase Server nodes show CPU 5%~15%, RAM 50%, and 1K~3K ops per second.

But the Sync Gateway servers reach 100% CPU within a minute,
and I don't know what I should do.

The Sync Gateways seem to process requests well.

2018-04-06T09:16:46.499Z HTTP: #4410: POST /mybucket/_changes (as 32683737-cf7b-440a-bfcf-b4f656b9c656)
2018-04-06T09:16:46.499Z HTTP: #4414: POST /mybucket/_bulk_docs (as 25caa0b2-cfff-4e98-89be-008245df907a)
2018-04-06T09:16:46.499Z HTTP: #4411: POST /mybucket/_changes?feed=longpoll&heartbeat=45083&style=all_docs&since=683687687 (as 0a97a0e4-b733-4c86-82e4-3cf11710391b)
2018-04-06T09:16:46.499Z HTTP+: #4407: → 201 (86.3 ms)
2018-04-06T09:16:46.501Z HTTP+: #4102: → 201 (5282.6 ms)
2018-04-06T09:16:46.502Z HTTP: #4412: POST /mybucket/_bulk_docs (as 2d1bc4fe-60c9-49ed-958e-99e7110c9ac4)
2018-04-06T09:16:46.525Z HTTP+: #4405: → 201 (130.9 ms)
2018-04-06T09:16:46.527Z HTTP+: #4416: → 200 (32.9 ms)
2018-04-06T09:16:46.527Z HTTP: #4413: POST /mybucket/_revs_diff (as 31486d86-1a12-4f4d-9fe5-60742f7b9818)
2018-04-06T09:16:46.552Z HTTP: #4415: POST /mybucket/_bulk_docs (as 8a210afb-4db6-48db-a6a4-2c59afc0f67f)
2018-04-06T09:16:46.553Z HTTP+: #3445: → 200 (17685.0 ms)
2018-04-06T09:16:46.565Z HTTP+: #4398: → 200 OK (0.0 ms)
2018-04-06T09:16:46.572Z HTTP: #4417: PUT /mybucket/_local/df5047454d2f87206bcf8802cda3b265f01559d7 (as 83c16f09-f446-4c3c-b786-74ffc654b151)
2018-04-06T09:16:46.588Z HTTP: #4418: POST /mybucket/_bulk_docs (as 2d1bc4fe-60c9-49ed-958e-99e7110c9ac4)
2018-04-06T09:16:46.607Z HTTP+: #4417: → 201 (72.2 ms)
2018-04-06T09:16:46.607Z HTTP: #4419: PUT /mybucket/_local/0e08a0fcc6fd01e90c0405d94d06e82b5a1e1f39 (as 3a32c428-3b68-4869-bb8a-57ac3e6e8715)
2018-04-06T09:16:46.616Z HTTP+: #4403: → 200 OK (0.0 ms)
2018-04-06T09:16:46.646Z HTTP+: #4367: → 200 OK (0.0 ms)
2018-04-06T09:16:46.758Z HTTP: #4432: POST /mybucket/_changes (as 4510cc5f-d2c0-4f6c-9d7c-f5224ef9b152)
2018-04-06T09:16:46.758Z HTTP: #4435: PUT /mybucket/_local/39dcee8d74a20cf8daf9cd94c8f40dc70256bf66 (as 41d328d3-9168-4ccf-a43a-394c0d36804d)
2018-04-06T09:16:46.758Z HTTP+: #4432: → 200 OK (0.0 ms)
2018-04-06T09:16:46.768Z HTTP+: #311: → 200 OK (0.0 ms)

But as time goes on, there are some strange behaviors,
so for now I am killing the Sync Gateway processes every two minutes.

Some suspicious log entries are:

2018-04-06T06:25:24.182Z WARNING: Error returned when releasing sequence 683399154. Falling back to skipped sequence handling. Error:operation has timed out -- db.(*Database).updateAndReturnDoc() at crud.go:1044
2018-04-06T06:25:24.183Z WARNING: backupAncestorRevs failed: doc="mydocmydoc" rev="1-7a2ba9a9945b13ce12b03db2fa5d309d" err=operation has timed out -- db.(*Database).backupAncestorRevs() at crud.go:520
2018-04-06T06:25:24.314Z WARNING: backupAncestorRevs failed: doc="local:mydocmydoc" rev="10-076f71922c819cc9d3525de9420cc7a6" err=operation has timed out -- db.(*Database).backupAncestorRevs() at crud.go:520
2018-04-06T09:20:29.081Z changes_view: Query took 216.544767ms to return 33 rows, options = db.Body{"startkey":[]interface {}{"7ce0f464-c981-4199-9632-0205b61f1cfc", 0x1}, "endkey":[]interface {}{"7ce0f464-c981-4199-9632-0205b61f1cfc", 0x28c0622f}, "stale":false}

And I lost some documents (about 4%?) by mistake while upgrading from Couchbase Server 4.5 to 5.0.

Another change is that the reverse proxy changed from Nginx to an AWS Load Balancer (for scaling).

Can you post your Sync Gateway configuration? Make sure to delete any sensitive information like passwords or IP addresses.

And I lost some documents (about 4%?) by mistake while upgrading from Couchbase Server 4.5 to 5.0.

Can you provide more details?

Actually, I downgraded to Couchbase 4.5 & Sync Gateway 1.4, so I don't have the exact settings anymore.

But it was something like this:

{
  "log": ["HTTP+"],
  "adminInterface": "0.0.0.0:4985",
  "interface": "0.0.0.0:4984",
  "maxFileDescriptors": 250000,
  "databases": {
    "mybucket": {
      "allow_empty_password": true,
      "server": "couchbase://10.0.123.123",
      "username": "user",
      "password": "password",
      "bucket": "mybucket",
      "users": { "GUEST": {"disabled": true, "admin_channels": [] } }
    }
  }
}

When I upgraded the cluster, I added one Couchbase Server 5.0 node to the live cluster of two 4.5 nodes.
Since something was not going well, I removed the 5.0 node.
But the cluster was still rebalancing, and the two 4.5 nodes had different numbers of documents.
(I guess that, due to the replication setting, new documents were not distributed well.)
So I removed one 4.5 node with failover (I didn't understand the Couchbase Server concepts then).
Finally, I copied the remaining documents to a new cluster of one 5.0 node with cbtransfer.
But cbtransfer stopped at 96%,
so I guess I lost some documents.

But I don't think the document loss is the cause of the 100% CPU.

@sixmen, how many docs are in your database? If SG is importing a large number, CPU can be high during that initial processing.

Max file descriptors of 250000 seems quite excessive to me. Could that be a problem @traun?

My data bucket has 100M items.

For the max file descriptors setting, I referred to https://developer.couchbase.com/documentation/mobile/1.5/guides/sync-gateway/os-level-tuning/index.html

Average HTTP requests per minute are 6000~7000 (peaks above 15000).

Does SG itself need initial processing? From my memory, I waited more than a few hours.
After that, I killed the SG processes every two minutes until I downgraded.

I don’t have good numbers on time. With CBS 5.0 and SG 1.5 there will be processing to use XATTRS for all the sync information. Hours seems like plenty for initializing, but I think 100M docs will take quite a while. So it may just not have run long enough.

It would be a lot faster if you don’t need all the docs. You can set up filters for SG. You might look at using that.
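
If you go that route, the import filter is just a JavaScript function in the database config: Sync Gateway passes each document to it and only processes the ones for which it returns true. A minimal sketch (the doc.type check is a made-up convention here; use whatever field actually marks the documents your mobile clients need):

function (doc) {
  // process only documents that mobile clients actually need;
  // everything else stays a server-side-only document
  return doc.type == "mobile";
}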

Once the initial setup has gone through, restarting should be fairly quick.

Can you test with a smaller data set/setup?

I only found the XATTRs information after I ran into this problem.

Does SG use XATTRs even if I don't specify 'enable_shared_bucket_access=true'?
And does it consume SG's CPU, not CBS's CPU?

To upgrade, which should I upgrade first?
CBS 4.5 to 5.0 -> add an SG 2.0 gateway and wait for sync -> remove 1.4?

No. You will have to specify that config flag to enable XAttrs. There is also an import_docs flag that you need to set up.
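
As a rough sketch only, relative to the config you posted earlier, the database entry would gain these two settings (please double-check the exact value import_docs expects for your SG version against the docs):

"mybucket": {
  "server": "couchbase://10.0.123.123",
  "bucket": "mybucket",
  "username": "user",
  "password": "password",
  "allow_empty_password": true,
  "enable_shared_bucket_access": true,
  "import_docs": "continuous",
  "users": { "GUEST": {"disabled": true, "admin_channels": [] } }
}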

And then there is the import_filter function to set up. As Hod pointed out, you may want to revisit which documents you want to process via the SGW. If you don't expect to sync all the documents to mobile clients, then filter those out of processing.

I didn't quite follow your question, but XAttrs are added by SGW as it processes the documents that it imports. Check out the blog on using shared bucket access.

You may want to check out this upgrade guide on upgrading from a pre-XAttr version of SGW.