Constant High CPU usage after upgrading from 3.0.3 to 5.1.1 Community Edition

Hello,

We have been using a single-server, single-bucket Couchbase Server instance (memcached bucket) running on Windows Server 2012 R2 as the caching solution for our system for a while. We were running Couchbase version 3.0.3 until last week without issues. Then we had some connectivity issues after installing the latest Windows updates, and we observed that 5.1.1 runs fine with those updates, so we decided to upgrade to 5.1.1.

Before the upgrade, the VM was running at a stable 10-20% CPU utilization. After the upgrade, it started hitting 100%. As an initial workaround, we increased the VM's cores from 4 to 8, which gave us some headroom, but CPU usage was still around 70-80%.

We have been observing the CPU usage on that machine for almost a week, and it never drops below 50-60%, sometimes hitting 90%. Below is the PRTG graph that shows CPU usage for the last 30 days. You can clearly see how it peaked after the upgrade and settled down a bit after we increased the cores.

I’ll attach the memcached task resource usage image in a second post, as new users are not allowed to post two images in one post :slight_smile:

I’d appreciate any help with troubleshooting this issue.

Below is the image that shows resource usage by the memcached process:

[Image: memcached resource usage]

What workload (ops/s, number of docs, etc.) are you running? Has anything changed since the upgrade?

What services do you have configured?

Could you grab a thread dump from the memcached process?

Hello @drigby,

There has been no change in load before and after the upgrade. We are using Couchbase as a second-level cache, and we saw the CPU usage spike immediately after updating to 5.1.1.

We do not use additional services such as XDCR, full-text search, or GSI indexes. It is just a single memcached bucket.

Below is the cbstats output:

accepting_conns:                1
auth_cmds:                      112409
auth_errors:                    0
bytes:                          1510156837
bytes_read:                     164674859338
bytes_subdoc_lookup_extracted:  0
bytes_subdoc_lookup_total:      0
bytes_subdoc_mutation_inserted: 0
bytes_subdoc_mutation_total:    0
bytes_written:                  853286775752
cas_badval:                     0
cas_hits:                       0
cas_misses:                     0
cmd_flush:                      0
cmd_get:                        101781189
cmd_lock:                       0
cmd_lookup_10s_count:           635
cmd_lookup_10s_duration_us:     5452
cmd_mutation_10s_count:         4058
cmd_mutation_10s_duration_us:   69740
cmd_set:                        1269494611
cmd_subdoc_lookup:              0
cmd_subdoc_mutation:            0
cmd_total_gets:                 101781189
cmd_total_ops:                  1371276902
cmd_total_sets:                 1269495713
conn_yields:                    0
connection_structures:          24211
curr_connections:               24211
curr_conns_on_port_11209:       8
curr_conns_on_port_11210:       24203
curr_items:                     4136634
daemon_connections:             4
decr_hits:                      551
decr_misses:                    0
delete_hits:                    0
delete_misses:                  0
engine_maxbytes:                15702425600
evictions:                      0
get_hits:                       61823359
get_misses:                     39957830
incr_hits:                      551
incr_misses:                    0
iovused_high_watermark:         3
libevent:                       2.1.8-beta
listen_disabled_num:            0
lock_errors:                    0
max_conns_on_port_11209:        5000
max_conns_on_port_11210:        30000
memcached_version:              bbb1bc7f041fbbc19b2b5279a8856fae2d585554
msgused_high_watermark:         1
pid:                            3740
pointer_size:                   64
rbufs_allocated:                0
rbufs_existing:                 1374083682
rbufs_loaned:                   0
reclaimed:                      200157
rejected_conns:                 0
stat_reset:                     Thu Sep 27 10:00:44 2018
threads:                        6
time:                           1538476808
total_connections:              112902
total_items:                    1269494450
total_resp_errors:              42734018
uptime:                         434364
version:                        5.1.1-5723
wbufs_allocated:                0
wbufs_loaned:                   0
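
To put rough numbers on the workload question, a few averages can be derived from the cumulative counters above. This is only a sketch in Python: it assumes the counters have been accumulating for the whole `uptime`, and the `summarize` helper and its key names are made up for illustration and are not part of cbstats.

```python
# Counters copied from the cbstats output above (cumulative values).
stats = {
    "cmd_get": 101781189,
    "cmd_set": 1269494611,
    "cmd_total_ops": 1371276902,
    "get_hits": 61823359,
    "curr_conns_on_port_11210": 24203,
    "max_conns_on_port_11210": 30000,
    "uptime": 434364,  # seconds
}


def summarize(s):
    """Derive average rates and ratios from cumulative counters.

    Assumption: every counter has been accumulating for `uptime` seconds.
    """
    uptime = s["uptime"]
    return {
        "avg_ops_per_sec": s["cmd_total_ops"] / uptime,
        "avg_sets_per_sec": s["cmd_set"] / uptime,
        "avg_gets_per_sec": s["cmd_get"] / uptime,
        "get_hit_ratio": s["get_hits"] / s["cmd_get"],
        "sets_per_get": s["cmd_set"] / s["cmd_get"],
        "conn_utilization_11210": (
            s["curr_conns_on_port_11210"] / s["max_conns_on_port_11210"]
        ),
    }


summary = summarize(stats)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

On these numbers that works out to roughly 3,157 ops/s on average, about 12 sets for every get, a get hit ratio of about 61%, and port 11210 at about 81% of its connection limit. These are averages over about five days, so they say nothing about peaks.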

Thanks for the info.

I don’t believe I’ve seen anyone report this increase in resource usage between 3.0.3 & 5.1.1 on Windows; certainly most users are not using memcache-type buckets exclusively. It’s possible there’s some resource usage regression between those versions in your specific configuration.

Would you be able to get a per-thread CPU breakdown using Perfmon or similar, so we can see which thread(s) are busy?
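
If Perfmon is awkward to capture, something along these lines might give the same picture. It is a minimal Python sketch, not a Couchbase tool: it assumes the third-party `psutil` package is installed, and the helper names (`sample_thread_times`, `busiest_threads`, `report`) are hypothetical.

```python
import time


def sample_thread_times(pid):
    """Return {thread_id: cumulative CPU seconds} for one process.

    Assumes the third-party psutil package is installed.
    """
    import psutil  # imported lazily so busiest_threads() works without it

    return {
        t.id: t.user_time + t.system_time
        for t in psutil.Process(pid).threads()
    }


def busiest_threads(before, after, interval_s, top=5):
    """Rank threads by the share of one core used between two samples."""
    usage = []
    for tid, cpu_after in after.items():
        # A thread created during the interval has no earlier sample.
        cpu_before = before.get(tid, 0.0)
        usage.append((tid, 100.0 * (cpu_after - cpu_before) / interval_s))
    usage.sort(key=lambda item: item[1], reverse=True)
    return usage[:top]


def report(pid, interval_s=10.0):
    """Sample the process twice and print its busiest threads."""
    first = sample_thread_times(pid)
    time.sleep(interval_s)
    second = sample_thread_times(pid)
    for tid, pct in busiest_threads(first, second, interval_s):
        print(f"thread {tid}: {pct:.1f}% of one core")
```

Calling `report(<memcached pid>)` from an elevated prompt would print the top five threads over a 10-second window. The thread IDs can then be matched against a thread dump to see what the busy threads are doing.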

Hello @drigby,

I think the image below is what you asked for.

By the way, due to complaints from our clients, we reverted Couchbase to version 3.1.3 and CPU usage went back down to 10-20%. Of course, this is not a final solution; we still need to find out how to run the latest version with acceptable performance.

How was this resolved?