Azure Performance Expectations

Hello -

We have a three-node cluster of Azure A3 VMs (4 cores, 7GB RAM). Our workloads are pretty light, except for a weekly data import process that loads around 100-200K records. The nodes run Ubuntu 12.04 and CB 3.0.1. We’re using the C# .NET SDK to insert and update the records.

The initial (fresh) inserts are the fastest. We can insert new documents at around 500 docs/sec, which shows in the admin console as 1000-1500 ops/sec (I’m guessing the replica writes are counted in this stat?). Our documents are pretty small, averaging 2K. While this isn’t as fast as we’d hoped, it’s workable. Where the performance gets unworkable is document updates.

Our import process sometimes does an update when an incoming document ID already exists in CB. It loads the document from CB into the C# model, compares it to the incoming document, applies the updates in memory, then rewrites the entire document back to CB. From what we read in the SDK docs, this is the only way to update an existing doc. Unfortunately, this process is VERY slow, writing only about 30-40 docs per second.
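For anyone following along, here’s a rough sketch of the read-compare-write flow described above. This is not our actual C# code; a plain dict stands in for the Couchbase bucket, and the point is that every update costs two round trips (one read, one full-document write):

```python
# Hedged sketch of the read-compare-write update pattern. A dict stands
# in for the bucket; in the real SDK each bucket access is a network
# round trip, which is where the per-document latency comes from.
def upsert_record(bucket, key, incoming):
    existing = bucket.get(key)       # round trip 1: read existing doc
    if existing is None:
        bucket[key] = dict(incoming)  # fresh insert
        return "insert"
    if existing == incoming:
        return "unchanged"            # skip the write entirely
    merged = dict(existing)
    merged.update(incoming)           # apply the updates in memory
    bucket[key] = merged              # round trip 2: rewrite whole doc
    return "update"
```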

We started our troubleshooting by doing a fairly deep review of our disk configuration. We’re currently using a RAID 0 software RAID (mdadm) of 8x Azure Page blob disks per the best “high-performance disk config” docs we could find for Linux on Azure.

500 docs/sec seems slow for a three-node cluster, but 30-40 seems REALLY slow. Does anyone else have any experience with Azure A3 VMs and blob storage disks? Definitely feels like there’s something we’re missing.


  • Jeff

Hi, Jeff,

It will be difficult to give a performance expectation because there can be many bottlenecks in the system: for example RAM, disk I/O, or network.

Depending on your update pattern, you may be able to leverage the append/prepend commands, but I’m not sure.

On a side note, we are working on a new feature that will fix the updates issue.


Thanks Qi. I’ve seen a few blog posts around deploying CB on Azure, but they mainly focus on things we’re already doing (using blob data disks instead of the OS volume, using private networks, etc). Do you know of any forum or blog posts that talk about expected performance on Azure?

In terms of benchmarking performance in ops/sec, it seems like the only real variables would be VM size, average doc size, and number of nodes, right? This would allow you to come up with some metrics around the worst-case scenario, when a read or write comes from disk instead of cache:

A-Tier (8x Disks) 2000 ops/sec/node
A-Tier (16x Disks) 4000 ops/sec/node
D-Tier (32x Disks) 8000 ops/sec/node
DS-Tier (DS3) 12,000 ops/sec/node
etc, etc, etc

Then people could derive their perf based on their average doc size. This seems reasonable since a non-cache read or write should ultimately be weighted by the IOPS performance of the underlying storage subsystem.
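To make the weighting idea concrete, here’s the kind of back-of-the-envelope math I have in mind. The per-disk IOPS figure and the I/Os-per-operation ratio below are illustrative assumptions, not measured Azure numbers:

```python
# Illustrative only: bound worst-case throughput by the storage
# subsystem's aggregate IOPS, on the assumption that every fully
# non-cached read or write costs at least one disk I/O.
def worst_case_ops_per_sec(iops_per_disk, disks_per_node, nodes, ios_per_op=1):
    return iops_per_disk * disks_per_node * nodes // ios_per_op

# e.g. an assumed 500 IOPS per page-blob disk, 8-disk stripe, 3 nodes:
cluster_ceiling = worst_case_ops_per_sec(500, 8, 3)  # 12000 I/Os per second
```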

I guess I’m just surprised with the popularity of Azure that there isn’t more documentation and/or clarity on how to properly consume Azure VMs to stand up a Couchbase cluster.


Unfortunately, I’m not aware of any blog posts talking about expected performance on Azure.

In terms of performance, I think we need to consider the following factors: RAM, working set, disk I/O, doc size etc.

Here’s a good talk about how to tune things for CB:

Hope it helps.


It also sounds slow to me given the details you’ve listed. I’d start by identifying where the bottleneck might lie - what GET and SET performance do you get with one of the standard workload generators - cbworkloadgen (ships with Couchbase server) or cbc pillowfight (part of the Couchbase C SDK). See how they compare to your application.

Also examine the output of cbstats timings - that will show you the times from the Couchbase server nodes’ point of view. See the timings documentation for more details.

If I had to make a guess, I’d say that your application is operating synchronously - i.e. waiting for one op to complete before starting the next one. If you’re seeing 500 docs/sec on the initial import, that would equate to a 1/500 s (2ms) per-operation round-trip time if all operations are serialized.

If your operations are independent (which they often are) then you can operate asynchronously, which will give you a massive speedup.
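A toy demonstration of that point, with time.sleep standing in for the ~2ms network round trip estimated above (the latency value and worker count are assumptions for illustration):

```python
# Serial vs. concurrent throughput with a fixed per-op round trip.
# Issuing independent ops concurrently multiplies throughput because
# the round trips overlap instead of queuing behind each other.
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.002  # assumed ~2 ms round trip per operation

def fake_op(_):
    time.sleep(LATENCY_S)  # stand-in for one Couchbase operation

def serial_throughput(n_ops):
    start = time.monotonic()
    for i in range(n_ops):
        fake_op(i)           # wait for each op before the next
    return n_ops / (time.monotonic() - start)

def concurrent_throughput(n_ops, workers=32):
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_op, range(n_ops)))  # ops in flight together
    return n_ops / (time.monotonic() - start)
```

Serial throughput can never exceed 1 / latency (here 500 ops/sec), while the concurrent version scales roughly with the number of in-flight operations.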

Great advice on running cbworkloadgen! My faith in Azure has been restored :slight_smile: Using cbworkloadgen, and setting the config values to mimic our average json data, we were able to push about 1 million records into the database in about 4 mins (that’s 4,400 ops/sec).

So now I’m working with my developer to figure out why our reads are so slow. We suspect it’s our “update” code: we need to read a doc from CB, compare it to the incoming row of data from the import file, then write out the changes. So I suspect we’ll need to multi-thread the read/compare/write process to increase our performance.

Thanks again, feels like we’re making progress.

  • Jeff

Update: Things were going well with the cbworkloadgen testing, but now we’ve hit an issue. Everything was working great, then our test runs began throwing the following at the console:

It throws this “s0 backing off…” message about 4-5 times a second at the console, and while the cluster “appears” to be processing documents:

Using some basic math, you can tell that it’s not actually doing anything. At even 1000 ops/sec, it would finish the 1 million doc workload in about 16 mins. It’s been doing this for over an hour now, and I’m assuming it would do it indefinitely if I let it run.
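Spelling out that basic math, in case anyone wants to check it:

```python
# At even 1000 ops/sec, the 1M-doc workload should drain in well
# under 20 minutes - nowhere near the hour-plus we've been waiting.
docs = 1_000_000
ops_per_sec = 1000
minutes = docs / ops_per_sec / 60
assert 16 < minutes < 17
```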

My servers don’t appear to be low on resources, and there are no errors in the web UI logs. So best I can assume is that the cbworkloadgen or the cluster are stuck in some sort of failure loop. I noticed a high number of “temp OOMs” per second, but I have no clue why the bucket would be low on memory. None of the other resource metrics show the server as being low on resources.

One thing that isn’t clear is how big the bucket is. If you have a small amount of RAM allocated to that bucket, you’d see that on-and-off TMPFAIL behavior.

You can visualize it as filling a funnel. If disk I/O can’t keep up with 1000 ops/s and the memory fills up, cbworkloadgen stops filling the funnel until enough drains out that there is more room.
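You can simulate the funnel with a few lines of Python. The quota and rates below are made-up numbers, not your cluster’s values; the point is that once memory hits quota, accepted throughput collapses to the disk drain rate and the client backs off every second:

```python
# Toy "funnel" simulation: client writes faster than disk drains,
# memory fills to quota, then throughput is capped by the drain rate.
def simulate_funnel(quota_items, write_rate, drain_rate, seconds):
    in_memory = written = backoffs = 0
    for _ in range(seconds):
        accepted = min(write_rate, quota_items - in_memory)
        if accepted < write_rate:
            backoffs += 1                # client would see TMPFAIL here
        in_memory += accepted
        written += accepted
        in_memory -= min(drain_rate, in_memory)  # disk drains the funnel
    return written, backoffs
```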

What’s the size of the bucket?

Hello -

Sorry for the delayed reply, I had some other high-pri items that took focus. The screenshot below shows the bucket values. There are 726K docs, docs are about 2K (or smaller) a piece, and you can see the RAM and disk allocations to the right.

Please let me know if any other config values would be useful.

So from that screenshot you’re using 1.9GB out of the total 1.95GB quota - i.e. you’ve used 97% of the memory.

Further details about residency ratio etc aren’t visible from that screenshot (take a look at the bucket details graphs), and I don’t know if you tuned your watermarks, but Couchbase by default sets the high watermark to 85% and low to 75%, so will start to eject documents from RAM when you hit 85% usage, stopping when usage drops to 75%.
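Working that out against the numbers in your screenshot (the 1.95GB quota and 1.9GB used are from your post; the 85%/75% figures are the defaults mentioned above):

```python
# Default watermark thresholds for the bucket quota in the screenshot.
quota_gb = 1.95
high_water_gb = quota_gb * 0.85   # ejection starts above ~1.66 GB
low_water_gb = quota_gb * 0.75    # ejection stops below ~1.46 GB
used_fraction = 1.9 / quota_gb    # ~0.97 - well past the high watermark
```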

Given you’re at 97% usage, that implies that something is preventing the server from ejecting further - which may be causing your TEMPOOM errors as the cluster can’t free up any memory.

If you provide a screenshot of the bucket stats (at least the top section with the memory stats) that might assist in understanding your environment.

Also - given this is kinda off the original topic you might consider starting a new thread.