cbc-pillowfight test for GETs

Hi All,

We are generating a test workload through cbc-pillowfight where we want all 8K GETs to come from disk. Any suggestions on how to achieve that workload?


Hey @Debasis_Mallick, to be honest it’s going to be a little bit tricky…and whatever you do will be quite “contrived” and not very realistic. I can certainly understand what you’re trying to achieve. Since the very early beginnings of Couchbase, it has been a focus to transparently manage the integrated caching layer so that the application doesn’t have to care. This means there’s not been a need for direct control of which specific records are kept in RAM or not, and the system is designed to operate in a constantly fluid state that balances not only the documents themselves, but their replicas, all of the metadata, queues and connection resources, etc.

It might be better to come at this from the other direction…rather than having 8k GETs and trying to get them all to come from disk, it might be better to generate a higher workload and tune the database so that you see a certain number of GETs/sec and then observe the latency variation. Essentially you’ll want to align enough data with a small enough RAM quota so that as some data is read in from the disk, other data needs to be ejected from RAM…and then that data is read back in again.
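As a rough sanity check on the sizing described above, you can compare the raw data footprint against the bucket's RAM quota before running the workload. The numbers below (document size, item count, quota, and the high water mark percentage) are illustrative assumptions only, not values from this thread, so substitute your own:

```shell
# Back-of-the-envelope sizing check (illustrative numbers, not from the thread).
DOC_SIZE_BYTES=1000        # value size per document
NUM_ITEMS=100000000        # documents to load
BUCKET_QUOTA_MB=10240      # assumed bucket RAM quota (10 GiB)

DATA_MB=$(( DOC_SIZE_BYTES * NUM_ITEMS / 1024 / 1024 ))
# The high water mark is a percentage of the quota (commonly around 85%;
# check your own cluster's setting).
HWM_MB=$(( BUCKET_QUOTA_MB * 85 / 100 ))
echo "raw data: ${DATA_MB} MiB, high water mark: ${HWM_MB} MiB"
if [ "$DATA_MB" -gt "$HWM_MB" ]; then
  echo "ejection expected"
else
  echo "increase items or shrink the quota"
fi
```

This ignores metadata and replica overhead, so treat it as a lower bound on memory pressure.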

Another option would be to first load in ‘x’ amount of data with a known key-prefix so that you’ve reached or exceeded the high water mark of the bucket. Then load in the same amount of data with a different key prefix which should cause all of the first set to be ejected. Then perform only a single pass of GETs on the first set of data which will cause it to all be read back into RAM. That should simulate what you’re asking for, but again, I really wouldn’t use it to form any definitive conclusions.

It would be better to run a more realistic workload (using something like YCSB) and measure the ongoing performance of a running cluster. Otherwise you'll miss things like compaction, index updates, failover/rebalance, etc. It's obviously much more effort to do so, but it's much closer to what your production application will experience.

Thanks @perry for your response.
As you mentioned, we are using the first option, i.e. loading data with "--key-prefix a" to bring the resident ratio below 100 percent (to about 70 percent), then loading data with "--key-prefix b" to double the bucket item count so that all of the "--key-prefix a" data is pushed to disk. Then we run GET operations on "--key-prefix a" and we do see around 8K GETs/s, but the cache miss ratio shown in the Couchbase UI is very low, around 4 or 5 percent, which surprises us. Any inputs on why we are getting such a low cache miss ratio, or on anything we missed during the test?

Below are the pillowfight commands used for my test.

Populate data with prefix-a
date; /opt/couchbase/bin/cbc-pillowfight -U couchbase:// -u cbadmin -P cbadmin --min-size 1000 --max-size 1000 --json --set-pct 100 --batch-size 1 --num-items 100000000 --sequential --num-threads 1 --rate-limit 8000 --key-prefix a; date

Populate data with prefix-b
date; /opt/couchbase/bin/cbc-pillowfight -U couchbase:// -u cbadmin -P cbadmin --min-size 1000 --max-size 1000 --json --set-pct 100 --batch-size 1 --num-items 100000000 --sequential --num-threads 1 --rate-limit 8000 --key-prefix b; date

Get ops for the data prefix-a
date;/opt/couchbase/bin/cbc-pillowfight -U couchbase:// -u cbadmin -P cbadmin --min-size 1000 --max-size 1000 --json --set-pct 0 --batch-size 1000 --num-items 24000000 --num-threads 20 --rate-limit 8000 --key-prefix a --no-population;date

Hey @Debasis_Mallick - off the top of my head, I suspect that pillowfight is re-reading at least some of the same objects over and over, so those are coming directly from RAM, with only one in every "x" requests being served from disk.

Have a look at the full set of options here: https://github.com/couchbase/libcouchbase/blob/master/doc/cbc-pillowfight.markdown

There are a couple of different options in there that you should be able to combine to get what you want. I think you'll want to use '-c 1' to run the workload for only a single cycle. You might also want to experiment with '-T' to have pillowfight output its own latency timings.
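For illustration, here's a sketch of how the earlier GET pass might look with those flags added, plus --sequential so each key is visited in order rather than picked at random (random selection re-reads keys, which would keep them warm in RAM). The host and credentials are placeholders, and the leading echo makes this a dry run that only prints the command; drop the echo to execute it against a real cluster:

```shell
# Sketch: the GET pass with -c 1 (single cycle), --sequential (visit each
# key once per cycle instead of random selection), and -T (latency timings).
# Host and credentials below are placeholders; echo makes this a dry run.
PF=/opt/couchbase/bin/cbc-pillowfight
echo "$PF" -U couchbase://127.0.0.1 -u cbadmin -P cbadmin \
  --min-size 1000 --max-size 1000 --json --set-pct 0 \
  --batch-size 1000 --num-items 24000000 --num-threads 20 \
  --key-prefix a --no-population --sequential -c 1 -T
```

With a single sequential cycle, each document in the set should be requested exactly once, so a fully ejected set should produce close to a 100% cache miss ratio on the first pass.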

Hope that helps?

Thanks @perry for the response. Let me check and update in case further assistance is required.