Cache Miss Ratio in Case of a Low Resident Ratio

Hi Team,

We populated 24 million records (--key-prefix a) into one of the CB buckets (couchstore storage engine); the resident ratio was 79 percent. We then populated another 22 million records (--key-prefix b) so that all of the data with --key-prefix a would be pushed down to disk, and the resident ratio dropped to 34 percent. When performing get operations for --key-prefix a, every get should be a disk read and the cache miss ratio in the CB UI should be close to 100 percent, but we can see that the cache miss ratio is very low (0.251 percent). Because of this we are not able to see meaningful results for the r/s and rkB/s metrics through the iostat OS command.

Could you please advise why we are seeing such a low cache miss ratio?

We are using the pillowfight commands below to populate the data.

/opt/couchbase/bin/cbc-pillowfight -U couchbase://xx.xx.xx.xx/data1 -u -P --min-size 1000 --max-size 1000 --json --set-pct 100 --batch-size 1 --num-items 1000000000 --sequential --num-threads 20 --rate-limit 8000 --key-prefix a;
/opt/couchbase/bin/cbc-pillowfight -U couchbase://xx.xx.xx.xx/data1 -u -P --min-size 1000 --max-size 1000 --json --set-pct 100 --batch-size 1 --num-items 1000000000 --sequential --num-threads 20 --rate-limit 8000 --key-prefix b;
/opt/couchbase/bin/cbc-pillowfight -U couchbase://xx.xx.xx.xx/data1 -u -P --min-size 1000 --max-size 1000 --json --set-pct 0 --batch-size 1 --num-items 24000000 --num-threads 20 --rate-limit 8000 --key-prefix a --no-population;

Thanks,
Debasis

We kept these commands running for almost 45 minutes to populate the data.

So you killed the first command after it inserted 24M documents, and killed the second command after it inserted 22M documents. OK.

What appears to be happening is all 20 threads are fetching the same documents. Use --num-threads 1 and you should see a much higher cache-miss rate. With 0.009% resident ratio, I was seeing 10% cache-miss with 20 threads, 100% cache-miss with 1 thread.

Also - omitting --rate-limit might save you some time.

Edit: there is a very recent option --rand-space-per-thread that will give each thread a different random sequence when --sequential is not specified. If your cbc-pillowfight recognizes that option, you can use it.
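For example (just a sketch, reusing the placeholder host, bucket, and credentials from your own commands), the read run with a single thread would look something like:

/opt/couchbase/bin/cbc-pillowfight -U couchbase://xx.xx.xx.xx/data1 -u -P --min-size 1000 --max-size 1000 --json --set-pct 0 --batch-size 1 --num-items 24000000 --num-threads 1 --key-prefix a --no-population;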

Thanks @mreiche. I need help with the two questions below.

  1. You say that my command uses all 20 threads to fetch the same docs. Is there any option in pillowfight so that the 20 threads fetch different docs?

"What appears to be happening is all 20 threads are fetching the same documents"

  2. While doing random access with the pillowfight command below, can we give a --start-at option like the one the sequential scan provides?

/opt/couchbase/bin/cbc-pillowfight -U couchbase://xx.xx.xx.xx/data1 -u -P --min-size 1000 --max-size 1000 --json --set-pct 0 --batch-size 1 --num-items 24000000 --num-threads 20 --rate-limit 8000 --key-prefix a --no-population;

Thanks,
Debasis

See my previous response for the option.

  2. While doing random access with the pillowfight command below, can we give a --start-at option like the one the sequential scan provides?

Not according to --help.

   --start-at                    For sequential access, set the first item [Default=0]

So you mean to say that we should use --num-threads 1 to fetch docs, and if I want 20 concurrent readers fetching different docs, then I need to run 20 instances of pillowfight, each with --num-threads 1. Please confirm.

Thanks,
Debasis

Hi @mreiche ,

While going through the docs for pillowfight, the --num-threads definition says that each thread is assigned its own client object. Is there any way, in the CB UI or via CLI commands, to verify how Couchbase behaves with respect to --num-threads while performing a pillowfight test on the CB cluster? Please suggest.

https://docs.couchbase.com/sdk-api/couchbase-c-client-2.4.8/md_doc_cbc-pillowfight.html

  • -t, --num-threads=NTHREADS: Set the number of threads (and thus the number of client instances) to run concurrently. Each thread is assigned its own client object.

Why not get the latest version which has the --rand option?

Or write your own load driver client.

Here’s the source for pillowfight. Knock yourself out.

If you need more assistance open a case with customer support.

We are using pillowfight version 3.2.5 and we did not find any option such as --rand. The CB version we are using is 7.1.

Could you please let us know in which version of pillowfight the mentioned option is available?

Thanks,
Debasis

Hello @mreiche , may I ask a question?
When we are generating a load using pillowfight, we see a very high number of get_misses. My theory is as follows:
pillowfight generates documents with keys of 20 characters. With prefix "a", the first document will be a + 19 zeroes and the last document will be a + 19 nines. Hope that is correct. Let's call this the "FULL RANGE".
Let's say 20m documents have been generated by pillowfight using --sequential. Then the key range existing in the database will be the first 20m docs from the FULL RANGE. Let's call this the "20m RANGE".
After that, if pillowfight is doing only reads (GETs) and --sequential is not used, then it will do a GET for any key from the FULL RANGE and not just the 20m RANGE. That would explain the huge number of get_misses, because most of the keys being fetched do not exist in the database.

If the above theory is correct, do you know if there is a way to specify a limited range (like the 20m RANGE) to fetch from?

Many Thanks

Apparently not released yet. So either use 1 thread (in multiple cbc-pillowfight executions if desired), or write your own load driver: GitHub - mikereiche/loaddriver

" Build couchbase-server-7.2.0-5304 contains libcouchbase commit 8af01cb with commit message:

CCBC-1546: Add arg to allow threads to work from different rand numbers"
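As a rough sketch only (again reusing the placeholders from the commands in the first post), several single-thread executions could be started in the background from a shell:

for i in 1 2 3 4; do
/opt/couchbase/bin/cbc-pillowfight -U couchbase://xx.xx.xx.xx/data1 -u -P --min-size 1000 --max-size 1000 --json --set-pct 0 --batch-size 1 --num-items 24000000 --num-threads 1 --key-prefix a --no-population &
done
wait

Whether the separate processes actually end up with different random sequences is something you'd have to confirm from the cache-miss ratio.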

Yes @mreiche, due to the high number of get_misses only a few records are getting read from disk. Thus the cache miss ratio in the UI shows a very low percentage. The commands we executed are mentioned in the initial post. Thanks for your valuable time.

Below is the get_misses series captured while executing one of the test loads.

"get_misses":[7497.4,7457.6,7445.8,7445.8,7445.8,7445.8,7445.8,7445.8,7445.8,7455,7455,7501.400000000001,7457,7457,7457,7457,7457,7457,7457,7545,7545,7532.5,7555.3,7555.3,7555.3,7555.3,7555.3,7555.3,7555.3,7530.400000000001,7530.400000000001,7482.5,7481,7481,7481,7481,7481,7481,7481,7461.4,7461.4,7468.7,7449.9,7449.9,7449.9,7449.9,7449.9,7449.9,7449.9,7450.9,7450.9,7471.1,7496,7496,7496,7496,7496,7496,7496,7443.1]

Thanks,
Debasis

"Yes @mreiche, due to the high number of get_misses only a few records are getting read from disk. Thus the cache miss ratio in the UI shows a very low percentage."

get_misses and cache_misses are mutually exclusive. A get_miss is a get on a document that does not exist. A cache_miss is a get on a document that exists, but is not in RAM.

  1. cache-hits as described by the OP will result in documents not being read from disk. Use --num-threads 1. If multiple concurrent reads are required, try executing multiple copies of cbc-pillowfight (the separate cbc-pillowfight executions might use different random sequences - I don’t know, you’ll be able to tell from the cache-miss ratio). And there is --rand-space-per-thread coming in a future release.

  2. get_misses - asking pillowfight to read documents that you haven’t written (why would you do that??) will also result in documents not being read from disk.

There is no magical “FULL RANGE” that pillowfight uses. If you ask pillowfight to write one million (num-items) sequential documents (with num-cycles at least num-items), it will write one million sequential documents (unless you kill it before it is done). If you ask pillowfight to randomly read one million (num-items) documents, it will not attempt to read documents outside that one million. If you generate one million (num-items) documents with pillowfight, and then ask pillowfight to randomly read from ten million (num-items) documents, nine out of ten of those reads are going to be a get_miss - that’s on you.
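As an illustration (a sketch only, using the same placeholder host, bucket, and credentials as the commands earlier in the thread), writing one million documents to completion and then reading randomly with the same --num-items keeps every read inside the range that was actually written; note the --num-cycles on the write so it stops after one million items rather than having to be killed:

/opt/couchbase/bin/cbc-pillowfight -U couchbase://xx.xx.xx.xx/data1 -u -P --min-size 1000 --max-size 1000 --json --set-pct 100 --batch-size 1 --num-items 1000000 --num-cycles 1000000 --sequential --num-threads 1 --key-prefix a;
/opt/couchbase/bin/cbc-pillowfight -U couchbase://xx.xx.xx.xx/data1 -u -P --min-size 1000 --max-size 1000 --json --set-pct 0 --batch-size 1 --num-items 1000000 --num-threads 1 --key-prefix a --no-population;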

Here’s a handy curl command to get get_misses and the cache miss rate. There are 60 entries for each, as these are the numbers for the last 60 seconds.

curl -k -s -u Administrator:password http://localhost:8091/pools/default/buckets/my_bucket/stats | jq .op.samples.ep_cache_miss_rate,.op.samples.get_misses
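If you just want a single number rather than the 60 one-second samples, the same output can be averaged with a slightly different jq filter (a minor variation on the command above):

curl -k -s -u Administrator:password http://localhost:8091/pools/default/buckets/my_bucket/stats | jq '.op.samples.ep_cache_miss_rate | add / length'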

We are using pillowfight version 3.2.5 and we did not find any option such as --rand. The CB version we are using is 7.1.

See my previous reply:

Apparently not released yet. So either use 1 thread (in multiple cbc-pillowfight executions if desired), or write your own load driver: GitHub - mikereiche/loaddriver

" Build couchbase-server-7.2.0-5304 contains libcouchbase commit 8af01cb with commit message:

CCBC-1546: Add arg to allow threads to work from different rand numbers"

Could you please let us know in which version of pillowfight the mentioned option is available?


See the post immediately above yours.
