Couchbase Internals

couchbwiss · June 27, 2015, 11:57am

Hello,

I would like to understand how Couchbase splits document keys in a cluster environment. I believe Couchbase uses a hash function to distribute document keys among the cluster nodes. From a performance-tuning perspective, what are the best practices for choosing document key values so that there is no network overhead between the cluster nodes and read/write operations can take advantage of parallelism?

Thanks
Wissa

ldoguin · June 29, 2015, 7:13am

Hi,

The reason Couchbase uses hash partitioning is precisely to optimize sharding and deliver good performance. Whichever keys you choose won’t really matter on that side, because they will be hashed by Couchbase.
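To make the hashing point concrete, here is a minimal sketch of how a key could be mapped to a partition (vBucket) and then to a node. It assumes the CRC32-based scheme used by the open-source Couchbase client libraries as commonly described, with 1024 vBuckets; the function names, the node list, and the round-robin vBucket-to-node map are hypothetical and only for illustration. A real cluster gets its vBucket map from the cluster manager.

```python
import zlib
from typing import List

# Assumption: 1024 vBuckets, the usual default on most platforms.
NUM_VBUCKETS = 1024


def vbucket_for_key(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Map a document key to a vBucket id using a CRC32-based hash (sketch)."""
    crc = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
    # Take the upper bits of the checksum, then fold into the vBucket range.
    return ((crc >> 16) & 0x7FFF) % num_vbuckets


def node_for_key(key: str, nodes: List[str], num_vbuckets: int = NUM_VBUCKETS) -> str:
    """Pick a node via a hypothetical round-robin vBucket map (not the real map)."""
    return nodes[vbucket_for_key(key, num_vbuckets) % len(nodes)]


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
    for i in range(5):
        key = f"post::{i}"
        print(key, vbucket_for_key(key), node_for_key(key, nodes))
```

Running this on keys such as `post::0`, `post::1`, `post::2` shows that consecutive keys generally land in unrelated vBuckets, which is why the choice of key values has little effect on how evenly data spreads across nodes.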

couchbwiss · June 29, 2015, 7:16am

Thanks @ldoguin. Is there a way to choose the keys so that reads are sequential? Does anyone know which hash function is used?

ldoguin · July 1, 2015, 3:03pm

Here’s a previous answer about the hash function that you might find useful: https://groups.google.com/forum/#!topic/couchbase/_RNYi2_kyNA

Basically we don’t recommend changing it, unless you really really really know what you are doing.

That being said, if you want to do something like a sequential read, you could probably buffer small docs on the client side and then write them as one doc every minute, for instance. This can work well for time series. Can you describe your use case a bit more?
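The buffering idea above could be sketched roughly as follows. This is not Couchbase SDK code: `write_doc` is a hypothetical callback standing in for whatever upsert call your client provides, and the class name, key format, and `items` field are assumptions made for the example.

```python
import json
import time
from typing import Any, Callable, Dict, List, Optional


class MinuteBuffer:
    """Collect small documents and write them as one document per time window."""

    def __init__(self, write_doc: Callable[[str, str], None],
                 prefix: str = "posts", window_seconds: int = 60) -> None:
        self.write_doc = write_doc          # hypothetical: (key, json_text) -> None
        self.prefix = prefix
        self.window_seconds = window_seconds
        self._window: Optional[int] = None  # index of the window being buffered
        self._items: List[Dict[str, Any]] = []

    def add(self, doc: Dict[str, Any], now: Optional[float] = None) -> None:
        """Buffer a document; flush first if the time window has changed."""
        ts = time.time() if now is None else now
        window = int(ts // self.window_seconds)
        if self._window is not None and window != self._window:
            self.flush()
        self._window = window
        self._items.append(doc)

    def flush(self) -> None:
        """Write all buffered documents as a single combined document."""
        if not self._items or self._window is None:
            return
        # Key encodes the window start time, e.g. "posts::60".
        key = f"{self.prefix}::{self._window * self.window_seconds}"
        self.write_doc(key, json.dumps({"items": self._items}))
        self._items = []


if __name__ == "__main__":
    store: Dict[str, str] = {}  # in-memory stand-in for a bucket
    buf = MinuteBuffer(store.__setitem__)
    buf.add({"user": "a", "text": "hello"}, now=0)
    buf.add({"user": "b", "text": "hi"}, now=10)
    buf.add({"user": "c", "text": "later"}, now=65)  # new window, flushes the first
    buf.flush()
    print(sorted(store))
```

Because each combined document has a key derived from its time window, reading a day of posts becomes a series of key lookups over known window keys rather than a scan over many small documents. The trade-off is that unflushed documents are lost if the client crashes before `flush` runs.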

couchbwiss · July 1, 2015, 4:04pm

Thanks @ldoguin. My case is that I am building a forum application where users submit posts as JSON documents that are inserted into a Couchbase bucket. I am now working on the system architecture and data modeling, and I would also like to consider read and write performance. That said, I am exploring all the good options and best practices to get very good performance and the best out of Couchbase. So I am thinking that sequential reads might be better (or not?) than random reads for sequential documents, especially since I am thinking of using N1QL and views to extract the posts made during the day and run some analysis on them …
Thanks

ldoguin · July 1, 2015, 5:07pm

If you are afraid of the scatter/gather that happens with views, you should definitely look into GSI (Global Secondary Indexes) and the new MDS (Multi-Dimensional Scaling) architecture. You’ll be able to have an index on just one node and thereby avoid the scatter/gather. I invite you to look at the latest presentations on the topic:

GSI: https://www.youtube.com/watch?v=WvjYKO27Vdk
MDS: https://www.youtube.com/watch?v=b09peBHtITA
