Couchbase partition key vs primary key

Hi All,
In Couchbase, is the primary key always the partition key? Do users have the freedom to pick the partition key and the primary key separately, as in Cassandra?

Let's say the primary key in Couchbase is (A, B, C).
Can we have the partition key as (A, B)?

Or is the primary key always fixed as the partition key in Couchbase, to distribute data across the data service bucket nodes?
Also, does Couchbase support clustering columns to sort data within a data service bucket node?

Thanks,
Isuru

If you are looking at this from a global secondary index (GSI) perspective:

As long as A and B are immutable, you can create a secondary GSI index with PARTITION BY HASH(A, B).

Thanks vsr1.
It is clear that Couchbase allows only indexes to be partitioned by hash, whereas in Cassandra the data itself can be partitioned by a hash key.

Does Couchbase support a composite hash, so that you can partition the index where the hash key is of composite (A, B) type, i.e. PARTITION BY HASH(A, B, etc.)?

Please clarify

Thanks
Isuru

Speaking for the primary data service (the analytics service may have some other options): in the early days of Couchbase we discussed whether we should allow applications to define a hash key in addition to the key. We decided to keep it simple, and while we left some room for doing that architecturally, the only implementation we have at the moment is hashing by the full key.

Technically, what we do is hash to a logical partition, where we have a large number of logical partitions, and then map those logical partitions to the nodes in the cluster. By doing this, we can redistribute data as needed without needing to visit all data on a given node. We also take advantage of this hashing all the way down to the client library, so when your app needs to retrieve a piece of data through the KV interface, it knows exactly which node to go to, which lets us keep memcached-like latencies and throughput.
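
As a rough illustration of the two-level mapping described above, here is a minimal Python sketch. It is not Couchbase's actual implementation: the CRC32-modulo hash, the count of 1024 logical partitions, the round-robin placement, and all function names are assumptions chosen for simplicity.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed number of logical partitions


def vbucket_for_key(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Hash the full document key to a logical partition (illustrative hash only)."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_partition_map(nodes, num_vbuckets: int = NUM_VBUCKETS):
    """Map each logical partition to a node (hypothetical round-robin placement)."""
    return [nodes[vb % len(nodes)] for vb in range(num_vbuckets)]


def node_for_key(key: str, partition_map) -> str:
    """What a client does: hash the key, then look the partition up in the map."""
    return partition_map[vbucket_for_key(key, len(partition_map))]


if __name__ == "__main__":
    two_nodes = build_partition_map(["node1", "node2"])
    three_nodes = build_partition_map(["node1", "node2", "node3"])
    key = "customer::42"
    # The key's logical partition never changes; only the partition-to-node map does.
    print(vbucket_for_key(key), node_for_key(key, two_nodes), node_for_key(key, three_nodes))
```

The point of the indirection is that adding a node only changes the partition-to-node map, so whole logical partitions move and individual keys never need to be rehashed.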

Hopefully that helps a bit.

Based on where your description of the use case is going, it sounds a bit more analytical than operational. Maybe you can describe your data access goals a bit and we can help you with how best to structure that between data and analytics to meet your goals.

Hi ingenthr,
My question is about indexes in Couchbase.
Does Couchbase support a composite hash, so that you can partition the index where the hash key is of composite (A, B) type, i.e. PARTITION BY HASH(A, B, etc.)?

Please clarify.

Thanks

My apologies, I thought you were asking about primary key specifically, not separately defined indexes. I’ve pinged a couple of other people who may be able to help in the thread.

Yes. A partitioned secondary index supports a composite PARTITION BY HASH(A, B, etc.). The only requirement is that the partition keys are immutable.

Check the example at the end of the Queriability section of Example Couchbase GSI Index partitioning | The Couchbase Blog.

Thanks @vsr1 for the clarifications.

@vsr1

Let's say your index is as below, without partitioning.
CREATE INDEX ih ON customer(state, name, zip, status)
WHERE type = "cx"

Doesn't the Couchbase index service partition it automatically across all the index service nodes, using a hash key that combines all of the (state, name, zip, status) fields?

If not, is the index stored on a single node, or is it partitioned by primary key regardless of the index key?

Please clarify,
Thanks,
Isuru

Hi @iisuru ,

Without PARTITION BY HASH(…), the whole index resides on a single indexer node as a single partition. When you have a replica, the replica copy resides on a different node, also as a whole.

If you need to partition by document key:

CREATE INDEX ih ON customer(state, name, zip, status) PARTITION BY HASH(META().id)
WHERE type = "cx"

@vsr1
So when you have 2 index nodes, the 2nd one simply acts as a replica node for availability, and no default partitioning happens when you don't use PARTITION BY.

Is it the same for FTS indexes? Or can we partition FTS indexes across FTS service nodes?

In GSI there is no default partitioning if the index doesn’t have a PARTITION BY clause.

FTS indexes work differently; @abhinav should be able to answer that.

While @abhinav answers that, @vsr1, can you clarify scatter and gather in Couchbase? Does that mean GSIs eliminate scatter and gather, since each index is on a single node?

So are non-indexed queries executed by the query service using scatter and gather to deliver the final result?
Please clarify.

Thanks

This is how it works. cc @deepkaran.salooja

Scatter-gather applies only to partitioned indexes, and only if more than 1 partition is still involved after partition elimination.

If you have 2 indexer nodes:
a partitioned index with 8 partitions is distributed across the two nodes (based on space etc., per the partition placement algorithm)
a non-partitioned index becomes 1 partition on one node

When a query executes, the indexer client does partition elimination based on the predicates (for partitioned indexes only). This mostly applies to equality predicates on the partition keys, not range predicates. After partition elimination, it issues scans on all remaining partitions on the corresponding nodes.
Example: node 1 has 2 partitions left and node 2 has 1 partition left.
Node 1 does 2 scans (these might be done in parallel), combines the results, and sends them to the indexer client; node 2 sends its single partition's results directly to the indexer client.
The indexer client then combines the results again if more than 1 node is involved and passes them to the query service.

For a non-partitioned index there is one node and one partition, so there is no scatter-gather.
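
To make the flow above concrete, here is a small Python sketch of partition elimination followed by per-node scan planning. It is only a model of the behaviour described in this post: the hash function, the fixed 4/4 placement of 8 partitions on two nodes, and the function names are all hypothetical and are not the indexer's real code.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 8  # matches the 8-partition example above


def partition_of(partition_key_values, num_partitions=NUM_PARTITIONS):
    """Hash a composite partition key such as (A, B) to one partition (illustrative hash)."""
    joined = "\x00".join(str(v) for v in partition_key_values)
    return zlib.crc32(joined.encode("utf-8")) % num_partitions


def eliminate(equality_values, num_partitions=NUM_PARTITIONS):
    """Return the partitions that must be scanned.

    equality_values is a tuple of values for ALL partition keys when the query
    has equality predicates on them, or None (e.g. for range predicates).
    """
    if equality_values is None:
        return list(range(num_partitions))  # no elimination possible
    return [partition_of(equality_values, num_partitions)]


def plan_scans(partitions, placement):
    """Group the remaining partitions by the node that hosts them."""
    per_node = defaultdict(list)
    for p in partitions:
        per_node[placement[p]].append(p)
    return dict(per_node)


if __name__ == "__main__":
    # Hypothetical placement: partitions 0-3 on node1, 4-7 on node2.
    placement = {p: ("node1" if p < 4 else "node2") for p in range(NUM_PARTITIONS)}
    print(plan_scans(eliminate(("CA", "Alice")), placement))  # one partition, one node
    print(plan_scans(eliminate(None), placement))             # scatter-gather over both nodes
```

With equality predicates on every partition key, only one partition on one node is scanned; otherwise every partition is scanned and the results are gathered.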

See Solution it explained

Right now, you do not have the option to create FTS indexes from N1QL directly - you can only query them.

There’s documentation on how to set up FTS indexes here …
https://docs.couchbase.com/server/6.6/fts/fts-creating-indexes.html

While setting up an index, you get to assign a value to this attribute - Index Partitions.
A subset of the KV partitions (vbuckets) is mapped to each index partition. So for example, with 1024 KV partitions, if you set an FTS index’s “Index partitions” to 4, each index partition will obtain content from 256 KV partitions.
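
The arithmetic can be sketched in a few lines of Python. The contiguous-range layout and the function name are assumptions for illustration only; how vbuckets are actually grouped into index partitions is not stated here.

```python
def assign_vbuckets(num_vbuckets=1024, index_partitions=4):
    """Split vbucket ids into near-equal contiguous groups (assumed layout)."""
    base, extra = divmod(num_vbuckets, index_partitions)
    groups, start = [], 0
    for i in range(index_partitions):
        size = base + (1 if i < extra else 0)  # spread any remainder over the first groups
        groups.append(range(start, start + size))
        start += size
    return groups


if __name__ == "__main__":
    # 1024 vbuckets over 4 index partitions gives 256 vbuckets each.
    print([len(g) for g in assign_vbuckets(1024, 4)])
```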

When you query an FTS index, it will automatically scatter and gather results from all its partitions - whether they reside locally or remote.

@vsr1
So when a non-partitioned index is used with 2 index nodes, does the 2nd index node automatically get the index data replicated, or do we need to flag it manually for replication?

@abhinav
Let's say you have 2 FTS service nodes. FTS uses an inverted index. So my question is: is this inverted index data automatically partitioned across the 2 FTS service nodes, or does one FTS node hold all the FTS index data while the other one is only a replica?

Also, please confirm whether FTS nodes are standalone or whether the FTS service needs the query service to operate.
Please clarify

If you are using EE, Availability and Performance | Couchbase Docs explains this. When you create the index with either of the following options, replication is automatic:

  1. WITH {"nodes":["node1:8091", "node2:8091", "node3:8091"]}
  2. WITH {"num_replica": 2}

If you are using CE, there are no partitioned indexes and no index replication. But you can get high availability by creating duplicate indexes, as described in point 9 (Duplicate Index) of Create the Right Index, Get the Right Performance.

Thanks @vsr1, it's clear.

@iisuru If you have 2 Couchbase nodes hosting the FTS service, then yes, an index will be automatically partitioned across the 2 nodes if and only if you choose more than 1 index partition for the index.

For example, if you choose 2 Index partitions with 1 replica here’s how the distribution of active and replica partitions would look across your cluster …

+------------------+------------------+
|      NODE 1      |      NODE 2      |
+------------------+------------------+
|     Active 1     |     Active 2     |
|     Replica 2    |     Replica 1    |
+------------------+------------------+
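
A layout like the one in the table can be reproduced with a short Python sketch. The round-robin rule below is a hypothetical placement policy used only to illustrate that a replica never shares a node with its active partition; it is not the actual FTS planner.

```python
def place_partitions(index_partitions, nodes, replicas=1):
    """Place each active partition round-robin and its replicas on the following nodes."""
    if replicas >= len(nodes):
        raise ValueError("need more nodes than replicas to keep copies on separate nodes")
    layout = {node: [] for node in nodes}
    for p in range(index_partitions):
        layout[nodes[p % len(nodes)]].append(f"Active {p + 1}")
        for r in range(1, replicas + 1):
            layout[nodes[(p + r) % len(nodes)]].append(f"Replica {p + 1}")
    return layout


if __name__ == "__main__":
    for node, parts in place_partitions(2, ["NODE 1", "NODE 2"]).items():
        print(node, sorted(parts))
```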