Is the primary key in Couchbase always the partition key? Do users have the freedom to pick the partition key and the primary key separately, as in Cassandra?
Let's say the primary key in Couchbase is (A, B, C).
Can we have the partition key be (A, B) in Couchbase?
Or is the primary key always fixed as the partition key in Couchbase for distributing data across Data Service (bucket) nodes?
Also, does Couchbase support clustering columns to sort data within a Data Service node?
Speaking for the primary Data Service (the Analytics service may have some other options)… In the early days of Couchbase we discussed whether or not we should allow applications to define a hash key in addition to the document key. We decided to keep it simple, and while we left some room for doing that architecturally, the only implementation we have at the moment is hashing by the full key.
Technically, what we do is hash each key to one of a large number of logical partitions, then map those logical partitions to the nodes in the cluster. By doing this, we can redistribute data as needed without having to visit all the data on a given node. We also take advantage of this hashing all the way down to the client library, so when your app needs to retrieve a piece of data through the KV interface, it knows exactly which node to go to, which lets us keep memcached-like latencies and throughputs.
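To make that mapping concrete, here is a minimal standalone sketch of the idea (not the SDK's actual code): the key is hashed with CRC32 to one of a fixed set of logical partitions (vBuckets), and a separate map assigns each vBucket to a node. The bit-shifting follows the commonly documented libcouchbase scheme, and the three-node map is hypothetical.

```python
import zlib

NUM_VBUCKETS = 1024  # Couchbase buckets use 1024 vBuckets

def vbucket_for_key(key: str) -> int:
    # CRC32 the key, then reduce to a vBucket id (illustrative formula).
    digest = zlib.crc32(key.encode("utf-8"))
    return ((digest >> 16) & 0x7FFF) % NUM_VBUCKETS

# A hypothetical vBucket map: each vBucket is assigned to a node.
# Rebalancing moves vBuckets between nodes; keys themselves never rehash.
vbucket_map = {vb: f"node{vb % 3}" for vb in range(NUM_VBUCKETS)}

key = "user::1234"
vb = vbucket_for_key(key)
print(f"key {key!r} -> vBucket {vb} -> {vbucket_map[vb]}")
```

Because the client library computes the same hash and holds a copy of the vBucket map, it can send each KV request straight to the right node with no intermediate hop.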
Hopefully that helps a bit.
Based on where your description of the use case is going, it sounds a bit more analytical than operational? Maybe you can describe your data-access goals a bit and we can help you figure out how best to structure that between Data/Analytics to meet your goals.
Scatter-gather applies only to partitioned indexes, and only when more than one partition is still involved after partition elimination.
If you have 2 indexer nodes:
A partitioned index with 8 partitions will have its partitions distributed across the two nodes by the partition-placement algorithm (based on available space, etc.).
A non-partitioned index becomes a single partition on one node.
When a query executes, the indexer client performs partition elimination based on the predicates (for partitioned indexes only); this mostly happens for equality predicates on the partition keys (not range predicates). After partition elimination, it issues scans on all remaining partitions on the corresponding nodes.
Example: node 1 has 2 partitions left, node 2 has 1 partition left.
Node 1 does 2 scans (possibly in parallel), combines the results, and sends them to the indexer client; node 2 has a single partition, so its results are sent directly to the indexer client.
The indexer client then combines the results again if more than one node is involved and passes them to the query service.
For a non-partitioned index there is one partition on one node, so there is no scatter-gather.
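As a concrete illustration, here is a minimal sketch using the Python SDK. The `PARTITION BY HASH` / `num_partition` DDL is standard Couchbase SQL++, but the `orders` bucket, field names, and index name are hypothetical:

```python
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions, QueryOptions

cluster = Cluster("couchbase://localhost",
                  ClusterOptions(PasswordAuthenticator("user", "password")))

# Partitioned index: 8 partitions, hashed on customer_id.
cluster.query("""
    CREATE INDEX idx_orders_part ON `orders`(customer_id, order_date)
    PARTITION BY HASH(customer_id)
    WITH {"num_partition": 8}
""").execute()

# Equality predicate on the partition key: the indexer client can eliminate
# partitions and scan only the one partition that can contain matches.
eq = cluster.query(
    "SELECT order_date FROM `orders` WHERE customer_id = $cid",
    QueryOptions(named_parameters={"cid": "C-1001"}))

# Range predicate on the partition key: no partition elimination, so the
# scan is scattered to all 8 partitions and the results are gathered.
rng = cluster.query(
    "SELECT order_date FROM `orders` WHERE customer_id > $cid",
    QueryOptions(named_parameters={"cid": "C-1001"}))
```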
When setting up an FTS index, you get to assign a value to the attribute "Index Partitions".
A subset of the KV partitions (vBuckets) is mapped to each index partition. So for example, with 1024 KV partitions, if you set an FTS index's "Index Partitions" to 4, each index partition will obtain content from 256 KV partitions.
When you query an FTS index, it will automatically scatter and gather results from all of its partitions, whether they reside locally or remotely.
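For reference, here is a minimal sketch of where that setting lives in an FTS index definition (submitted via the FTS REST API, `PUT /api/index/{name}` on port 8094); the index and bucket names are hypothetical and the field mappings are omitted:

```python
import json

# Hypothetical FTS index definition; only the partitioning-related pieces
# are shown. Submitted via: PUT http://<fts-node>:8094/api/index/hotels-fts
index_def = {
    "type": "fulltext-index",
    "name": "hotels-fts",            # hypothetical index name
    "sourceType": "couchbase",
    "sourceName": "travel-sample",   # hypothetical source bucket
    "planParams": {
        "indexPartitions": 4,        # the "Index Partitions" attribute:
                                     # 1024 vBuckets / 4 partitions = 256 each
    },
    "params": {},                    # type/field mappings omitted
}

print(json.dumps(index_def, indent=2))
```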
Let's say you have 2 FTS service nodes. FTS uses an inverted index. So my question is: is this inverted-index data automatically partitioned across the 2 FTS service nodes, or does one FTS node hold all the FTS index data while the other one is only a replica?
Also, could you clarify whether FTS nodes are standalone, or whether the FTS service needs the Query service to operate?