This post is the third part of a multi-part series exploring composite vector indexing in Couchbase. If you missed the previous posts, be sure to catch up on Part 1 and Part 2.
The series will cover:
- Why composite vector indexes matter, including concepts, terminology, and developer motivation. A Smart Grocery Recommendation System will be used as a running example.
- How composite vector indexes are implemented inside the Couchbase Indexing Service.
- How ORDER BY pushdown works for composite vector queries.
- Real-world performance behavior and benchmarking results.
Order By Pushdown – Composite vector indexes
Let’s imagine a feature in your food or grocery app:
“Recommend chocolate spreads with a Nutella-like taste, ordered by nutritional quality, i.e., higher protein first, then lower sugar.”
This is more than simple filtering.
It requires combining:
- Semantic similarity (taste/texture)
- Nutritional filtering (sugars & proteins)
- A custom ordering strategy
The corresponding SQL++ query might look like this:
    SELECT product_name
    FROM food
    WHERE sugars_100g < 20 AND proteins_100g > 10
    ORDER BY APPROX_VECTOR_DISTANCE(text_vector, [query_embedding], 'L2'),
             proteins_100g DESC,
             sugars_100g ASC
    LIMIT 10;
This single query expresses everything we want:
- Only healthier chocolate spreads
- Closest in flavor semantics to Nutella
- Higher protein preferred
- Lower sugar next
- Show the user just the top 10
Now let’s dive into how Couchbase executes this extremely efficiently.
- Scalar Filters Are Pushed Down
- The scalar predicates sugars_100g < 20 and proteins_100g > 10 are evaluated inline using the Composite Vector Index.
- APPROX_VECTOR_DISTANCE(...) activates Couchbase’s ANN (Approximate Nearest Neighbor) scan pipeline.
- The vector index locates the items whose embeddings are closest to the query embedding (Nutella in our example).
- Refer to Part 2 of this blog series for the internal workings.
- LIMIT and ORDER BY Pushdown
- This is where Couchbase becomes exceptionally efficient.
- When the query includes LIMIT <limit_value>, Couchbase can push both LIMIT and ORDER BY into the index service.
- This avoids sending large intermediate result sets to the query service.
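As a rough mental model of the filtering and similarity steps above, the sketch below checks the scalar predicates on each index entry and computes a distance only for the entries that pass. It is a simplification for illustration, not Couchbase's implementation: the entry layout, the `scan_with_inline_filter` helper, and the brute-force exact L2 computation are all assumptions. A real ANN scan probes only part of the index and works with approximate distances.

```python
import math

def l2_distance(a, b):
    """Exact Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def scan_with_inline_filter(entries, query_vec, max_sugar=20, min_protein=10):
    """Hypothetical scan: apply scalar predicates first, then score survivors."""
    for e in entries:
        # Scalar predicates from the WHERE clause, checked on the index entry itself.
        if not (e["sugars_100g"] < max_sugar and e["proteins_100g"] > min_protein):
            continue
        yield e["product_name"], l2_distance(e["text_vector"], query_vec)

# Toy 2-dimensional "embeddings"; real embeddings have hundreds of dimensions.
entries = [
    {"product_name": "spread_a", "sugars_100g": 15, "proteins_100g": 12, "text_vector": [3.0, 4.0]},
    {"product_name": "spread_b", "sugars_100g": 45, "proteins_100g": 6, "text_vector": [0.0, 0.0]},
    {"product_name": "spread_c", "sugars_100g": 18, "proteins_100g": 11, "text_vector": [0.0, 1.0]},
]
scored = list(scan_with_inline_filter(entries, query_vec=[0.0, 0.0]))
```

Here `spread_b` has the closest vector but is never scored, because it fails both scalar predicates.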
Here’s how ORDER BY Pushdown with scalars and ANN works
- Indexer Builds a Concatenated Sort Key
- While performing the Composite Vector Index scan, the indexer constructs a composite sort key for each candidate item.
- The concatenated composite sort key consists of:
- ANN distance in place of the vector key
- The scalar ORDER BY fields in the exact ORDER BY sequence: proteins_100g (DESC), negated or encoded for descending order, followed by sugars_100g (ASC)
- This yields a lexicographically comparable key like:
(distance, -proteins_100g, sugars_100g)
- Why replace the vector field?
- Because for ordering, the distance becomes the actual scalar value of interest and not the vector itself.
- This allows the indexer to sort candidates.
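The concatenated key described above can be pictured as a plain tuple. This is a minimal sketch under stated assumptions: `make_sort_key` is a hypothetical helper, the candidate values are made up, and a Python tuple stands in for whatever encoding the indexer actually uses. It only shows why negating a DESC field lets a single ascending comparison order all three components.

```python
def make_sort_key(distance, proteins_100g, sugars_100g):
    """Composite key for ORDER BY distance, proteins_100g DESC, sugars_100g ASC."""
    # Negating a numeric field makes ascending tuple comparison behave as DESC.
    return (distance, -proteins_100g, sugars_100g)

# (product_name, ANN distance, proteins_100g, sugars_100g); illustrative values only.
candidates = [
    ("spread_a", 0.25, 11, 18),
    ("spread_b", 0.25, 14, 19),
    ("spread_c", 0.50, 20, 5),
    ("spread_d", 0.25, 14, 12),
]

# Tuples compare left to right, so distance dominates and the scalars break ties.
ranked = sorted(candidates, key=lambda c: make_sort_key(c[1], c[2], c[3]))
ranked_names = [c[0] for c in ranked]
```

Three candidates tie on distance, so protein decides between them, and sugar separates `spread_d` from `spread_b`, which also tie on protein.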
- Indexer Maintains Only the Top-K Items
- As the indexer scans ANN candidates, it keeps a K-sized priority heap (K = LIMIT).
- Each candidate is evaluated using the concatenated key.
- If the heap exceeds size K, the worst item is evicted.
- At the end, only the top LIMIT items remain.
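The bounded heap described above can be sketched with Python's `heapq`. This illustrates the general top-K technique, not the indexer's actual code: the `top_k` function and the sample candidates are hypothetical. Because `heapq` is a min-heap, the sketch stores negated keys so that the root is always the current worst item and can be evicted cheaply.

```python
import heapq

def top_k(candidates, k):
    """Keep the k best (name, distance, proteins, sugars) tuples, best first."""
    heap = []  # min-heap over negated keys, so heap[0] is the current worst item
    for name, dist, protein, sugar in candidates:
        key = (dist, -protein, sugar)      # same concatenated key as in the text
        neg_key = tuple(-x for x in key)   # invert the order for the min-heap
        heapq.heappush(heap, (neg_key, name))
        if len(heap) > k:
            heapq.heappop(heap)            # evict the worst item
    # The largest negated key is the smallest real key, i.e. the best match.
    return [name for _, name in sorted(heap, reverse=True)]

candidates = [
    ("spread_a", 0.30, 12, 15),
    ("spread_b", 0.10, 11, 18),
    ("spread_c", 0.10, 14, 19),
    ("spread_d", 0.50, 20, 5),
    ("spread_e", 0.20, 13, 10),
]
best_three = top_k(candidates, k=3)
```

Memory stays proportional to K rather than to the number of candidates scanned, which is why only the LIMIT rows ever need to leave the index service.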
This means:
- No large result sets are produced
- No full sorting happens on the query node
- ANN + scalar ranking + LIMIT all happen in one place
- Indexer streams only the top 10 items to the query service
By the time results reach the query node, they are already:
- Filtered
- Semantically ordered
- Scalar-ordered
- Trimmed to LIMIT
The query node has almost nothing left to do.
This is the fastest possible execution path in Couchbase for hybrid semantic + scalar ranking.
The Flexibility of Mixing Scalars and Vectors in ORDER BY
- One of the most powerful aspects of Couchbase’s Composite Vector Index is that developers are not locked into a single ranking strategy.
- Unlike many vector databases that force you to sort “only by vector distance,” Couchbase allows you to freely mix, reorder, and permute scalar fields and vector similarity measures inside a single ORDER BY clause.
Below are four meaningful ordering permutations for our Nutella-like food search.
- Semantic-first (Flavor similarity dominates)
- Use case: You want “Nutella-like” taste to dominate ranking.
    ORDER BY APPROX_VECTOR_DISTANCE(...),
             proteins_100g DESC,
             sugars_100g ASC
    LIMIT 10;
- Protein-first (Healthier choices dominate)
- Use case: For fitness-focused applications where nutrition outranks flavor.
    ORDER BY proteins_100g DESC,
             APPROX_VECTOR_DISTANCE(...),
             sugars_100g ASC
    LIMIT 10;
- Sugar-first (User wants lower sugar above everything else)
- Use case: Diabetic-friendly search or sugar-reduction diets.
    ORDER BY sugars_100g ASC,
             proteins_100g DESC,
             APPROX_VECTOR_DISTANCE(...)
    LIMIT 10;
- Complex Hybrid Ranking
- Use case: Health-first search with semantic fallback and tiebreakers.
    ORDER BY calories_100g ASC,
             APPROX_VECTOR_DISTANCE(...),
             proteins_100g DESC,
             sugars_100g ASC
    LIMIT 10;
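To see how reordering the ORDER BY terms changes the ranking, the sketch below builds the sort key from an ordered list of (field, direction) pairs and ranks the same three rows under three of the permutations above. The `make_key` and `rank` helpers, the precomputed `distance` field, and the row values are all assumptions made for illustration. They model the ordering semantics only, not how Couchbase evaluates the query.

```python
def make_key(row, order_by):
    """Build a sort key from (field, direction) pairs; DESC fields are negated."""
    return tuple(row[f] if d == "ASC" else -row[f] for f, d in order_by)

def rank(rows, order_by):
    """Return product names ordered by the given ORDER BY specification."""
    return [r["product_name"] for r in sorted(rows, key=lambda r: make_key(r, order_by))]

# "distance" stands in for the value of APPROX_VECTOR_DISTANCE(...) per row.
rows = [
    {"product_name": "spread_a", "distance": 0.10, "proteins_100g": 11, "sugars_100g": 18},
    {"product_name": "spread_b", "distance": 0.30, "proteins_100g": 15, "sugars_100g": 12},
    {"product_name": "spread_c", "distance": 0.20, "proteins_100g": 12, "sugars_100g": 8},
]

semantic_first = [("distance", "ASC"), ("proteins_100g", "DESC"), ("sugars_100g", "ASC")]
protein_first = [("proteins_100g", "DESC"), ("distance", "ASC"), ("sugars_100g", "ASC")]
sugar_first = [("sugars_100g", "ASC"), ("proteins_100g", "DESC"), ("distance", "ASC")]

by_semantics = rank(rows, semantic_first)
by_protein = rank(rows, protein_first)
by_sugar = rank(rows, sugar_first)
```

The same three rows come out in a different order under each specification, because the leading term of the key always dominates and later terms only break ties.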
Final Takeaway for Developers
- Couchbase combines ANN similarity, scalar filtering, custom ORDER BY, and LIMIT pushdown directly inside the Composite Vector Index.
- This gives you the power to build real-world intelligent search features like Nutella-flavor recommendations optimized for nutrition using a single, fast, elegant SQL++ query.
- Couchbase doesn’t just store vectors in the index. It lets you query them efficiently and combine them with structured data all at scale.