Filtered ANN Search with Composite Vector Indexes

This post is the third part of a multi-part series exploring composite vector indexing in Couchbase. If you missed the previous posts, be sure to catch up on Part 1 and Part 2.

The series will cover:

Why composite vector indexes matter, including concepts, terminology, and developer motivation. A Smart Grocery Recommendation System will be used as a running example.
How composite vector indexes are implemented inside the Couchbase Indexing Service.
How ORDER BY pushdown works for composite vector queries.
Real-world performance behavior and benchmarking results.

Order By Pushdown – Composite vector indexes

Let’s imagine a feature in your food or grocery app:

“Recommend chocolate spreads with a Nutella-like taste, ordered by nutritional quality, i.e., higher protein first, lower sugar next.”

This is more than simple filtering.

It requires combining:

Semantic similarity (taste/texture)
Nutritional filtering (sugars & proteins)
A custom ordering strategy

The corresponding SQL++ query might look like this:

SELECT product_name
FROM food
WHERE sugars_100g < 20 AND proteins_100g > 10
ORDER BY APPROX_VECTOR_DISTANCE(text_vector, [query_embedding], 'L2'),
         proteins_100g DESC,
         sugars_100g ASC
LIMIT 10;

This single query expresses everything we want:

Only healthier chocolate spreads
Closest in flavor semantics to Nutella
Higher protein preferred
Lower sugar next
Show the user just the top 10

Now let’s dive into how Couchbase executes this extremely efficiently.

Scalar Filters Are Pushed Down
1. The scalar predicates are evaluated inline using the Composite Vector Index.
  1. sugars_100g < 20
  2. proteins_100g > 10
2. APPROX_VECTOR_DISTANCE(...) activates Couchbase’s ANN (Approximate Nearest Neighbor) scan pipeline.
3. The vector index locates the items whose embeddings are closest to the query embedding (Nutella in our example).
4. Refer to Part 2 of this blog series for the internal workings.
LIMIT and ORDER BY Pushdown
1. This is where Couchbase becomes exceptionally efficient.
2. When the query includes:
  1. LIMIT <limit_value>
3. Couchbase can push both LIMIT and ORDER BY into the index service.
4. This avoids sending large intermediate result sets to the query service.
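
To make the filter pushdown concrete, here is a minimal Python sketch of a scan that checks the scalar predicates inline and only computes a distance for entries that survive them. It is a toy model, not Couchbase's implementation: it uses an exact brute-force squared-L2 distance where the real index runs an approximate scan, and `IndexEntry`, `passes_filters`, and `filtered_candidates` are hypothetical names with made-up sample data.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Sequence, Tuple


@dataclass
class IndexEntry:
    # Hypothetical in-memory stand-in for one composite index entry.
    product_name: str
    sugars_100g: float
    proteins_100g: float
    text_vector: Sequence[float]


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    # Exact squared Euclidean distance; a real ANN scan approximates this.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def passes_filters(entry: IndexEntry) -> bool:
    # The scalar predicates from the WHERE clause, checked during the scan.
    return entry.sugars_100g < 20 and entry.proteins_100g > 10


def filtered_candidates(
    entries: Iterable[IndexEntry], query_vector: Sequence[float]
) -> Iterator[Tuple[float, IndexEntry]]:
    # Entries that fail the filters are skipped before any distance work.
    for entry in entries:
        if passes_filters(entry):
            yield squared_l2(entry.text_vector, query_vector), entry


# Made-up sample data with 2-dimensional embeddings for readability.
entries = [
    IndexEntry("hazelnut spread", 15.0, 12.0, [0.9, 0.1]),
    IndexEntry("classic spread", 56.0, 6.0, [1.0, 0.0]),  # fails both filters
    IndexEntry("protein cocoa cream", 8.0, 21.0, [0.5, 0.5]),
]
candidates = list(filtered_candidates(entries, [1.0, 0.0]))
```

Note that "classic spread" is the closest vector to the query but never becomes a candidate, because the scalar filters reject it first.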

Here’s how ORDER BY Pushdown with scalars and ANN works

Indexer Builds a Concatenated Sort Key
1. While performing the Composite Vector Index scan, the indexer constructs a composite sort key for each candidate item.
2. The concatenated composite sort key consists of:
  1. ANN distance in place of the vector key
  2. The scalar ORDER BY fields in the exact ORDER BY sequence:
    1. proteins_100g (DESC)
      1. Negated or encoded for descending order
    2. sugars_100g (ASC)
3. This yields a lexicographically comparable key like:
  1. (distance, -proteins_100g, sugars_100g)
4. Why replace the vector field?
  1. Because for ordering, the distance becomes the actual scalar value of interest and not the vector itself.
  2. This allows the indexer to sort candidates.
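
The sort key described above can be sketched in a few lines of Python. This is only an illustration under assumptions: `composite_sort_key` is a hypothetical helper, the candidate values are invented, and plain negation is just one way to get descending order (the indexer may encode the field differently).

```python
from typing import Tuple

SortKey = Tuple[float, float, float]


def composite_sort_key(
    distance: float, proteins_100g: float, sugars_100g: float
) -> SortKey:
    # Distance stands in for the vector key; proteins is negated so that
    # ordinary ascending tuple comparison gives DESC order on that field.
    return (distance, -proteins_100g, sugars_100g)


# (product_name, distance, proteins_100g, sugars_100g); values are made up.
candidates = [
    ("spread A", 0.20, 12.0, 18.0),
    ("spread B", 0.20, 15.0, 19.0),
    ("spread C", 0.10, 11.0, 10.0),
    ("spread D", 0.20, 15.0, 14.0),
]

# Tuples compare element by element, which is exactly the ORDER BY sequence.
ranked = sorted(candidates, key=lambda c: composite_sort_key(c[1], c[2], c[3]))
ranked_names = [c[0] for c in ranked]
```

Here spread C wins on distance alone, while the three candidates tied at distance 0.20 are separated first by protein (higher first) and then by sugar (lower first).
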
Indexer Maintains Only the Top-K Items
1. As the indexer scans ANN candidates, it keeps a K-sized priority heap (K = LIMIT).
2. Each candidate is evaluated using the concatenated key.
3. If the heap exceeds size K, the worst item is evicted.
4. At the end, only the top LIMIT items remain.
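
The bounded heap described in these steps might look like the following Python sketch. It assumes candidates arrive as `(sort_key, item)` pairs; `top_k` and `_WorstFirst` are hypothetical names, and the keys are invented. Python's `heapq` is a min-heap, so the wrapper inverts the comparison to keep the worst candidate at the root, ready for eviction.

```python
import heapq
from typing import Any, Iterable, List, Tuple


class _WorstFirst:
    """Inverts comparison so heapq's min-heap keeps the worst key at the root."""

    __slots__ = ("key", "item")

    def __init__(self, key: Tuple, item: Any) -> None:
        self.key = key
        self.item = item

    def __lt__(self, other: "_WorstFirst") -> bool:
        return self.key > other.key


def top_k(candidates: Iterable[Tuple[Tuple, Any]], k: int) -> List[Any]:
    heap: List[_WorstFirst] = []
    for key, item in candidates:
        heapq.heappush(heap, _WorstFirst(key, item))
        if len(heap) > k:
            heapq.heappop(heap)  # heap exceeded K: evict the worst candidate
    # Only K survivors remain; emit them best-first.
    return [entry.item for entry in sorted(heap, key=lambda e: e.key)]


# Keys follow the (distance, -proteins_100g, sugars_100g) layout; values are made up.
stream = [
    ((0.20, -12.0, 18.0), "spread A"),
    ((0.20, -15.0, 19.0), "spread B"),
    ((0.10, -11.0, 10.0), "spread C"),
    ((0.20, -15.0, 14.0), "spread D"),
    ((0.05, -13.0, 9.0), "spread E"),
]
best_three = top_k(stream, 3)
```

Memory stays proportional to K rather than to the number of candidates scanned, which is why no large intermediate result set ever needs to exist.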

This means:

No large result sets are produced
No full sorting happens on the query node
ANN + scalar ranking + LIMIT all happen in one place
Indexer streams only the top 10 items to the query service

By the time results reach the query node, they are already:

Filtered
Semantically ordered
Scalar-ordered
Trimmed to LIMIT

The query node has almost nothing left to do.
This is the fastest possible execution path in Couchbase for hybrid semantic + scalar ranking.

The Flexibility of Mixing Scalars and Vectors in ORDER BY

One of the most powerful aspects of Couchbase’s Composite Vector Index is that developers are not locked into a single ranking strategy.
Unlike many vector databases that force you to sort “only by vector distance,” Couchbase allows you to freely mix, reorder, and permute scalar fields and vector similarity measures inside a single ORDER BY clause.

Below are four meaningful ordering permutations for our Nutella-like food search.

Semantic-first (Flavor similarity dominates)
1. Use case: You want “Nutella-like” taste to dominate ranking.

ORDER BY APPROX_VECTOR_DISTANCE(...),
         proteins_100g DESC,
         sugars_100g ASC
LIMIT 10;

Protein-first (Healthier choices dominate)
1. Use case: For fitness-focused applications where nutrition outranks flavor.

ORDER BY proteins_100g DESC,
         APPROX_VECTOR_DISTANCE(...),
         sugars_100g ASC
LIMIT 10;

Sugar-first (User wants lower sugar above everything else)
1. Use case: Diabetic-friendly search or sugar-reduction diets.

ORDER BY sugars_100g ASC,
         proteins_100g DESC,
         APPROX_VECTOR_DISTANCE(...)
LIMIT 10;

Complex Hybrid Ranking
1. Use case: Health-first search with semantic fallback and tiebreakers.

ORDER BY calories_100g ASC,
         APPROX_VECTOR_DISTANCE(...),
         proteins_100g DESC,
         sugars_100g ASC
LIMIT 10;
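
One way to picture why these permutations are cheap to support: if the sort key is built from the ORDER BY terms in the order they are written, then reordering the clause simply reorders the key. The Python sketch below illustrates that idea under assumptions; `make_sort_key` and the order specs are hypothetical, the rows are invented, `distance` stands for a precomputed APPROX_VECTOR_DISTANCE value, and negation for DESC only works for numeric fields.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
OrderSpec = Sequence[Tuple[str, str]]  # (field name, "ASC" or "DESC")


def make_sort_key(order_spec: OrderSpec) -> Callable[[Row], Tuple[float, ...]]:
    for _, direction in order_spec:
        if direction not in ("ASC", "DESC"):
            raise ValueError(f"unknown direction: {direction}")

    def key(row: Row) -> Tuple[float, ...]:
        # Key components follow the ORDER BY sequence; DESC fields are negated.
        return tuple(
            row[field] if direction == "ASC" else -row[field]
            for field, direction in order_spec
        )

    return key


def rank(rows: List[Row], order_spec: OrderSpec) -> List[str]:
    return [r["name"] for r in sorted(rows, key=make_sort_key(order_spec))]


# Made-up rows; "distance" is the vector distance to the query embedding.
rows = [
    {"name": "spread A", "distance": 0.10, "proteins_100g": 11.0, "sugars_100g": 18.0},
    {"name": "spread B", "distance": 0.30, "proteins_100g": 16.0, "sugars_100g": 12.0},
    {"name": "spread C", "distance": 0.20, "proteins_100g": 14.0, "sugars_100g": 5.0},
]

semantic_first = rank(rows, [("distance", "ASC"), ("proteins_100g", "DESC"), ("sugars_100g", "ASC")])
protein_first = rank(rows, [("proteins_100g", "DESC"), ("distance", "ASC"), ("sugars_100g", "ASC")])
sugar_first = rank(rows, [("sugars_100g", "ASC"), ("proteins_100g", "DESC"), ("distance", "ASC")])
```

The same three rows produce three different rankings, and the only thing that changed is the position of each term in the key.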

Final Takeaway for Developers

- Couchbase combines ANN similarity, scalar filtering, custom ORDER BY, and LIMIT pushdown directly inside the Composite Vector Index.
- This gives you the power to build real-world intelligent search features like Nutella-flavor recommendations optimized for nutrition using a single, fast, elegant SQL++ query.
- Couchbase doesn’t just store vectors in the index. It lets you query them efficiently and combine them with structured data all at scale.

Sai Kommaraju, Senior Software Engineer
