
Filtered ANN Search with Composite Vector Indexes

This post is the third part of a multi-part series exploring composite vector indexing in Couchbase. If you missed the previous posts, be sure to catch up on Part 1 and Part 2.

The series will cover:

  1. Why composite vector indexes matter, including concepts, terminology, and developer motivation. A Smart Grocery Recommendation System will be used as a running example.
  2. How composite vector indexes are implemented inside the Couchbase Indexing Service.
  3. How ORDER BY pushdown works for composite vector queries.
  4. Real-world performance behavior and benchmarking results.

 

ORDER BY Pushdown with Composite Vector Indexes

Let’s imagine a feature in your food or grocery app:

“Recommend chocolate spreads with a Nutella-like taste, ordered by nutritional quality, i.e. higher protein first, lower sugar next.”

This is more than simple filtering.

It requires combining:

  • Semantic similarity (taste/texture)
  • Nutritional filtering (sugars & proteins)
  • A custom ordering strategy

The corresponding SQL++ query might look like this:
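A minimal sketch of such a query, held in a Python string so it can be sent through an SDK with a named parameter. The keyspace `food`, the fields `product_name` and `text_vector`, the `$nutella_vec` parameter, and the `"L2"` metric are assumptions made for illustration; only the two filters, the ordering, and the limit of 10 come from the description in this post.

```python
# Hypothetical reconstruction of the query described in the text.
# Assumed names: keyspace `food`, fields `product_name` and `text_vector`,
# named parameter `$nutella_vec`, and the "L2" distance metric.
RECOMMEND_SPREADS_QUERY = """
SELECT f.product_name, f.proteins_100g, f.sugars_100g
FROM food AS f
WHERE f.sugars_100g < 20
  AND f.proteins_100g > 10
ORDER BY APPROX_VECTOR_DISTANCE(f.text_vector, $nutella_vec, "L2"),
         f.proteins_100g DESC,
         f.sugars_100g ASC
LIMIT 10
"""


def query_params(nutella_embedding):
    """Build the named parameters that accompany the statement above."""
    return {"nutella_vec": list(nutella_embedding)}
```

The embedding for "Nutella" would be produced by whatever model generated the stored vectors, so that the query vector and the indexed vectors live in the same space.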

This single query expresses everything we want:

  • Only healthier chocolate spreads
  • Closest in flavor semantics to Nutella
  • Higher protein preferred
  • Lower sugar next
  • Show the user just the top 10

Now let’s look at how Couchbase executes this query efficiently.

  1. Scalar Filters Are Pushed Down
    1. The scalar predicates are evaluated inline using the Composite Vector Index.
      1. sugars_100g < 20
      2. proteins_100g > 10
    2. APPROX_VECTOR_DISTANCE(...) activates Couchbase’s ANN (Approximate Nearest Neighbor) scan pipeline.
    3. The vector index locates the items whose embeddings are closest to the query embedding (Nutella in our example).
    4. Refer to Part 2 of this blog series for the internal workings.
  2. LIMIT and ORDER BY Pushdown
    1. This is where Couchbase becomes exceptionally efficient.
    2. When the query includes ORDER BY together with:
      1. LIMIT <limit_value>
    3. Couchbase can push both LIMIT and ORDER BY into the index service.
    4. This avoids sending large intermediate result sets to the query service.
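The saving described above can be illustrated with a toy model that counts how many rows cross from the index service to the query service. This is a simplified Python sketch, not Couchbase code: the `Row` type and both functions are hypothetical, and `distance` stands in for the ANN distance the index would compute.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Row(NamedTuple):
    name: str
    distance: float       # stand-in for the ANN distance to the query vector
    proteins_100g: float
    sugars_100g: float


def passes_filters(r: Row) -> bool:
    # Scalar predicates from the example query.
    return r.sugars_100g < 20 and r.proteins_100g > 10


def order_key(r: Row) -> Tuple[float, float, float]:
    # ORDER BY distance, proteins_100g DESC, sugars_100g ASC
    return (r.distance, -r.proteins_100g, r.sugars_100g)


def without_order_limit_pushdown(rows: Iterable[Row], limit: int) -> Tuple[List[Row], int]:
    # Index side: filter only. Every qualifying row is shipped.
    shipped = [r for r in rows if passes_filters(r)]
    # Query side: sort everything, then trim.
    return sorted(shipped, key=order_key)[:limit], len(shipped)


def with_order_limit_pushdown(rows: Iterable[Row], limit: int) -> Tuple[List[Row], int]:
    # Index side: filter, order, and trim. Only `limit` rows are shipped.
    shipped = sorted((r for r in rows if passes_filters(r)), key=order_key)[:limit]
    return shipped, len(shipped)
```

Both functions return the same rows; the second element of each result shows how many rows had to cross the service boundary to get there.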

Here’s how ORDER BY pushdown with scalars and ANN works:

  1. Indexer Builds a Concatenated Sort Key
    1. While performing the Composite Vector Index scan, the indexer constructs a composite sort key for each candidate item.
    2. The concatenated composite sort key consists of:
      1. ANN distance in place of the vector key
      2. The scalar ORDER BY fields in the exact ORDER BY sequence:
        1. proteins_100g (DESC)
          1. Negated or encoded for descending order
        2. sugars_100g (ASC)
    3. This yields a lexicographically comparable key like:
      1. (distance, -proteins_100g, sugars_100g)
    4. Why replace the vector field?
      1. Because for ordering, the distance becomes the actual scalar value of interest and not the vector itself.
      2. This allows the indexer to sort candidates.
  2. Indexer Maintains Only the Top-K Items
    1. As the indexer scans ANN candidates, it keeps a K-sized priority heap (K = LIMIT).
    2. Each candidate is evaluated using the concatenated key.
    3. If the heap exceeds size K, the worst item is evicted.
    4. At the end, only the top LIMIT items remain.
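The two steps above can be sketched in a few lines of Python. This is an illustration of the idea rather than the indexer's implementation: the candidate tuple layout and the `top_k` helper are hypothetical, and descending order is handled by simple negation, whereas a real indexer may use an encoded key instead.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical candidate layout: (name, ann_distance, proteins_100g, sugars_100g)
Candidate = Tuple[str, float, float, float]


def sort_key(c: Candidate) -> Tuple[float, float, float]:
    """Concatenated sort key: (distance, -proteins_100g, sugars_100g)."""
    _, distance, proteins, sugars = c
    return (distance, -proteins, sugars)


def top_k(candidates: Iterable[Candidate], k: int) -> List[Candidate]:
    """Keep only the k best candidates while scanning."""
    # heapq is a min-heap, so store negated keys: the heap root is then
    # the worst item currently kept, which is the one to evict.
    heap = []
    for seq, c in enumerate(candidates):
        neg_key = tuple(-x for x in sort_key(c))
        # -seq breaks ties so that the later arrival is evicted first.
        heapq.heappush(heap, (neg_key, -seq, c))
        if len(heap) > k:
            heapq.heappop(heap)  # evict the worst item
    # Best first: largest negated key means smallest sort key.
    return [c for _, _, c in sorted(heap, reverse=True)]
```

Memory stays proportional to K regardless of how many candidates the ANN scan visits, which is why only LIMIT rows ever leave the indexer.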

 

This means:

  • No large result sets are produced
  • No full sorting happens on the query node
  • ANN + scalar ranking + LIMIT all happen in one place
  • Indexer streams only the top 10 items to the query service

By the time results reach the query node, they are already:

  • Filtered
  • Semantically ordered
  • Scalar-ordered
  • Trimmed to LIMIT

The query node has almost nothing left to do.
This is the fastest possible execution path in Couchbase for hybrid semantic + scalar ranking.

The Flexibility of Mixing Scalars and Vectors in ORDER BY

  • One of the most powerful aspects of Couchbase’s Composite Vector Index is that developers are not locked into a single ranking strategy. 
  • Unlike many vector databases that force you to sort “only by vector distance,” Couchbase allows you to freely mix, reorder, and permute scalar fields and vector similarity measures inside a single ORDER BY clause.

Below are four meaningful ordering permutations for our Nutella-like food search.

  1. Semantic-first (Flavor similarity dominates)
    1. Use case: You want “Nutella-like” taste to dominate ranking.
  2. Protein-first (Healthier choices dominate)
    1. Use case: For fitness-focused applications where nutrition outranks flavor.
  3. Sugar-first (User wants lower sugar above everything else)
    1. Use case: Diabetic-friendly search or sugar-reduction diets.
  4. Complex Hybrid Ranking
    1. Use case: Health-first search with semantic fallback and tiebreakers.
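To make the four permutations concrete, each one can be modeled as a different tuple sort key over the same candidates. This is a Python sketch with made-up sample data; the exact term order chosen for each strategy (especially the hybrid one) is an assumption for illustration, since any permutation of the distance and the scalar fields follows the same pattern.

```python
from typing import List, NamedTuple


class Item(NamedTuple):
    name: str
    distance: float       # stand-in for APPROX_VECTOR_DISTANCE(...)
    proteins_100g: float
    sugars_100g: float


# Each entry mirrors one possible ORDER BY clause (assumed term orders).
ORDERINGS = {
    # ORDER BY distance, proteins_100g DESC, sugars_100g
    "semantic_first": lambda i: (i.distance, -i.proteins_100g, i.sugars_100g),
    # ORDER BY proteins_100g DESC, distance, sugars_100g
    "protein_first": lambda i: (-i.proteins_100g, i.distance, i.sugars_100g),
    # ORDER BY sugars_100g, distance, proteins_100g DESC
    "sugar_first": lambda i: (i.sugars_100g, i.distance, -i.proteins_100g),
    # ORDER BY proteins_100g DESC, sugars_100g, distance
    "hybrid": lambda i: (-i.proteins_100g, i.sugars_100g, i.distance),
}


def rank(items: List[Item], ordering: str, limit: int = 10) -> List[str]:
    """Return the names of the top `limit` items under the chosen ordering."""
    return [i.name for i in sorted(items, key=ORDERINGS[ordering])[:limit]]


ITEMS = [
    Item("hazelnut-cocoa", 0.05, 11.0, 19.0),
    Item("protein-spread", 0.10, 18.0, 12.0),
    Item("low-sugar-cocoa", 0.15, 12.0, 4.0),
    Item("almond-cocoa", 0.15, 18.0, 9.0),
]
```

With this sample data, `rank(ITEMS, "semantic_first")` puts `hazelnut-cocoa` first, while `rank(ITEMS, "sugar_first")` puts `low-sugar-cocoa` first: the same candidates, reordered purely by changing the position of the distance term in the key.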

 

Final Takeaway for Developers

  • Couchbase combines ANN similarity, scalar filtering, custom ORDER BY, and LIMIT pushdown directly inside the Composite Vector Index.
  • This gives you the power to build real-world intelligent search features, like Nutella-flavor recommendations optimized for nutrition, using a single, fast, elegant SQL++ query.
  • Couchbase doesn’t just store vectors in the index. It lets you query them efficiently and combine them with structured data, all at scale.

Author

Posted by Sai Kommaraju
