{"id":17801,"date":"2026-01-08T12:56:16","date_gmt":"2026-01-08T20:56:16","guid":{"rendered":"https:\/\/www.couchbase.com\/blog\/?p=17801"},"modified":"2026-03-31T16:25:34","modified_gmt":"2026-03-31T23:25:34","slug":"filtered-ann-search-with-composite-vector-indexes","status":"publish","type":"post","link":"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/","title":{"rendered":"Filtered ANN Search With Composite Vector Indexes (Part 1)"},"content":{"rendered":"<p><span style=\"font-weight: 400\">This post kicks off a multi-part series on composite vector indexing in Couchbase. We will start by building intuition, then progressively dive into internals, execution optimizations, and performance.<\/span><\/p>\n<p>The series will cover:<\/p>\n<ol>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Why composite vector indexes matter, including concepts, terminology, and developer motivation. A Smart Grocery Recommendation System will be used as a running example.<\/span><\/li>\n<li style=\"font-weight: 400\"><a href=\"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes-2\/\"><span style=\"font-weight: 400\">How composite vector indexes are implemented inside the Couchbase Indexing Service.<\/span><\/a><\/li>\n<li style=\"font-weight: 400\"><a href=\"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-ive-composite-vector-indexes\/\"><span style=\"font-weight: 400\">How ORDER BY pushdown works for composite vector queries.<\/span><\/a><\/li>\n<li style=\"font-weight: 400\"><a href=\"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes-part-4\/\"><span style=\"font-weight: 400\">Real-world performance behavior and benchmarking results.<\/span><\/a><\/li>\n<\/ol>\n<h2><span style=\"font-weight: 400\">Smart Grocery Recommendation System With Filtered ANN<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Imagine you&#8217;re building a 
grocery-recommendation app. <\/span><\/p>\n<p><span style=\"font-weight: 400\">A user opens it on a Sunday morning and types:<\/span><\/p>\n<p><span style=\"font-weight: 400\">\u201cI love dark chocolate spread, but I\u2019m trying to cut sugar and add more protein. What else should I buy?\u201d<\/span><\/p>\n<p><span style=\"font-weight: 400\">At that moment, your system needs to understand the user\u2019s intent, compare food items semantically, and apply strict nutritional filters.<\/span><\/p>\n<p><span style=\"font-weight: 400\">This is exactly where Filtered Approximate Nearest Neighbor (Filtered ANN) search comes in:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Your ANN layer first finds semantically similar foods that \u201cfeel like\u201d dark chocolate spread in flavor profile, texture, or category.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Then your filtering layer steps in to remove anything high in sugar, keep only items above a certain protein threshold, and possibly enforce dietary preferences (vegan, keto, nut-free).<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">The result? <\/span><span style=\"font-weight: 400\">A recommendation engine that understands both meaning and constraints, just like a smart store associate who knows your taste and considers your goals.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">Before We Get to FANN, Let\u2019s Build Intuition<\/span><\/h3>\n<p><b>NN (Nearest Neighbor):<\/b><span style=\"font-weight: 400\"> Finding the <\/span><b>most similar thing<\/b><span style=\"font-weight: 400\"> to what you have. It&#8217;s like asking, \u201cWhich food in my list tastes most like this chocolate spread?\u201d<\/span><\/p>\n<p><b>ANN (Approximate Nearest Neighbor):<\/b><span style=\"font-weight: 400\"> Finding <\/span><b>something very similar<\/b><span style=\"font-weight: 400\">, but faster. 
It&#8217;s like saying, \u201cI don\u2019t need the <\/span><i><span style=\"font-weight: 400\">perfect<\/span><\/i><span style=\"font-weight: 400\"> match, just something that\u2019s <\/span><i><span style=\"font-weight: 400\">close enough<\/span><\/i><span style=\"font-weight: 400\"> quickly.\u201d<\/span><\/p>\n<p><b>FANN (Filtered Approximate Nearest Neighbor):<\/b><span style=\"font-weight: 400\"> Finding <\/span><b>something close enough<\/b><span style=\"font-weight: 400\"> but <\/span><b>only among items that meet certain rules<\/b><span style=\"font-weight: 400\">. It&#8217;s like saying, \u201cShow me foods similar to chocolate spread, but only the ones that are low in sugar and high in protein.\u201d<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-17802\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Screenshot-2026-01-07-at-2.41.04-PM.png\" alt=\"\" width=\"1214\" height=\"672\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Screenshot-2026-01-07-at-2.41.04-PM.png 1214w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Screenshot-2026-01-07-at-2.41.04-PM-300x166.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Screenshot-2026-01-07-at-2.41.04-PM-1024x567.png 1024w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Screenshot-2026-01-07-at-2.41.04-PM-768x425.png 768w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Screenshot-2026-01-07-at-2.41.04-PM-18x10.png 18w\" sizes=\"auto, (max-width: 1214px) 100vw, 1214px\" \/><\/p>\n<p><span style=\"font-weight: 400\">ANN algorithms trade a bit of <\/span><i><span style=\"font-weight: 400\">effectiveness<\/span><\/i><span style=\"font-weight: 400\"> (accuracy) for much greater <\/span><i><span style=\"font-weight: 
400\">efficiency<\/span><\/i><span style=\"font-weight: 400\"> (speed and memory).<\/span><\/p>\n<p><span style=\"font-weight: 400\">A <\/span><b>composite index<\/b><span style=\"font-weight: 400\"> is an index built on <\/span><b>multiple fields (columns)<\/b><span style=\"font-weight: 400\"> together, not just one. For example, it&#8217;s like sorting a spreadsheet first by Category, then by Sugar, then by Protein. <\/span><span style=\"font-weight: 400\">This ordering places all chocolate spreads next to each other. <\/span><span style=\"font-weight: 400\">Within that group, you can quickly find low-sugar, high-protein products without scanning everything.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">Why Traditional Indexes Fail<\/span><\/h3>\n<p><span style=\"font-weight: 400\">Assume you have a small subset of the World Food Facts dataset loaded into memory as:<\/span><\/p>\n<pre class=\"lang:default decode:true\">type Food struct {\r\n    ID            string\r\n    ProductName   string\r\n    Category      string\r\n    Description   string\r\n    Sugars100g    float64\r\n    Proteins100g  float64\r\n    Tags          []string\r\n    Ingredients   []string\r\n    \/\/ ... other fields\r\n    Country       string\r\n}\r\n<\/pre>\n<p><span style=\"font-weight: 400\">To find foods like dark chocolate spreads that are low in sugar and high in protein, you can use a query like the one below:<\/span><\/p>\n<pre class=\"lang:default decode:true\">SELECT product_name\r\nFROM food\r\nWHERE category = \"chocolate_spread\"\r\n  AND sugars_100g &lt; 20 AND proteins_100g &gt; 10;\r\n<\/pre>\n<p><span style=\"font-weight: 400\">To speed up the query, you can use a composite secondary index like the one below:<\/span><\/p>\n<pre class=\"lang:default decode:true \">CREATE INDEX idx_food ON food(category, sugars_100g, proteins_100g, product_name)\r\n<\/pre>\n<p><span style=\"font-weight: 400\">Composite secondary indexes can be 
viewed as sorted lists of concatenated keys that enable fast lookups of specific values or iteration from a low value to a high value (a range scan). The lookup values, as well as the low and high bounds, are constructed at query time from the query predicates. For example, a slice of idx_food, where each entry holds (category, sugars_100g, proteins_100g, product_name), might look like this:<\/span><\/p>\n<pre class=\"lang:default decode:true\">...\r\n(\"almond_butter\", 15, 20, \"Almond butter with chocolate chips\")\r\n(\"chocolate_spread\", 19, 7, \"Chocolate spread with nuts\")\r\n(\"chocolate_spread\", 20, 4, \"Creamy chocolate spread\")\r\n(\"chocolate_spread\", 23, 6, \"Chocolate spread with honey\")\r\n(\"chocolate_spread\", 25, 5, \"Coffee chocolate spread\")\r\n(\"milk_chocolate\", 4, 6, \"Milk chocolate spread\")\r\n(\"peanut_butter\", 19, 30, \"Chocolate flavored peanut butter\")\r\n...\r\n<\/pre>\n<p><span style=\"font-weight: 400\">Composite indexes work great for structured lookups. For the query above, however, the scan stays inside the chocolate_spread range. In this slice, no chocolate spread has both less than 20 g of sugar and more than 10 g of protein, so the query returns none of these entries, even though the almond butter and peanut butter entries satisfy both nutritional filters.<\/span><\/p>\n<p><span style=\"font-weight: 400\">But a category filter can never find:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">chocolate-flavored nut butters<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">chocolate-protein spreads<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">hazelnut cocoa blends<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">chocolate protein bars<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">\u2026even though a human instantly knows they are relatives of chocolate spreads.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Traditional indexes only match structure, not meaning. 
This is why category-based range scans fail.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">How Filtered ANN Works<\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-17803\" src=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Screenshot-2026-01-07-at-2.48.11-PM.png\" alt=\"\" width=\"1272\" height=\"228\" srcset=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Screenshot-2026-01-07-at-2.48.11-PM.png 1272w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Screenshot-2026-01-07-at-2.48.11-PM-300x54.png 300w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Screenshot-2026-01-07-at-2.48.11-PM-1024x184.png 1024w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Screenshot-2026-01-07-at-2.48.11-PM-768x138.png 768w, https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Screenshot-2026-01-07-at-2.48.11-PM-18x3.png 18w\" sizes=\"auto, (max-width: 1272px) 100vw, 1272px\" \/><\/p>\n<p><span style=\"font-weight: 400\">First, convert both the query and the data into vectors.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The user\u2019s sentence is fed into an embedding model (for example, one from OpenAI or Cohere, or a domain-specific model).<\/span><\/p>\n<p><span style=\"font-weight: 400\">The result is a dense vector that captures concepts like:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">chocolate-like flavor<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">spreadable texture<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">dessert\/snack category<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">This vector represents what the user wants, rather than just the literal words.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Next, find the nearest neighbors (semantic 
similarity).<\/span><\/p>\n<p><span style=\"font-weight: 400\">Candidates might include:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Hazelnut cocoa spread<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Chocolate almond butter<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Cocoa protein spread<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Chocolate tahini<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">But not all of these are healthy options, and the user specifically asked for low sugar and high protein.<\/span><\/p>\n<p><span style=\"font-weight: 400\">So apply strict filters. This is the \u201cFiltered\u201d part of Filtered ANN.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Filter out items with:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Sugar above a threshold (e.g., more than 5 g per serving)<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Protein below a threshold (e.g., less than 8 g per serving)<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">Your system may also combine these with metadata filters:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Only vegan<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">No palm oil<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">No nuts<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Under $10<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">What remains is a set of items that match both meaning and constraints.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">Why Filters Alone Do Not Work<\/span><\/h3>\n<p><span 
style=\"font-weight: 400\">Using only filters, you would get:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Every high\u2011protein, low\u2011sugar product<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Including many items unrelated to chocolate, such as tofu, Greek yogurt, or chicken breast<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">But the user wants something &#8220;similar to chocolate spread.&#8221;<\/span><\/p>\n<p><span style=\"font-weight: 400\">Filtered ANN = Personalization + Constraints.<\/span> <span style=\"font-weight: 400\">It mimics how a human store associate would answer the request: <\/span><span style=\"font-weight: 400\">\u201cIf you want something like chocolate spread but healthier, try this\u2026\u201d<\/span><\/p>\n<p><span style=\"font-weight: 400\">Behind the scenes, however, your recommendation engine faces a subtle but serious problem. <\/span><span style=\"font-weight: 400\">Many vector databases advertise \u201chybrid search,\u201d but they typically keep scalar fields like sugar or protein off to the side as plain metadata, which the ANN index itself has no way to use.<\/span><\/p>\n<p><span style=\"font-weight: 400\">So what happens?<\/span><\/p>\n<p><span style=\"font-weight: 400\">The system first pulls in a huge batch of vector-similar candidates\u2026 and only then starts checking nutrition rules like sugars_100g &lt; 20 and proteins_100g &gt; 10.<\/span><\/p>\n<p><span style=\"font-weight: 400\">It\u2019s like a store employee bringing out every chocolate-related product from the back room, placing them on the counter, and then saying:<\/span><\/p>\n<p><span style=\"font-weight: 400\">\u201cOh wait, you wanted low-sugar? High-protein? 
Let me throw most of these away.\u201d<\/span><\/p>\n<p><span style=\"font-weight: 400\">Some vector systems try to filter earlier, during graph traversal, but they generally still can\u2019t do true range filtering or prefix pruning: they must fetch and decode each candidate before deciding whether to discard it.<\/span><\/p>\n<p><span style=\"font-weight: 400\">What does this mean for your app?<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">More disk reads<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">More distance calculations<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">More latency<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">&#8230;and a lot of wasted work on results the user will never see.<\/span><\/p>\n<p><span style=\"font-weight: 400\">This is exactly why a composite vector index, which merges vector similarity and scalar pruning into a single index, is a game-changer.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">Composite Vector Indexes \u2013 Overview<\/span><\/h3>\n<p><span style=\"font-weight: 400\">Step 1: Embeddings Layer \u2013 Create Vector Embeddings<\/span><\/p>\n<p><span style=\"font-weight: 400\">Each product&#8217;s text (tags, product name, category, ingredients) is converted into a single high-dimensional vector using an embedding model. 
Products with similar meanings end up with vectors that are close together.<\/span><\/p>\n<p><span style=\"font-weight: 400\">For example, embeddings for a few product names might look like this (values are illustrative):<\/span><\/p>\n<ul>\n<li><span style=\"font-weight: 400\">&#8220;dark chocolate spread&#8221; \u2192 [0.23, -0.15, 0.87, &#8230;] (384 dimensions)<\/span><\/li>\n<li><span style=\"font-weight: 400\">&#8220;chocolate hazelnut butter&#8221; \u2192 [0.25, -0.12, 0.85, &#8230;] (similar vector)<\/span><\/li>\n<li><span style=\"font-weight: 400\">&#8220;chocolate protein bar&#8221; \u2192 [0.18, -0.08, 0.79, &#8230;] (somewhat similar)<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">Step 2: FANN Index \u2013 Build a Composite Vector Index<\/span><\/p>\n<p><span style=\"font-weight: 400\">Create a vector index (e.g., Couchbase Vector Index, FAISS) that can quickly find nearest neighbors in the embedding space.<\/span><\/p>\n<p><span style=\"font-weight: 400\">How are vectors different from other data types in a composite vector index?<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Vectors have no natural total order, so a sort order for vector fields cannot be determined when the index is built.<\/span>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Vector fields do not support conventional comparison predicates (such as equality or range filters) in the WHERE clause.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Instead, vector fields are used in ORDER BY with vector distance functions, and can participate in query planning through those expressions.<\/span><\/li>\n<\/ul>\n<\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Ordering is done at scan time, based on similarity to a query vector. 
The user chooses the similarity function to suit the data and the application.<\/span>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">APPROX_VECTOR_DISTANCE can be used in the ORDER BY clause and is efficiently supported when a compatible vector index exists; otherwise, it results in a full scan.<\/span><\/li>\n<\/ul>\n<\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Because the individual dimensions of a vector have no standalone meaning, the only meaningful question is \u201chow similar are two vectors?\u201d As a result, vector search can only find nearest neighbors, that is, the most similar elements.<\/span>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Accordingly, both the similarity function and the query vector must be provided as input at query time.<\/span><\/li>\n<\/ul>\n<\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Exact nearest neighbor search is computationally intensive, and the cost grows with the number of vector dimensions, so you need a time- and space-efficient way to get approximate results.<\/span>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Quantization methods are specified in the description parameter of the index DDL.<\/span><\/li>\n<\/ul>\n<\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Faster queries also require reducing the number of distance comparisons at query time.<\/span>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The number of centroids and the nprobes value help narrow the search space.<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>A composite vector index is an index in which one of the keys is marked as a vector, while additional parameters such as dimension, similarity, and description 
qualify that vector.<\/p>\n<pre class=\"lang:default decode:true\">CREATE INDEX idx_vec ON food(sugars_100g, proteins_100g, text_vector VECTOR, product_name) WITH { \"dimension\": 384, \"similarity\" : \"L2\", \"description\": \"IVF,SQ8\" }\r\n<\/pre>\n<p><span style=\"font-weight: 400\">In this definition, the VECTOR keyword explicitly marks text_vector as a vector attribute. This is necessary because, at the JSON level, a vector embedding is stored as a simple array of floating-point numbers. Without the vector annotation, the Global Secondary Index (GSI) service would treat the field as an ordinary array and apply standard indexing semantics.<\/span><\/p>\n<p><span style=\"font-weight: 400\">By declaring a field as a vector, the user establishes an explicit contract with the GSI service that:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The index will contain a single vector key, and that key represents the embedding used for vector similarity search in this index.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The application is responsible for generating the vector embedding (for example, using an external embedding model) and persisting it in the specified document field.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The GSI service must interpret the field semantically as a vector embedding and build vector-aware index structures optimized for Approximate Nearest Neighbor (ANN) search, rather than using conventional scalar or array indexing logic.<\/span><\/li>\n<\/ul>\n<p>In vector index DDL, the user must specify a few extra parameters:<\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Dimension: the length of the vector embeddings<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Similarity: the distance metric used for ANN search<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Description: a FAISS-style 
index description that sets the accuracy-versus-speed trade-off<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">In the example above:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">We created 384-dimensional embeddings from the tags, product name, category, and ingredients fields using the sentence-transformers\/all-MiniLM-L6-v2 model and stored them in each document&#8217;s text_vector field.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">We used an IVF coarse quantizer with the default number of centroids, combined with SQ8 scalar quantization.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">Step 3: Filtered ANN Query<\/span><\/p>\n<p><span style=\"font-weight: 400\">Instead of filtering by exact category, we:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Generate an embedding for the query &#8220;dark chocolate spread.&#8221;<\/span>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">query_text = &#8220;dark chocolate spread&#8221;<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">query_embedding = [0.23, -0.15, 0.87, 0.42, &#8230;, -0.31] # 384-dimensional vector<\/span><\/li>\n<\/ul>\n<\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Use ANN search to find the top-k (e.g., top 10) most similar products that meet our criteria (sugars_100g &lt; 20 AND proteins_100g &gt; 10).<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Return the top matches.<\/span><\/li>\n<\/ul>\n<p>SQL++ Example (Couchbase):<\/p>\n<pre class=\"lang:default decode:true \">SELECT product_name\r\nFROM food\r\nWHERE sugars_100g &lt; 20 AND proteins_100g &gt; 10\r\nORDER BY APPROX_VECTOR_DISTANCE(text_vector, $query_embedding, 'L2')\r\nLIMIT 10;\r\n<\/pre>\n<p><span style=\"font-weight: 400\">Here, $query_embedding is a named query parameter that holds the 384-dimensional query vector, and the 'L2' metric matches the similarity declared in idx_vec.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Key Advantages<\/span><\/p>\n<p><span style=\"font-weight: 400\">This approach finds products 
that are:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Semantically similar to &#8220;dark chocolate spread&#8221; (via vector search).<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Compliant with the nutritional filters (low sugar, high protein).<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Not limited to one category, so results may include &#8220;chocolate protein bars,&#8221; &#8220;nut butter spreads,&#8221; or &#8220;chocolate-flavored snacks&#8221; that are similar in meaning but would be excluded by a category = \"chocolate_spread\" filter.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">Learn more about composite vector indexes in the next part of this series, where we will answer practical questions such as:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">How are vector embeddings stored and organized efficiently inside the index layer?<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Can a composite vector index answer scalar-only queries without reading the full document?<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Does the order of scalar fields and vector fields in the index definition matter?<\/span><\/li>\n<\/ul>\n<p class=\"p1\">Dive deeper into the mechanics of composite vector indexing by checking out the <a href=\"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes-2\/\">second post<\/a> in this series, where we explore its implementation within Couchbase.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post kicks off a multi-part series on composite vector indexing in Couchbase. We will start by building intuition, then progressively dive into internals, execution optimizations, and performance. 
The series will cover: Why composite vector indexes matter, including concepts, terminology, [&hellip;]<\/p>\n","protected":false},"author":85690,"featured_media":17809,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":""},"categories":[9937],"tags":[],"ppma_author":[10168],"class_list":["post-17801","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-vector-search"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.0 (Yoast SEO v27.0) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Filtered ANN Search With Composite Vector Indexes (Part 1) - The Couchbase Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Filtered ANN Search With Composite Vector Indexes (Part 1)\" \/>\n<meta property=\"og:description\" content=\"This post kicks off a multi-part series on composite vector indexing in Couchbase. We will start by building intuition, then progressively dive into internals, execution optimizations, and performance. 
The series will cover: Why composite vector indexes matter, including concepts, terminology, [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/\" \/>\n<meta property=\"og:site_name\" content=\"The Couchbase Blog\" \/>\n<meta property=\"article:published_time\" content=\"2026-01-08T20:56:16+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-03-31T23:25:34+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Filtered-ANN-Search-with-Composite-Vector-Indexes-1024x536.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"536\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Sai Kommaraju, Senior Software Engineer\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Sai Kommaraju, Senior Software Engineer\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/\"},\"author\":{\"name\":\"Sai Kommaraju, Senior Software Engineer\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/8fb575d74280ff3d0f044904277a8076\"},\"headline\":\"Filtered ANN Search With Composite Vector Indexes (Part 1)\",\"datePublished\":\"2026-01-08T20:56:16+00:00\",\"dateModified\":\"2026-03-31T23:25:34+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/\"},\"wordCount\":1806,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Filtered-ANN-Search-with-Composite-Vector-Indexes.png\",\"articleSection\":[\"Vector Search\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/\",\"url\":\"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/\",\"name\":\"Filtered ANN Search With Composite Vector Indexes (Part 1) - The Couchbase 
Blog\",\"isPartOf\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Filtered-ANN-Search-with-Composite-Vector-Indexes.png\",\"datePublished\":\"2026-01-08T20:56:16+00:00\",\"dateModified\":\"2026-03-31T23:25:34+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/#primaryimage\",\"url\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Filtered-ANN-Search-with-Composite-Vector-Indexes.png\",\"contentUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Filtered-ANN-Search-with-Composite-Vector-Indexes.png\",\"width\":2400,\"height\":1256},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.couchbase.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Filtered ANN Search With Composite Vector Indexes (Part 1)\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#website\",\"url\":\"https:\/\/www.couchbase.com\/blog\/\",\"name\":\"The Couchbase Blog\",\"description\":\"Couchbase, the NoSQL 
Database\",\"publisher\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.couchbase.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#organization\",\"name\":\"The Couchbase Blog\",\"url\":\"https:\/\/www.couchbase.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png\",\"contentUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png\",\"width\":218,\"height\":34,\"caption\":\"The Couchbase Blog\"},\"image\":{\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/8fb575d74280ff3d0f044904277a8076\",\"name\":\"Sai Kommaraju, Senior Software Engineer\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/image\/a2ca26c70968f44d876aa239d293a709\",\"url\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Sai-Kommaraju.jpeg\",\"contentUrl\":\"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Sai-Kommaraju.jpeg\",\"caption\":\"Sai Kommaraju, Senior Software Engineer\"},\"url\":\"https:\/\/www.couchbase.com\/blog\/author\/saikommaraju\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Filtered ANN Search With Composite Vector Indexes (Part 1) - The Couchbase Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/","og_locale":"en_US","og_type":"article","og_title":"Filtered ANN Search With Composite Vector Indexes (Part 1)","og_description":"This post kicks off a multi-part series on composite vector indexing in Couchbase. We will start by building intuition, then progressively dive into internals, execution optimizations, and performance. The series will cover: Why composite vector indexes matter, including concepts, terminology, [&hellip;]","og_url":"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/","og_site_name":"The Couchbase Blog","article_published_time":"2026-01-08T20:56:16+00:00","article_modified_time":"2026-03-31T23:25:34+00:00","og_image":[{"width":1024,"height":536,"url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Filtered-ANN-Search-with-Composite-Vector-Indexes-1024x536.png","type":"image\/png"}],"author":"Sai Kommaraju, Senior Software Engineer","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Sai Kommaraju, Senior Software Engineer","Est. 
reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/#article","isPartOf":{"@id":"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/"},"author":{"name":"Sai Kommaraju, Senior Software Engineer","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/8fb575d74280ff3d0f044904277a8076"},"headline":"Filtered ANN Search With Composite Vector Indexes (Part 1)","datePublished":"2026-01-08T20:56:16+00:00","dateModified":"2026-03-31T23:25:34+00:00","mainEntityOfPage":{"@id":"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/"},"wordCount":1806,"commentCount":0,"publisher":{"@id":"https:\/\/www.couchbase.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/#primaryimage"},"thumbnailUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Filtered-ANN-Search-with-Composite-Vector-Indexes.png","articleSection":["Vector Search"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/","url":"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/","name":"Filtered ANN Search With Composite Vector Indexes (Part 1) - The Couchbase 
Blog","isPartOf":{"@id":"https:\/\/www.couchbase.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/#primaryimage"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/#primaryimage"},"thumbnailUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Filtered-ANN-Search-with-Composite-Vector-Indexes.png","datePublished":"2026-01-08T20:56:16+00:00","dateModified":"2026-03-31T23:25:34+00:00","breadcrumb":{"@id":"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/#primaryimage","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Filtered-ANN-Search-with-Composite-Vector-Indexes.png","contentUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Filtered-ANN-Search-with-Composite-Vector-Indexes.png","width":2400,"height":1256},{"@type":"BreadcrumbList","@id":"https:\/\/www.couchbase.com\/blog\/filtered-ann-search-with-composite-vector-indexes\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.couchbase.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Filtered ANN Search With Composite Vector Indexes (Part 1)"}]},{"@type":"WebSite","@id":"https:\/\/www.couchbase.com\/blog\/#website","url":"https:\/\/www.couchbase.com\/blog\/","name":"The Couchbase Blog","description":"Couchbase, the NoSQL 
Database","publisher":{"@id":"https:\/\/www.couchbase.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.couchbase.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.couchbase.com\/blog\/#organization","name":"The Couchbase Blog","url":"https:\/\/www.couchbase.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png","contentUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/2023\/04\/admin-logo.png","width":218,"height":34,"caption":"The Couchbase Blog"},"image":{"@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/8fb575d74280ff3d0f044904277a8076","name":"Sai Kommaraju, Senior Software Engineer","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.couchbase.com\/blog\/#\/schema\/person\/image\/a2ca26c70968f44d876aa239d293a709","url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Sai-Kommaraju.jpeg","contentUrl":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Sai-Kommaraju.jpeg","caption":"Sai Kommaraju, Senior Software Engineer"},"url":"https:\/\/www.couchbase.com\/blog\/author\/saikommaraju\/"}]}},"acf":[],"authors":[{"term_id":10168,"user_id":85690,"is_guest":0,"slug":"saikommaraju","display_name":"Sai Kommaraju, Senior Software 
Engineer","avatar_url":{"url":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Sai-Kommaraju.jpeg","url2x":"https:\/\/www.couchbase.com\/blog\/wp-content\/uploads\/sites\/1\/2026\/01\/Sai-Kommaraju.jpeg"},"0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/posts\/17801","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/users\/85690"}],"replies":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/comments?post=17801"}],"version-history":[{"count":0,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/posts\/17801\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/media\/17809"}],"wp:attachment":[{"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/media?parent=17801"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/categories?post=17801"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/tags?post=17801"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.couchbase.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=17801"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}