How to Index and Query Multiple Vectors per Document (Search Service)?

I’m trying to implement semantic search over multiple text fragments within a single document, and I’m unclear whether Couchbase’s vector search supports searching across a list of embedding vectors stored in one field.

"curious_fact_embeddings": [[0.123, 0.456, 0.789, …], [0.234, 0.567, 0.890, …], [0.345, 0.678, 0.901, …], [0.456, 0.789, 0.012, …]]

I want to search across all individual vectors within curious_fact_embeddings to find the document with the most semantically similar fact.

My search code:

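from couchbase.search import SearchRequest
from couchbase.vector_search import VectorQuery, VectorSearch

vector_query = VectorQuery.create(
    "curious_fact_embeddings",
    query_vector,  # 768-dimensional query vector
    num_candidates=20
)
vector_search = VectorSearch.from_vector_query(vector_query)
# Search runs at scope (or cluster) level against a named Search index;
# "my-vector-index" is a placeholder, not a name from this thread.
results = scope.search("my-vector-index", SearchRequest.create(vector_search))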

When I tested with a single embedded string instead of a list, the results were a lot better. Is the semantic search averaging over all embeddings in the list? The results also vary depending on the order of the strings.

Does Couchbase vector search support indexing and searching a field containing multiple vectors (list of lists)? If yes, how should I configure the Search index to handle this structure?
If no, what’s the recommended pattern?

Any guidance would be greatly appreciated!

Hi Ramiro,

Since v8.0, Couchbase has supported three types of vector indexes:

  1. Hyperscale Vector Index (powered by Query & GSI service)
  2. Composite Vector Index (powered by Query & GSI service)
  3. Search Vector Index (powered by Search service), which has been available since v7.6

You can read more about different kinds of indexes here

Now, coming to your question about the supported data models:
None of the three vector index types currently supports a list of lists, as in your example.

Workaround:
The suggested data model is one vector per field, with each embedding stored in its own document. All three index types above support this.

In your case, it would look like:

{
"doc_id": 1,
"curious_fact_embeddings": [0.123, 0.456, 0.789, …]
}

{
"doc_id": 2,
"curious_fact_embeddings": [0.111, 0.222, 0.333, …]
}
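
If your existing documents already hold a list of vectors, the split can be done in a small script before writing them back. The following is only a minimal sketch, not an official Couchbase utility: the function name `split_into_vector_docs`, the `parent_id` and `fact_index` fields, and the `::` key format are all assumptions made for illustration, and the two-element vectors are toy data.

```python
from typing import Any


def split_into_vector_docs(
    parent_id: str,
    doc: dict[str, Any],
    field: str = "curious_fact_embeddings",
) -> dict[str, dict[str, Any]]:
    """Turn one document holding a list of vectors into one document per vector.

    Returns a mapping of new document key -> new document body. The key format
    and the parent_id / fact_index fields are illustrative assumptions.
    """
    # Metadata shared by every fragment: everything except the vector list.
    shared = {k: v for k, v in doc.items() if k != field}
    out: dict[str, dict[str, Any]] = {}
    for i, vector in enumerate(doc.get(field, [])):
        out[f"{parent_id}::{i}"] = {
            **shared,
            "parent_id": parent_id,  # link back to the original document
            "fact_index": i,         # position of the fact in the original list
            field: list(vector),     # exactly one vector per document
        }
    return out


# Toy input: two 2-dimensional vectors instead of real 768-dimensional ones.
original = {
    "title": "example",
    "curious_fact_embeddings": [[0.1, 0.2], [0.3, 0.4]],
}
fragments = split_into_vector_docs("doc-1", original)
```

Each entry in `fragments` could then be written with a regular key-value upsert. Note that the shared metadata is copied into every fragment, which is the duplication cost of this model.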

We are also exploring other data models, such as a list of objects where the embedding is one of the fields inside each object. This might come in a future release.
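
In the meantime, with one vector per document, several hits can point back to the same original document, so you will probably want to collapse them on the client side. Here is a rough sketch, assuming each split document carries a hypothetical `parent_id` field and that the hits have already been reduced to plain `(parent_id, score)` pairs where a higher score means more similar. It is not tied to any SDK result type.

```python
def best_hit_per_parent(hits):
    """Keep only the highest-scoring hit for each parent document.

    `hits` is an iterable of (parent_id, score) pairs; both the pair shape
    and the "higher is better" score convention are assumptions.
    """
    best = {}
    for parent_id, score in hits:
        if parent_id not in best or score > best[parent_id]:
            best[parent_id] = score
    # Highest score first, one entry per parent document.
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


# Toy hits: two fragments of doc-1 and one fragment of doc-2.
hits = [("doc-1", 0.71), ("doc-2", 0.65), ("doc-1", 0.93)]
ranked = best_hit_per_parent(hits)
```

This ranks each original document by its single best-matching fact, which matches the goal of finding the document with the most similar fact.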

Hope that answers your question!


Great, that’s the approach I took, denormalizing the data. It would be nice to have a vector search for a list of embeddings, reducing the need to duplicate metadata when applying hybrid filters. Thanks for the response!

Thanks for the feedback, and glad you found the workaround!