I’m trying to implement semantic search over multiple text fragments within a single document, and I’m unclear whether Couchbase’s vector search supports searching across a list of embedding vectors stored in one field.
When I tested by embedding a single string instead, the results were a lot better. Is the semantic search averaging results over all embeddings in the list? The results also vary depending on the order of the strings.
Does Couchbase vector search support indexing and searching a field containing multiple vectors (list of lists)? If yes, how should I configure the Search index to handle this structure?
If no, what’s the recommended pattern?
Since v8.0, Couchbase supports three types of vector indexes:
Hyperscale Vector Index (powered by Query & GSI service)
Composite Vector Index (powered by Query & GSI service)
Search Vector Index (powered by Search service), which has been available since v7.6
You can read more about the different kinds of indexes in the Couchbase documentation.
Now, coming to your question on the supported data models:
None of the three vector indexes currently supports a list of lists, as shown in your example.
Workaround:
The suggested data model is one vector per field. This is supported by all three vector indexes above.
In your case, it will look like:
{
  "doc_id": 1,
  "curious_fact_embeddings": [0.123, 0.456, 0.789, …]
}
{
  "doc_id": 2,
  "curious_fact_embeddings": [0.111, 0.222, 0.333, …]
}
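If your current documents hold a list of vectors, the split into one vector per document can be sketched in plain Python as below. This is only an illustration of the data-model change, not Couchbase SDK code: the function name `split_by_embedding` and the fields `parent_doc_id` and `fragment_index` are hypothetical names chosen for this example, and how you assign keys to the new documents is up to you.

```python
from typing import Any


def split_by_embedding(
    doc: dict[str, Any],
    field: str = "curious_fact_embeddings",
) -> list[dict[str, Any]]:
    """Turn one document holding a list of vectors into one document per vector."""
    # Everything except the vector list and the original id is treated as
    # metadata and copied into every child document (this is the duplication
    # that the denormalized model requires).
    metadata = {k: v for k, v in doc.items() if k not in (field, "doc_id")}

    children = []
    for position, vector in enumerate(doc[field]):
        child = dict(metadata)
        # Hypothetical bookkeeping fields so hits can be traced back
        # to the original document and fragment.
        child["parent_doc_id"] = doc["doc_id"]
        child["fragment_index"] = position
        # Exactly one flat vector per document, in the same field name.
        child[field] = list(vector)
        children.append(child)
    return children
```

Each returned dictionary can then be stored as its own document, so the indexed field always contains a single flat vector. Keeping `parent_doc_id` on every child lets you group search hits back to the original document afterwards.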
We are also exploring other data models, such as a list of objects where the embedding is one of the fields inside each object. This might come in a future release.
Great, that’s the approach I took, denormalizing the data. It would be nice to have a vector search for a list of embeddings, reducing the need to duplicate metadata when applying hybrid filters. Thanks for the response!