Impact of Storing Multiple Embedding Model Vectors in a Single Document on Vector Search Performance

We are currently storing vectors from different embedding models in the same Couchbase document, and we’re building FTS vector indexes on each of the fields. Here’s an example of how we structure the data in our documents:

{
...
  "embedding_map": {
    "text-embedding-3-large": { "vector": [] },
    "voyage-3": { "vector": [] }
  }
...
}

Each vector belongs to a different model, and we perform searches using FTS on these fields. While this approach has been working, we’ve noticed that response times are sometimes extremely slow, and we’re not sure where the bottleneck is. It could be related to FTS vector search performance or potentially another part of our application.
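To help narrow this down on our side, here is a minimal timing sketch we've been considering. `run_search` is a hypothetical stand-in for whatever actually issues the FTS vector query; the point is just to collect latency percentiles so we can tell whether the slowness lives in the search call or elsewhere in the application:

```python
import statistics
import time

def time_calls(fn, n=20):
    """Time n calls to fn and return latency percentiles in milliseconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    latencies.sort()
    return {
        "p50": statistics.median(latencies),
        "p95": latencies[int(0.95 * (len(latencies) - 1))],
        "max": latencies[-1],
    }

# run_search is hypothetical; wrap the real FTS vector query here.
run_search = lambda: None
stats = time_calls(run_search)
# stats is e.g. {"p50": ..., "p95": ..., "max": ...} in milliseconds
```

Comparing these percentiles for the FTS call alone versus the full request path should show where the time goes.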

My question is: Could having vectors from different embedding models in the same document affect FTS vector search performance? Could this be due to how Couchbase handles multiple vector fields within the same document, or would it be better to separate the vectors into different documents?

Any insights on how this setup might impact performance or any best practices to follow when designing vector indexes for multiple models would be highly appreciated.

Thanks in advance!

Let me know if you need any additional details.

@abhinav (tagging you because I know you are doing search stuff)

@abhinav just following up in case this post got overlooked.

Thanks @PShri - it seems I did miss the notification from the original post.

Could having vectors from different embedding models in the same document affect FTS vector search performance?

It shouldn’t, but could you let me know what version of Couchbase Server you are using? I’d strongly recommend 7.6.2 or later for vector search.

Holding multiple vectors in different fields within the same document will mean separate vector indexes for each field within each index segment. One vector index should not affect the other as long as you have sufficient resources available to handle your use case.
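To illustrate: a kNN search names the specific vector field it targets, so it only consults that field's per-segment vector index. A request shaped like the following (shown here as a Python dict mirroring the 7.6 FTS search request format; the field path and dimensions are illustrative, based on your document structure) would touch only the text-embedding-3-large index, not the voyage-3 one:

```python
# Illustrative query vector; text-embedding-3-large produces 3072-dim vectors.
query_vector = [0.0] * 3072

request = {
    "query": {"match_none": {}},  # no keyword scoring, pure kNN search
    "knn": [
        {
            "field": "embedding_map.text-embedding-3-large.vector",
            "vector": query_vector,
            "k": 5,
        }
    ],
    "size": 5,
}
```

Each entry in the `knn` array targets exactly one indexed vector field, which is why the two models' indexes stay independent at query time.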

Could you share the logs (cbcollect_info) from one or more of the nodes hosting the Search Service in your cluster? Perhaps we can find some clues there on what’s happening underneath.

We are using 7.6.3.

The issue no longer occurs, and it is not reproducible. Responses got very slow a couple of times, and I suspected FTS vector search (as that was the new addition). Next time I come across it, I shall share the logs.

Sounds like there is no difference from a performance standpoint whether the vectors are in a single document or separated out.

Thanks for the information, though I am not sure what exactly an index “segment” is.

Sounds good.

I am not sure what exactly an index “segment” is

The search index follows a partitioned, segmented, LSM-like architecture. As each index partition ingests data, batched content is persisted into immutable, reference-counted files that we call segments; you can view these as mini indexes. A merger/compactor routine is responsible for eventually merging smaller segments into larger ones in a tiered fashion. Segments whose reference count falls to zero (stale data) are purged.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.