Couchbase Capella vector search does not return original vector field

Hi,

I tried to do a vector search with the Java SDK, following the color examples from the docs, but I am not able to get the original vector field in the returned rows. Is that expected? In this case, I cannot get colorvect_l2 or embedding_vector_dot in the returned rows.
For example, here is a row I got back:

Found row: SearchRow{index='color-vector-sample.color.rgb-vector_2314ec5b4a0f44a6_4c1c5584', id='000080', score=3.4028234663852886E38, explanation=, locations=Optional.empty, fragments={}, fields={"brightness":14.592,"description":"...","embedding_model":"text-embedding-ada-002-v2","id":"#000080","verbs":["deep","rich","sophisticated"]}}

During search index setup, non-vector fields have an option called ‘store’ (now called ‘Include in search results’), but vector fields do not have that option.
I tried to make those vector fields stored by manually uploading the index definition JSON with a ‘store’ property, but I still can't get them in the returned rows.

One additional question: I found some docs that seem to say Couchbase Server supports a REST API for JSON-based vector search queries. Does Capella not support that? The only way I have found is through the web UI console.

The API is there (it’s what the SDK uses). But you are on your own to determine what the query node and port are. It’s preferable to use the SDK.
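For reference, calling the Search REST API directly from Java might look roughly like the sketch below. It only builds the request, so it can be inspected before anything is sent. The host name is a hypothetical placeholder (you have to find the right node and port yourself), and the scoped-index path /api/bucket/{bucket}/scope/{scope}/index/{index}/query on TLS port 18094 is an assumption to check against the Search REST API reference for your version. The kNN body and the [0, 0, 128] query vector are illustrative only.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch only. ASSUMPTIONS: the host is a hypothetical placeholder, 18094 is the
// TLS Search port, and scoped indexes are queried at
// /api/bucket/{bucket}/scope/{scope}/index/{index}/query.
class SearchRestSketch {

    static final int SEARCH_TLS_PORT = 18094;

    // Builds (but does not send) a POST request against a scoped search index.
    static HttpRequest buildQueryRequest(String host, String bucket, String scope, String index,
                                         String user, String password, String jsonBody) {
        String path = "/api/bucket/" + bucket + "/scope/" + scope + "/index/" + index + "/query";
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder()
                .uri(URI.create("https://" + host + ":" + SEARCH_TLS_PORT + path))
                .header("Authorization", "Basic " + credentials)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    // Actually sends the request; needs network access and a trusted cluster certificate.
    static String send(HttpRequest request) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + " " + response.body();
    }

    public static void main(String[] args) {
        // Illustrative kNN request: pure vector search (match_none) plus all stored fields.
        String body = "{\"query\":{\"match_none\":{}},"
                + "\"knn\":[{\"field\":\"colorvect_l2\",\"vector\":[0,0,128],\"k\":3}],"
                + "\"fields\":[\"*\"]}";
        HttpRequest request = buildQueryRequest("search-node.example.invalid", // hypothetical host
                "color-vector-sample", "color", "rgb-vector", "username", "password", body);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Running main just prints the method and URI; send(request) would issue the call with your database credentials. Even over REST, colorvect_l2 would not come back in "fields", since vector fields cannot be stored.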

@xiang I’m not sure if the server supports returning the vector embedding itself. Pinging @abhinav in case he can help with that.

Just out of curiosity, what is your use-case for it?

@mreiche Thanks for the quick response.

@graham.pople Thanks for the response. My use case is that I want to get those original vectors and do a lookup join later.

Why not store the original vectors separately?

That is actually what I tried. With the same example, I added a new field for those vectors but set its type to number or text instead of vector, and I was able to get the result. But I still want to ask whether that is expected, as I did not find any relevant documentation. I also wonder whether it makes a difference in storage size.

So, ‘store’ is not supported on vector-type fields.

The way I see it, you have 2 options here …

Option 1: Use SQL++ (N1QL) by placing the vector search request inside the SEARCH function, and have N1QL fetch the original vector in the SELECT clause, which reads it from KV.
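A rough Java sketch of Option 1 that only assembles the SQL++ statement as a string. The keyspace `color-vector-sample`.color.rgb (in particular the collection name rgb), the index name rgb-vector and the 3-dimensional colorvect_l2 field are assumptions based on the color sample; adjust them to your setup. The resulting statement would then be run through the SDK's query API, for example cluster.query(statement).

```java
import java.util.StringJoiner;

// Sketch of Option 1. ASSUMPTIONS: keyspace `color-vector-sample`.color.rgb
// (collection name "rgb" is a guess), scoped search index "rgb-vector",
// and a 3-dimensional colorvect_l2 field.
class VectorSqlSketch {

    // Renders a float[] as a JSON array literal; JSON has no NaN/Infinity, so reject them.
    static String toJsonArray(float[] vector) {
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        for (float x : vector) {
            if (!Float.isFinite(x)) {
                throw new IllegalArgumentException("vector values must be finite");
            }
            joiner.add(Float.toString(x));
        }
        return joiner.toString();
    }

    // SEARCH() runs the kNN request on the search index; t.colorvect_l2 in the
    // SELECT list is read from the document itself (KV), not from the index.
    static String buildStatement(float[] queryVector, int k) {
        return "SELECT META(t).id, SEARCH_SCORE(t) AS score, t.colorvect_l2\n"
                + "FROM `color-vector-sample`.color.rgb AS t\n"
                + "WHERE SEARCH(t, {\"query\": {\"match_none\": {}}, \"knn\": [{\"field\": \"colorvect_l2\", "
                + "\"vector\": " + toJsonArray(queryVector) + ", \"k\": " + k + "}]}, "
                + "{\"index\": \"rgb-vector\"})\n"
                + "ORDER BY SEARCH_SCORE(t) DESC\n"
                + "LIMIT " + k;
    }

    public static void main(String[] args) {
        System.out.println(buildStatement(new float[] {0f, 0f, 128f}, 3));
    }
}
```

Inlining the vector keeps the sketch short; for long embeddings, passing it as a query parameter would likely be cleaner.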

Option 2: Given that these vector fields contain an array of numbers, you should be able to index the same field a second time, alongside the vector mapping, as a numeric field under a field alias, and then store that numeric field. Your index mapping for the field would look something like this …

    "embedding": {
      "enabled": true,
      "dynamic": false,
      "fields": [
        {
          "dims": 1536,
          "index": true,
          "name": "embedding",
          "similarity": "dot_product",
          "type": "vector",
          "vector_index_optimized_for": "recall"
        },
        {
          "index": true,
          "name": "embedding_as_numbers",
          "store": true,
          "type": "number"
        }
      ]
    }

Now, when you run your vector search request against the field embedding, you can request the field embedding_as_numbers, and you'll get the flattened version of the original vector as part of every hit in your search response.
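Here is a sketch of what the request and the client-side handling could look like for Option 2, again building the JSON by hand so it runs with only the JDK. It assumes the mapping above (field embedding, alias embedding_as_numbers, 1536 dims) and that your JSON library hands you the stored field as a List of Numbers. It also assumes the values come back in the original vector order, which is worth verifying against a known document before relying on it.

```java
import java.util.Collections;
import java.util.List;
import java.util.StringJoiner;

// Sketch of Option 2. ASSUMPTIONS: field names and dims come from the mapping above;
// a JSON library has already parsed the hit's "embedding_as_numbers" value into a
// List of Numbers; and the stored values keep the original vector order (verify this).
class StoredVectorSketch {

    static final int DIMS = 1536; // "dims" in the mapping above

    // Search request body: kNN on "embedding", asking for the stored numeric alias back.
    static String buildRequestBody(float[] queryVector, int k) {
        if (queryVector.length != DIMS) {
            throw new IllegalArgumentException("expected " + DIMS + " dims, got " + queryVector.length);
        }
        StringJoiner vector = new StringJoiner(",", "[", "]");
        for (float x : queryVector) {
            if (!Float.isFinite(x)) {
                throw new IllegalArgumentException("vector values must be finite");
            }
            vector.add(Float.toString(x));
        }
        return "{\"query\":{\"match_none\":{}},"
                + "\"knn\":[{\"field\":\"embedding\",\"vector\":" + vector + ",\"k\":" + k + "}],"
                + "\"fields\":[\"embedding_as_numbers\"],"
                + "\"size\":" + k + "}";
    }

    // Turns the stored numeric values from one hit back into a float[] vector.
    static float[] toVector(List<? extends Number> storedValues) {
        if (storedValues.size() != DIMS) {
            throw new IllegalStateException("expected " + DIMS + " values, got " + storedValues.size());
        }
        float[] vector = new float[DIMS];
        for (int i = 0; i < DIMS; i++) {
            vector[i] = storedValues.get(i).floatValue();
        }
        return vector;
    }

    public static void main(String[] args) {
        float[] query = new float[DIMS]; // placeholder: use a real 1536-dim embedding here
        query[0] = 1f;
        String body = buildRequestBody(query, 5);
        System.out.println(body.substring(body.indexOf("\"fields\"")));
        float[] restored = toVector(Collections.nCopies(DIMS, 0.5));
        System.out.println(restored.length + " " + restored[0]);
    }
}
```

The size check in toVector is a cheap guard: if the stored values ever arrive de-duplicated or truncated, it fails loudly instead of silently producing a wrong vector.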


> My use case is that I want to get those original vectors and do a lookup join later.

@xiang Is there potentially an option 3, where you do the lookup join on something else, perhaps the document ID (since I think the vector will always be 1-to-1 with that)?
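A minimal sketch of that ID-based join: keep only the IDs and scores from the search hits, then pair each one with the vector read from its document. SearchHit, JoinedRow and the vectorsById map are hypothetical stand-ins; in practice the map would be filled by KV gets on the hit IDs (with the Java SDK, collection.get(id) for each one).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Sketch of option 3: join search hits to their documents by document ID.
// ASSUMPTIONS: SearchHit, JoinedRow and the vectorsById map are hypothetical
// stand-ins; in a real application the map would be filled from KV gets on the hit IDs.
class JoinByIdSketch {

    record SearchHit(String id, double score) {}

    record JoinedRow(String id, double score, float[] vector) {}

    // Keeps the search ranking order and skips hits whose document was not found
    // (for example, deleted after it was indexed).
    static List<JoinedRow> joinById(List<SearchHit> hits, Map<String, float[]> vectorsById) {
        List<JoinedRow> rows = new ArrayList<>();
        for (SearchHit hit : hits) {
            float[] vector = vectorsById.get(hit.id());
            if (vector != null) {
                rows.add(new JoinedRow(hit.id(), hit.score(), vector));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical hits and vectors, for illustration only.
        List<SearchHit> hits = List.of(new SearchHit("000080", 10.0), new SearchHit("000099", 2.5));
        Map<String, float[]> vectorsById = Map.of("000080", new float[] {0f, 0f, 128f});
        for (JoinedRow row : joinById(hits, vectorsById)) {
            System.out.println(row.id() + " " + row.score() + " " + Arrays.toString(row.vector()));
        }
    }
}
```

Joining on the ID avoids storing the vector twice in the index; the trade-off is one KV read per hit.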

