Single Platform, Multi-Purpose Couchbase: Vector Search, Geospatial, SQL++, and More

Some use cases are best served by multiple types of data access, including SQL, vector search, geospatial queries, and key-value access. One approach is to chain together a separate data system for each access method. Couchbase instead makes it possible to combine these different types of queries on a single platform to solve real-world problems.

This article walks through aspects of the demo application “What is This Thing?” (aka “WITT”). For more context and background, check out:

This blog post is part of the 2024 C# Advent. However, you don’t need to understand C# to read this post: the concepts are applicable to any of the many SDKs available for Couchbase.

Vector Search: The Basics

Vector search is useful for applications that need to find similar items. For example, embeddings created by AI models can be indexed and searched. Each item in WITT is modeled like this:

{
  "name": "Reticulated Splines",
  "desc": "Specialized grooves used in advanced machinery for precise alignment.",
  "price": 19.99,
  "image": "data:image/png;base64,…",
  "rating": 5,
  "imageVector": [ -4.5390625, 0.32543945, … ]
}


Note: The image of the item is stored as a base64-encoded string. In a production project, I’d recommend using file storage, S3, etc., rather than storing it in the database.

imageVector is retrieved by uploading the image to an AI image model, like Azure Computer Vision.
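For readers working in C#, a document like this can be mapped onto a plain class. The following is a minimal sketch, not WITT's actual model: the class name Item, the use of System.Text.Json, and the shortened sample values are all assumptions made for illustration. If your Couchbase SDK is configured with a different serializer, the attributes would differ.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Shortened sample document: the image payload and vector are placeholders, not real data.
string json = @"{
  ""name"": ""Reticulated Splines"",
  ""desc"": ""Specialized grooves used in advanced machinery for precise alignment."",
  ""price"": 19.99,
  ""image"": ""data:image/png;base64,"",
  ""rating"": 5,
  ""imageVector"": [ -4.5390625, 0.32543945 ]
}";

Item item = JsonSerializer.Deserialize<Item>(json)
    ?? throw new InvalidOperationException("Document deserialized to null.");
Console.WriteLine($"{item.Name}: {item.ImageVector.Length} vector dimensions");

// Hypothetical model class; each property maps to one JSON field shown above.
public class Item
{
    [JsonPropertyName("name")] public string Name { get; set; } = "";
    [JsonPropertyName("desc")] public string Desc { get; set; } = "";
    [JsonPropertyName("price")] public decimal Price { get; set; }
    [JsonPropertyName("image")] public string Image { get; set; } = "";
    [JsonPropertyName("rating")] public int Rating { get; set; }
    [JsonPropertyName("imageVector")] public float[] ImageVector { get; set; } = Array.Empty<float>();
}
```

A real embedding has far more than two dimensions; the length has to match whatever the vector search index on imageVector was created with.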

Note: One of the features of the just-announced Capella AI services is model hosting, which will reduce the latency of this step, and also increase privacy and flexibility, and potentially reduce costs.

Image Embeddings and Nearest Neighbor Search

With a vector search index on the imageVector field, Couchbase can perform nearest neighbor searches. In this case, that search finds items that are visually similar (according to the AI model). So, if a user has an image and wants to find the item in Couchbase that is most similar to it, a vector search index can do that.
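To build intuition for what "nearest neighbor" means, here is a brute-force sketch that scores every item against a query vector and keeps the best matches. This is not how Couchbase's index works internally (an index exists precisely to avoid scanning everything), and the similarity metric used by a real index depends on how it was configured. Cosine similarity, the three-dimensional vectors, and the item names below are all assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity: close to 1.0 means the vectors point the same way.
// Assumes neither vector is all zeros.
double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Score every item against the query and keep the k most similar.
List<(string Name, double Score)> TopK(Dictionary<string, float[]> items, float[] query, int k) =>
    items.Select(kv => (Name: kv.Key, Score: CosineSimilarity(kv.Value, query)))
         .OrderByDescending(x => x.Score)
         .Take(k)
         .ToList();

// Hypothetical toy catalog: real image embeddings have many more dimensions.
var catalog = new Dictionary<string, float[]>
{
    ["pen"] = new[] { 0.9f, 0.1f, 0.0f },
    ["mug"] = new[] { 0.0f, 1.0f, 0.2f },
    ["stapler"] = new[] { 0.1f, 0.2f, 0.9f },
};
var queryVector = new[] { 0.8f, 0.2f, 0.1f };

// "pen" ranks first because its vector points in nearly the same direction as the query.
foreach (var (name, score) in TopK(catalog, queryVector, 2))
    Console.WriteLine($"{name}: {score:F3}");
```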

Here’s the code in WITT that, for a given image, requests a vector embedding from Azure Computer Vision:

// Free tier: 20 Calls per minute, 5K Calls per month
// Standard tier: 10 Calls per second, starting $1.00 USD/1000 calls (Estimated)
public async Task<float[]> GetImageEmbedding(string base64Image)
{
    var endpoint = _settings.Value.Endpoint;
    var subscriptionKey = _settings.Value.SubscriptionKey;

    using (HttpClient client = new HttpClient())
    {
        // Set the subscription key and endpoint
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);

        // Endpoint URL
        string url = $"{endpoint}/retrieval:vectorizeImage?overload=stream&api-version=2023-04-01-preview";

        byte[] imageBytes = Base64PngToByteArray(base64Image);

        using (ByteArrayContent content = new ByteArrayContent(imageBytes))
        {
            content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

            HttpResponseMessage response = await client.PostAsync(url, content);
            string jsonResponse = await response.Content.ReadAsStringAsync();

            if (response.IsSuccessStatusCode)
            {
                // Parse the JSON response to extract the vector embeddings
                JObject json = JObject.Parse(jsonResponse);
                JToken vectorEmbeddings = json["vector"];
                return vectorEmbeddings.ToObject<float[]>();
            }

            throw new Exception("Unable to retrieve vector embeddings for image.");
        }
    }
}


There are probably frameworks that can handle this call too, but for a simple demo that only requires a single REST call, I found this to be sufficient. If you want to use something other than Azure with this demo, you need to implement IEmbeddingService.
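As a rough idea of what an alternative implementation could look like, here is a sketch of a stand-in embedding service for local development. The shape of IEmbeddingService is an assumption inferred from the GetImageEmbedding method shown above; check the WITT source for the real interface. FakeEmbeddingService is hypothetical, and its hash-derived vectors carry no visual meaning, so it is only useful for wiring things up without calling a paid API.

```csharp
using System;
using System.Security.Cryptography;
using System.Threading.Tasks;

var service = new FakeEmbeddingService(dimensions: 8);

// "aGVsbG8=" is base64 for the bytes of "hello", standing in for real image data.
float[] first = await service.GetImageEmbedding("data:image/png;base64,aGVsbG8=");
float[] second = await service.GetImageEmbedding("data:image/png;base64,aGVsbG8=");
Console.WriteLine($"{first.Length} dimensions, deterministic: {first[0] == second[0]}");

// Assumed interface shape, based on the method signature in the article.
public interface IEmbeddingService
{
    Task<float[]> GetImageEmbedding(string base64Image);
}

// Hypothetical stand-in: derives a repeatable vector from a hash of the image bytes.
public sealed class FakeEmbeddingService : IEmbeddingService
{
    private readonly int _dimensions;

    public FakeEmbeddingService(int dimensions) => _dimensions = dimensions;

    public Task<float[]> GetImageEmbedding(string base64Image)
    {
        // Drop an optional "data:...;base64," prefix before decoding.
        int comma = base64Image.IndexOf(',');
        string payload = comma >= 0 ? base64Image[(comma + 1)..] : base64Image;

        byte[] hash = SHA256.HashData(Convert.FromBase64String(payload));

        // Map each hash byte (0..255) into the range -1..1.
        var vector = new float[_dimensions];
        for (int i = 0; i < _dimensions; i++)
            vector[i] = hash[i % hash.Length] / 255f * 2f - 1f;

        return Task.FromResult(vector);
    }
}
```

Whatever service you plug in, the vectors it returns need the same number of dimensions as the vector search index expects.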

Multi-Purpose Queries with SQL++

Many databases with vector search can perform a very similar operation. What Couchbase adds is the ability to perform multiple types of data operations on a single platform, against a single pool of data. For instance, given a geospatial location (which can be retrieved through a web browser), you can find a similar item by image and combine that with a geospatial search, all in a single SQL++ query:

WITH closestStores AS (
    /* CTE to get closest stores based on user's location */
    SELECT x.name, META(x).id AS id
    FROM whatisthis._default.Stores x
    WHERE SEARCH(x, {
      "fields": ["*"],
      "query" : {
        "location" : {
          "lat" : 39.8787,
          "lon" : -83.0805
        },
        "distance" : "15mi",
        "field" : "geo"
      } . . .
  })

  LIMIT 3
)
/* SELECT items with nearby stock */
SELECT allItems.name, allItems.`desc`, allItems.image, allItems.price, allItems.rating, SEARCH_SCORE(allItems) AS score,

    /* subquery to get stock from nearby locations */
    (SELECT . . . ) AS stock

FROM whatisthis._default.Items AS allItems

/* vector search using image embedding */
WHERE SEARCH(allItems,
  {
    "fields": ["*"],
    "query": { "match_none": {} },
    "knn": [
      {
        "k": 4,
        "field": "imageVector",
        "vector": [ -0.9135742,1.1552734, … ]
      }
    ]
  }
)
ORDER BY score DESC


Note: this query was edited for brevity’s sake. Check out DataLayer.cs for a more complete view of the queries.
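In application code, the "vector" array in the knn clause comes from the embedding returned for the uploaded image. The following is a sketch of building that search request object from a float array; the function name BuildKnnSearchRequest and the sample vector are made up for illustration, and this is not necessarily how DataLayer.cs assembles its query.

```csharp
using System;
using System.Text.Json;

// Builds the JSON object passed as the second argument to SEARCH() for a kNN lookup.
// Anonymous types serialize with their property names as written, in declaration order.
string BuildKnnSearchRequest(float[] vector, int k, string field = "imageVector")
{
    var request = new
    {
        fields = new[] { "*" },
        query = new { match_none = new { } },   // no text query, only the kNN clause
        knn = new[] { new { k, field, vector } }
    };
    return JsonSerializer.Serialize(request);
}

// Made-up three-dimensional vector; a real embedding is much longer.
float[] embedding = { 0.5f, -0.25f, 1.5f };
string searchJson = BuildKnnSearchRequest(embedding, k: 4);
Console.WriteLine(searchJson);
```

Generating the JSON with a serializer rather than string concatenation avoids formatting problems with float values, such as culture-specific decimal separators.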

The result of this query is a “most likely match” for a given image. For example, here is the top result when uploading a picture of a pen:

The quality of matches will depend on:

The quality of the AI model
The quality/quantity of the images for a given item

In my limited testing, I’ve found the Azure Computer Vision model to be very good for matching relevant images.

The result will also contain nearby stores where the item is available for purchase.
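For intuition about the 15-mile radius used in the closestStores CTE, here is a sketch of a great-circle distance check. Couchbase's geospatial index does this work for you with its own implementation; the Haversine formula, the function name, and the two store coordinates below are assumptions for illustration. Only the user location comes from the query above.

```csharp
using System;

// Haversine great-circle distance between two lat/lon points, in miles.
double HaversineMiles(double lat1, double lon1, double lat2, double lon2)
{
    const double EarthRadiusMiles = 3958.8; // mean Earth radius
    double ToRadians(double degrees) => degrees * Math.PI / 180.0;

    double dLat = ToRadians(lat2 - lat1);
    double dLon = ToRadians(lon2 - lon1);
    double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2)
             + Math.Cos(ToRadians(lat1)) * Math.Cos(ToRadians(lat2))
             * Math.Sin(dLon / 2) * Math.Sin(dLon / 2);

    return 2 * EarthRadiusMiles * Math.Asin(Math.Sqrt(a));
}

// User location taken from the query in the article.
const double userLat = 39.8787, userLon = -83.0805;

// Hypothetical store coordinates, one close by and one far away.
double nearStore = HaversineMiles(userLat, userLon, 39.9612, -82.9988);
double farStore = HaversineMiles(userLat, userLon, 39.1031, -84.5120);

Console.WriteLine($"Near store within 15mi: {nearStore <= 15}");
Console.WriteLine($"Far store within 15mi: {farStore <= 15}");
```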

Beyond Vector Search and Geospatial

This query showed Couchbase’s ability to combine vector search AND geospatial search into a single operation. It also contained a CTE, JOINs, and a subquery.

Within a single query, you can also perform:

Full Text Search, including scoring, facets, boosting, etc.
Time series operations
User-defined functions (UDFs) for adding custom code (JavaScript or SQL)
Full SQL capabilities: window functions, CTEs, JOINs, aggregation, and more
Read from real-time analytics data via write-back
Query data that’s automatically synced from mobile/edge devices
Automatic caching (built-in)

Here’s the marketing section: Some databases may only be able to perform a subset of these operations, and require you to bring in other tools when you need additional functionality. This increases your costs, latency, and complexity. With Couchbase, you can keep your application simpler, faster, and cheaper. Marketing section over.

Technical Highlights

The WITT demo application referenced is built with:

React UI frontend
ASP.NET Core backend
Azure Computer Vision
Couchbase .NET SDK

You can also check out What is This Thing? as a public demo. Keep in mind that it is all built with free-tier hosting (Azure and Capella Free Tier) and is still actively being developed. If you notice some slowness or downtime, that could be because of too much traffic, sorry!

Posted by Matthew Groves
