
Talk to Your Data: A UDF That Speaks Your Language

The query above pulls a valuable insight out of the data stored in Couchbase: the top five users who generated the most completed orders within the past 30 days. But what if you're not an advanced SQL++ developer and need those answers by 11 p.m. for a report? You would have to wait for a developer to write the SQL++ query and get you the answers.

Alternatively, consider a case where you need to do some ad hoc debugging to address questions like:

  • Are there any documents where the date the order was delivered is missing?
  • Does that mean that the order was cancelled? Or did we misplace the order and it never got delivered? Or was everything fine, and we simply forgot to populate the order_delivered field?

In this case, you not only need to search the order_delivered field, but also look at order_cancelled or investigate comments to figure out whether the order was misplaced, and so on. So the query isn't simple or straightforward to write.
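Even the first of those questions already calls for a query like the following (hypothetical keyspace and field names), before you start cross-checking order_cancelled and the comments:

  -- Find orders with no recorded delivery date (illustrative schema)
  SELECT META(o).id, o.order_cancelled, o.comments
  FROM orders o
  WHERE o.order_delivered IS MISSING;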

In such cases, it would help to have a reliable assistant available 24×7 to get all these answers. The UDF described in this blog is such an assistant. It accepts your questions in plain, natural language and returns results as JSON. Behind the scenes, it connects to a model of your choice, using your API key, converts your request into SQL++, and then executes it. And all you need to do to invoke this assistant is call the UDF.

How It Works

1. Set up the library.
You first create a JavaScript library used by the UDF.

Library:
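
As a rough, hypothetical sketch (the names and structure are illustrative; both helpers are described in detail below), the library exposes a schema helper and the main entry point:

  // usingailib.js -- illustrative skeleton only
  function inferencer(keyspace) {
    // Runs INFER on the keyspace and returns its schema (sketched below).
  }

  function nl2sql(keyspaces, prompt, apikey, endpoint, model) {
    // Infers schemas, asks the LLM for SQL++, and executes the result
    // if it is a SELECT statement (sketched below).
  }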

2. Upload the library.
After copying the library code into a file (for example, usingailib.js), upload it by running a curl command.
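
On a self-managed cluster, for example, the library can be created through the Query service's functions REST endpoint; adjust the host, port, credentials, and library name for your environment:

  curl -X POST http://localhost:8093/evaluator/v1/libraries/usingailib \
    -u Administrator:password \
    -H 'content-type: application/json' \
    -d @usingailib.js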

3. Create the UDF.
Once the library is in place, create the UDF with a CREATE FUNCTION statement:
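
Assuming the library was uploaded as usingailib and its JavaScript entry point is named nl2sql, the statement looks something like this:

  CREATE OR REPLACE FUNCTION NL2SQL(keyspaces, prompt, apikey, endpoint, model)
    LANGUAGE JAVASCRIPT AS "nl2sql" AT "usingailib";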

NL2SQL() now acts as your multilingual translator between human language and Couchbase’s query engine. You simply give it some context and a natural language request, and it returns a response.

How the UDF Thinks

Under the hood, when you invoke the UDF, it uses your preferred model to understand your intent and generate a query that Couchbase can execute.

The advantage of using the chat completions API is that you can plug in a model from any other provider that is compliant with the same API spec. You can use your own private LLM or well-known ones from OpenAI, Gemini, Claude, etc.

The invoked UDF requires the following information from you:

  1. keyspaces – An array of strings, each representing a Couchbase keyspace (bucket.scope.collection). Use backticks where needed to escape special names (like `travel-sample`.inventory.route). This tells the UDF where to look for your data.
  2. prompt – Your request in plain English (or any other language).
    Example: “Show me all users who made a purchase in the last 24 hours.”
  3. apikey – Your API key used for authenticating with the model endpoint.
  4. model endpoint – The URL of the model provider's chat completions API, e.g., an OpenAI-compliant endpoint.
  5. model – The name of the model you want to use from the provider.
    e.g., “gpt-4o-2024-05-13”

There are several functions available in the library:

inferencer()

Before generating a query, the UDF first tries to understand your data. The inferencer() helper function calls Couchbase’s INFER statement to retrieve a collection’s schema:
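
A minimal sketch of what that helper could look like, assuming a recent Couchbase Server release where the N1QL() call is available inside JavaScript UDFs (the names are illustrative, not the exact library code):

  // Infer the schema of one keyspace so it can be handed to the LLM as context.
  function inferencer(keyspace) {
    var q = N1QL("INFER " + keyspace + ";");
    var schema = [];
    for (const row of q) {   // collect the inferred schema rows
      schema.push(row);
    }
    return schema;
  }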

This schema is used to help the AI understand what kind of data lives inside each collection.

The main function: nl2sql()

  • Collects the schemas for the given keyspaces using inferencer().
  • Constructs a prompt that includes the inferred schemas, your natural language request, and a Couchbase-specific prompt to nudge the LLM.
  • Sends it to the LLM.
  • Extracts the generated SQL++ from the model’s response.
  • Executes it directly if it’s a SELECT statement and returns both the generated SQL++ statement and the query results.

Non-SELECT statements are not executed because you don't want this UDF to insert, update, or delete documents in a collection without your verification. Instead, the generated SQL++ statement is returned so you can review it and run it yourself.
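
Putting it together, a hedged sketch of the main function might look like the following. It assumes N1QL() is available in JavaScript UDFs and reaches the model endpoint through SQL++'s CURL() function (which must be allow-listed on the Query service); the actual library may use a different mechanism.

  // Illustrative sketch only -- not the exact library code.
  function nl2sql(keyspaces, prompt, apikey, endpoint, model) {
    // 1. Collect the inferred schema for every keyspace.
    var schemas = {};
    for (const ks of keyspaces) {
      schemas[ks] = inferencer(ks);
    }

    // 2. Build a chat completions request that nudges the LLM toward SQL++.
    var body = {
      model: model,
      messages: [
        { role: "system",
          content: "You are a SQL++ (N1QL) expert. Use only these keyspaces and schemas: "
                   + JSON.stringify(schemas) },
        { role: "user", content: prompt }
      ]
    };

    // 3. Send the request to the model endpoint via SQL++'s CURL() function.
    var opts = {
      request: "POST",
      header: ["Content-Type: application/json", "Authorization: Bearer " + apikey],
      data: JSON.stringify(body)
    };
    var q = N1QL("SELECT CURL($1, $2) AS resp", [endpoint, opts]);
    var resp;
    for (const row of q) { resp = row.resp; }

    // 4. Extract the generated SQL++ from the model's response.
    var statement = resp.choices[0].message.content.trim();

    // 5. Only auto-execute SELECT statements; return anything else for review.
    if (/^\s*SELECT/i.test(statement)) {
      var results = [];
      for (const r of N1QL(statement)) {
        results.push(r);
      }
      return { statement: statement, results: results };
    }
    return { statement: statement, results: null };
  }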

Example use case:
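
For instance, a call against the travel-sample bucket could look like this (illustrative keyspace, prompt, and placeholder API key):

  EXECUTE FUNCTION NL2SQL(
    ["`travel-sample`.inventory.hotel"],
    "Show me the five highest-rated hotels in France",
    "<OPENAI_API_KEY>",
    "https://api.openai.com/v1/chat/completions",
    "gpt-4o-2024-05-13"
  );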

Experimenting with models from other providers

The next example uses Gemini's OpenAI-compatible API. You simply change the model provider's URL from the OpenAI API used above to Gemini's API, change the model parameter to a model Gemini recognizes, and update the API key from OpenAI's key to Gemini's key.
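
For example (illustrative values; check Gemini's documentation for the current endpoint and model names):

  EXECUTE FUNCTION NL2SQL(
    ["`travel-sample`.inventory.route"],
    "Which airlines operate the most routes out of SFO?",
    "<GEMINI_API_KEY>",
    "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions",
    "gemini-2.0-flash"
  );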

The following illustrates the result:

Conclusion

This blog provides a glimpse into how you can leverage AI to interact with your data in Couchbase. With this UDF, natural language querying becomes a reality, with no SQL++ expertise required. It is model-agnostic, and because it only auto-executes SELECT statements, it won't modify your data without your review.

And this is just the beginning. In the future, we hope to extend it to:

  • Image → SQL++
  • Voice → SQL++
  • Agent-like pipelines

… all running inside Couchbase workflows.

References
Capella IQ: https://docs.couchbase.com/cloud/get-started/capella-iq/get-started-with-iq.html
Chat completions APIs:
https://platform.openai.com/docs/api-reference/chat
https://ai.google.dev/gemini-api/docs/openai#rest


Author

Posted by Gaurav Jayaraj - Software Engineer

Gaurav Jayaraj is an intern on the Query team at Couchbase R&D. Gaurav is pursuing his Bachelor's in Computer Science at PES University, Bangalore.
