SELECT u.name, COUNT(o.id) AS total_orders
FROM `commerce`.sales.users AS u
JOIN `commerce`.sales.orders AS o ON u.id = o.user_id
WHERE o.status = "completed"
  AND DATE_DIFF_STR(NOW_STR(), o.order_date, "day") <= 30
GROUP BY u.name
ORDER BY total_orders DESC
LIMIT 5;
The query above pulls a valuable insight from the data stored in Couchbase: your top five users by number of completed orders in the past 30 days. But what if you're not an advanced SQL++ developer and need the answer by 11 p.m. for a report? You would then have to wait for a developer to write the SQL++ query and get you the answer.
Alternatively, consider a case where you need to do some ad hoc debugging to address questions like:
- Are there any documents where the date the order was delivered is missing?
- Does that mean that the order was cancelled? Or did we misplace the order and it never got delivered? Or was everything fine, and we simply forgot to populate the order_delivered field?
In this case, you not only need to check the order_delivered field, but also look at order_cancelled, or dig through comments to figure out whether the order was misplaced, and so on. So the query to write isn't simple or straightforward.
SELECT o.orderId, o.order_date, o.order_cancelled, o.order_delivered, o.comments,
  CASE
    WHEN o.order_cancelled = TRUE THEN "Order was cancelled"
    WHEN ANY c IN o.comments SATISFIES LOWER(c) LIKE "%misplac%" OR LOWER(c) LIKE "%lost%" END
      THEN "Order may have been misplaced"
    WHEN ANY c IN o.comments SATISFIES LOWER(c) LIKE "%deliver%" END
      THEN "Delivered but field not updated"
    ELSE "Reason unknown - investigate"
  END AS reason
FROM `commerce`.`sales`.`orders` AS o
WHERE o.order_delivered IS MISSING OR o.order_delivered IS NULL;
In such cases, it would help to have a reliable assistant available 24×7 to get these answers. The UDF described in this blog is such an assistant. It accepts your questions in natural language and returns results in JSON. Behind the scenes, it connects to a model of your choice, using your API key, converts your request into SQL++, and then executes it. And all you need to do to use this assistant is invoke the UDF.
SELECT NL2SQL(
  ["`commerce`.`sales`.`orders`"],
  "Are there any documents where the order_delivered date is missing? And if so, why?",
  "",
  "https://api.openai.com/v1/chat/completions",
  "gpt-4o-2024-05-13"
);
How it works
1. Set up the library.
You first create a JavaScript library used by the UDF.
Library:
/*
 input:
   keyspaces: an array of strings, each string represents a keyspace
              "bucket.scope.collection" with proper escaping using the
              grave-accent (backtick) quote wherever required
   prompt:    the user's natural language request
   apikey:    your OpenAI (or compatible provider) API key
   modelapi:  the chat completions endpoint URL
   model:     string representing the model's name, see
              https://platform.openai.com/docs/api-reference/completions/create#completions-create-model
              for more details
 output: chat-completions API response with the generated SQL statement
*/

// Retrieves a collection's schema using Couchbase's INFER statement
function inferencer(k) {
    var infq = N1QL("SELECT t.properties FROM (INFER " + k + ") AS t");
    var res = [];
    for (const doc of infq) {
        res.push(doc);
    }
    return res[0];
}

function nl2sql(keyspaces, prompt, apikey, modelapi, model) {
    // Collect the inferred schema for every keyspace
    collectionSchema = {};
    for (const k in keyspaces) {
        c = inferencer(keyspaces[k]);
        collectionSchema[keyspaces[k]] = c;
    }
    collectionSchemaStr = JSON.stringify(collectionSchema);

    // Build the user prompt from the schemas and the natural language request
    promptContent = `Information:\nCollection's schema: ${collectionSchemaStr}\n\nPrompt: \"${prompt}\"\n\nThe query context is set.\n\nBased on the above Information, write valid SQL++ and return only the statement and no explanation. For retrieval, use aliases. Use UNNEST from the FROM clause when appropriate. \n\nIf you're sure the Prompt can't be used to generate a query, first say \"#ERR:\" and then explain why not.`;

    data = {
        "messages": [
            {"role": "system", "content": "You are a Couchbase Capella expert. Your task is to create valid queries to retrieve or create data based on the provided Information.\n\nApproach this task step-by-step and take your time."},
            {"role": "user", "content": promptContent}
        ],
        "model": model,
        "temperature": 0,
        "max_tokens": 1024,
        "stream": false
    };
    var dataStr = JSON.stringify(data)
        .replace(/\\/g, "\\\\")  // escape backslashes
        .replace(/"/g, '\\"');   // escape quotes

    // Call the chat completions endpoint via CURL() from a SQL++ query
    var completionsurl = modelapi;
    var q = `SELECT CURL("${completionsurl}", {
        "request": "POST",
        "header": ["Authorization: Bearer ${apikey}", "Content-Type: application/json"],
        "data": "${dataStr}"
    }) AS result;`;
    var completionsq = N1QL(q);
    var res = [];
    for (const doc of completionsq) {
        res.push(doc);
    }

    // Extract the generated statement from the model's response
    try {
        content = res[0].result.choices[0].message.content;
    } catch (e) {
        return res;
    }
    // Strip the markdown code fence around the statement
    stmt = content.trim().substring(7, content.length - 4);

    // Only SELECT statements are executed automatically
    isSelect = (stmt.substring(0, 6).toLowerCase()) === "select";
    if (isSelect === false) {
        return {"generated_statement": stmt};
    }
    var runq = N1QL(stmt);
    var rrun = [];
    for (const doc of runq) {
        rrun.push(doc);
    }
    return {"generated_statement": stmt, "results": rrun};
}
2. Upload the library.
Copy the library code above into a file named usingailib.js, then run the following curl command:
curl -X POST http://localhost:9499/evaluator/v1/libraries/usingailib --data-binary @usingailib.js -u Administrator:password
3. Create the UDF.
Once the library is uploaded, create the UDF with the CREATE OR REPLACE FUNCTION statement below:
CREATE OR REPLACE FUNCTION NL2SQL(keyspaces, prompt, apikey, modelapi, model)
LANGUAGE JAVASCRIPT AS "nl2sql" AT "usingailib";
NL2SQL() now acts as your multilingual translator between human language and Couchbase’s query engine. You simply give it some context and a natural language request, and it returns a response.
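Besides wrapping it in a SELECT as the examples in this blog do, you can also invoke the UDF directly with SQL++'s EXECUTE FUNCTION statement. A minimal sketch, with a placeholder keyspace and prompt and a redacted API key:

EXECUTE FUNCTION NL2SQL(
  ["`commerce`.`sales`.`orders`"],
  "How many orders were completed in the last 30 days?",
  "",
  "https://api.openai.com/v1/chat/completions",
  "gpt-4o-2024-05-13"
);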
How the UDF Thinks
Under the hood, the UDF sends your request to the model you specify, which interprets your intent and generates a query that Couchbase can execute.
Because the UDF talks to the chat completions API, you can plug in any provider that is compliant with the same API spec: your own private LLM, or well-known models from OpenAI, Gemini, Claude, etc.
The UDF requires the following arguments (an annotated template invocation follows this list):
- keyspaces – An array of strings, each representing a Couchbase keyspace (bucket.scope.collection). Use grave-accent (backtick) quotes where needed to escape special names (like `travel-sample`.inventory.route). This tells the UDF where to look for your data.
- prompt – Your request in plain English (or any other language). Example: "Show me all users who made a purchase in the last 24 hours."
- apikey – Your API key, used for authenticating with the model endpoint.
- modelapi – The model endpoint, e.g., an OpenAI-compatible chat completions URL.
- model – The name of the model you want to use from the provider, e.g., "gpt-4o-2024-05-13".
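For reference, here is an annotated invocation template. The keyspace, prompt, and key below are placeholders for illustration only; substitute your own values.

SELECT NL2SQL(
  ["`bucket`.`scope`.`collection`"],              -- keyspaces: where the data lives
  "Your question in natural language",            -- prompt
  "YOUR_API_KEY",                                 -- apikey for the model endpoint
  "https://api.openai.com/v1/chat/completions",   -- modelapi: chat completions URL
  "gpt-4o-2024-05-13"                             -- model name at the provider
);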
The library contains the following functions:
inferencer()
Before generating a query, the UDF first tries to understand your data. The inferencer() helper function calls Couchbase’s INFER statement to retrieve a collection’s schema:
function inferencer(k) {
    var infq = N1QL("SELECT t.properties FROM (INFER " + k + ") AS t");
    var res = [];
    for (const doc of infq) {
        res.push(doc);
    }
    return res[0];
}
This schema is used to help the AI understand what kind of data lives inside each collection.
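If you want to see exactly what the UDF feeds to the model, you can run the same INFER query that inferencer() builds, for example against the travel-sample collection used later in this blog:

SELECT t.properties
FROM (INFER `travel-sample`.inventory.hotel) AS t;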
The main function: nl2sql()
- Collects the schemas for the given keyspaces using inferencer().
- Constructs a prompt that includes the inferred schemas, your natural language request, and a Couchbase-specific system prompt to nudge the LLM.
- Sends it to the LLM.
- Extracts the generated SQL++ from the model’s response.
- Executes it directly if it’s a SELECT statement and returns both the generated SQL++ statement and the query results.
Non-SELECT statements are not executed because you don't want this UDF to insert, update, or delete documents in a collection without your verification. Instead, the UDF returns only the generated SQL++ statement so you can review it and run it yourself, as sketched below.
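For illustration, here is a hypothetical data-modifying request. The statement shown in the comment is made up, but the output shape (a generated_statement with no results) follows from the library code above:

SELECT NL2SQL(
  ["`commerce`.`sales`.`orders`"],
  "Delete all orders that were cancelled more than a year ago",
  "",
  "https://api.openai.com/v1/chat/completions",
  "gpt-4o-2024-05-13"
);

-- Possible response (hypothetical): only the statement, ready for review
-- [{ "$1": { "generated_statement": "DELETE FROM `commerce`.`sales`.`orders` WHERE ..." } }]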
Example use case:
SELECT default:NL2SQL(
  ["`travel-sample`.inventory.hotel"],
  "Give me hotels in San Francisco that have free parking and free breakfast and a rating of more than 3",
  "",
  "https://api.openai.com/v1/chat/completions",
  "gpt-4o-2024-05-13"
);
Result:
[{
  "$1": {
    "generated_statement": "SELECT h.name, h.address, h.city, h.state, h.country, h.free_parking, h.free_breakfast, r.ratings.Overall\nFROM `travel-sample`.inventory.hotel AS h\nUNNEST h.reviews AS r\nWHERE h.city = \"San Francisco\"\n AND h.free_parking = true\n AND h.free_breakfast = true\n AND r.ratings.Overall > 3;",
    "results": [{
      "Overall": 4,
      "address": "520 Church St",
      "city": "San Francisco",
      "country": "United States",
      "free_breakfast": true,
      "free_parking": true,
      "name": "Parker House",
      "state": "California"
    }, {
      "Overall": 4,
      "address": "520 Church St",
      "city": "San Francisco",
      "country": "United States",
      "free_breakfast": true,
      "free_parking": true,
      "name": "Parker House",
      "state": "California"
    }, {
      "Overall": 5,
      "address": "520 Church St",
      "city": "San Francisco",
      "country": "United States",
      "free_breakfast": true,
      "free_parking": true,
      "name": "Parker House",
      "state": "California"
    }, {
      "Overall": 4,
      "address": "520 Church St",
      "city": "San Francisco",
      "country": "United States",
      "free_breakfast": true,
      "free_parking": true,
      "name": "Parker House",
      "state": "California"
    }, {
      "Overall": 5,
      "address": "465 Grant Ave",
      "city": "San Francisco",
      "country": "United States",
      "free_breakfast": true,
      "free_parking": true,
      "name": "Grant Plaza Hotel",
      "state": "California"
    }, {
      "Overall": 5,
      "address": "465 Grant Ave",
      "city": "San Francisco",
      "country": "United States",
      "free_breakfast": true,
      "free_parking": true,
      "name": "Grant Plaza Hotel",
      "state": "California"
    },
    ...
Experimenting with models from other providers
The next example uses Gemini's OpenAI-compatible API. You simply change the model endpoint URL from the OpenAI API used earlier to Gemini's API, change the model parameter to a model Gemini recognizes, and, of course, swap the OpenAI API key for a Gemini key.
SELECT NL2SQL(
  ["`travel-sample`.inventory.hotel"],
  "Show me hotels in France",
  "",
  "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions",
  "gemini-2.0-flash"
) AS p;
The following illustrates the result:
[{ "p": { "generated_statement": "SELECT h.name AS hotel_name, h.city AS hotel_city\nFROM `travel-sample`.inventory.hotel AS h\nWHERE h.country = \"France\";", "resultados": [{ "hotel_city": "Giverny", "hotel_name": "Os Robins" }, { "hotel_city": "Giverny", "hotel_name": "Le Clos Fleuri" }, ... { "hotel_city": "Ferney-Voltaire", "hotel_name": "Hotel Formule 1" } ] } }] |
Conclusion
This blog provides a glimpse into how you can leverage AI to interact with your data in Couchbase. With this UDF, natural language querying becomes a reality – no SQL++ expertise required. It is model-agnostic, and because only SELECT statements are executed automatically, it won't modify your data without your review.
And this is just the beginning. In the future, we hope to extend it to:
- Image → SQL++
- Voice → SQL++
- Agent-like pipelines
… all running inside Couchbase workflows.
References
Capella IQ: https://docs.couchbase.com/cloud/get-started/capella-iq/get-started-with-iq.html
Chat completions APIs:
https://platform.openai.com/docs/api-reference/chat
https://ai.google.dev/gemini-api/docs/openai#rest