CodeLab: Building a RAG Application With Couchbase Capella Model Services and LangChain

In this tutorial, you will learn how to build a retrieval-augmented generation (RAG) application that uses Couchbase AI Services to store data, generate embeddings with embedding models, and perform LLM inference. We will create a RAG system that:

  1. Ingests news articles from the BBC News dataset.
  2. Generates vector embeddings using the NVIDIA NeMo Retriever model via Capella Model Services.
  3. Stores and indexes these vectors in Couchbase Capella.
  4. Performs semantic search to retrieve relevant context.
  5. Generates answers using the Mistral-7B LLM hosted on Capella.

You can find the notebook source code for this CodeLab here.

Why Couchbase AI Services?

Couchbase AI Services provide:

  • LLM inference and embeddings API: Access popular LLMs (e.g., Llama 3) and embedding models directly through Capella, without managing external API keys or infrastructure.
  • Unified platform: Leverage the database, vectorization, search, and model in one place.
  • Integrated vector search: Perform semantic search directly on your JSON data with millisecond latency.

Setting Up Couchbase AI Services

Create a Cluster in Capella

  1. Log into Couchbase Capella.
  2. Create a new cluster or use an existing one. Note that the cluster needs to run Couchbase Server 8.0 or later and include the Data, Query, Index, and Search services.
  3. Create a bucket.
  4. Create a scope and collection for the data.

Enable AI Services

  1. Navigate to Capella’s AI Services section on the UI.
  2. Deploy the embeddings and LLM models.
    • You need to launch an embedding model and an LLM for this demo in the same region as the Capella cluster where the data will be stored.
    • For this demo to work well, deploy an LLM with tool-calling capabilities, such as mistralai/mistral-7b-instruct-v0.3. For embeddings, you can choose a model like nvidia/llama-3.2-nv-embedqa-1b-v2.
  3. Write down the endpoint URL and generate API keys.

For more details on launching AI models, you can read the official documentation.

Prerequisites

Before we begin, ensure you have Python 3.10+ installed.

Step 1: Install Dependencies

We need the Couchbase SDK, LangChain integrations, and the datasets library.
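
A minimal install cell for a notebook environment might look like the following; these are the commonly used package names, and exact versions are up to you:

```python
# Install the Couchbase Python SDK, the LangChain integrations, and the Hugging Face datasets library.
%pip install --quiet couchbase langchain langchain-couchbase langchain-openai datasets
```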

Step 2: Configuration & Connection

We’ll start by connecting to our Couchbase cluster. We also need to configure the endpoints for Capella Model Services.

Note: Capella Model Services are compatible with the OpenAI API format, so we can use the standard langchain-openai library by pointing it to our Capella endpoint.
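
Here is a sketch of the connection setup, assuming the cluster credentials and the Capella Model Services endpoint and API key are provided via environment variables (the variable names are illustrative):

```python
import os
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

# Capella cluster credentials (illustrative environment variable names).
CB_CONN_STR = os.environ["CB_CONN_STR"]  # e.g. "couchbases://cb.xxxxxx.cloud.couchbase.com"
CB_USERNAME = os.environ["CB_USERNAME"]
CB_PASSWORD = os.environ["CB_PASSWORD"]

# Capella Model Services endpoint and API key (OpenAI-compatible API).
CAPELLA_AI_ENDPOINT = os.environ["CAPELLA_AI_ENDPOINT"]  # e.g. "https://<model-service-host>/v1"
CAPELLA_AI_API_KEY = os.environ["CAPELLA_AI_API_KEY"]

# Connect to the Capella cluster and wait until it is ready to serve requests.
auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
cluster = Cluster(CB_CONN_STR, ClusterOptions(auth))
cluster.wait_until_ready(timedelta(seconds=10))
```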

Step 3: Set Up the Database Structure

We need to ensure our bucket, scope, and collection exist to store the news data.
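
One possible way to do this with the Python SDK's collection manager, assuming the bucket (here called rag-demo) was already created in the Capella UI and the scope and collection names are our own choices:

```python
from couchbase.exceptions import (
    CollectionAlreadyExistsException,
    ScopeAlreadyExistsException,
)

BUCKET_NAME = "rag-demo"       # assumed names -- match what you created in Capella
SCOPE_NAME = "bbc"
COLLECTION_NAME = "articles"

bucket = cluster.bucket(BUCKET_NAME)
collection_mgr = bucket.collections()

# Create the scope and collection if they do not already exist.
try:
    collection_mgr.create_scope(SCOPE_NAME)
except ScopeAlreadyExistsException:
    pass

try:
    collection_mgr.create_collection(SCOPE_NAME, COLLECTION_NAME)
except CollectionAlreadyExistsException:
    pass
```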

Step 4: Loading Couchbase Vector Search Index

Semantic search requires an efficient way to retrieve relevant documents based on a user’s query. This is where Couchbase Vector Search, part of the Search (formerly Full-Text Search, FTS) service, comes into play. In this step, we load the Vector Search index definition from a JSON file, which specifies how the index should be structured: the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity.
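
A sketch of loading the definition and creating the index with the scope-level search index manager; the file name bbc_index.json and the index name are assumptions and must match your own index definition:

```python
import json

from couchbase.management.search import SearchIndex

INDEX_NAME = "bbc_vector_index"  # assumed name -- must match the "name" field in the JSON definition

# Load the Vector Search index definition (exported from the Capella UI or written by hand).
with open("bbc_index.json") as f:
    index_definition = json.load(f)

# Create or update the index at the scope level.
scope_index_mgr = cluster.bucket(BUCKET_NAME).scope(SCOPE_NAME).search_indexes()
scope_index_mgr.upsert_index(SearchIndex.from_json(json.dumps(index_definition)))
```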

Step 5: Initialize AI Models

Here is the magic: we initialize the embedding model using OpenAIEmbeddings but point it to Capella. Couchbase AI Services expose OpenAI-compatible endpoints, so the standard LangChain OpenAI package works out of the box and plugs directly into the LangChain Couchbase integration.
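
A minimal sketch, assuming the NVIDIA embedding model deployed earlier and the endpoint and API key configured above:

```python
from langchain_openai import OpenAIEmbeddings

# Point the OpenAI-compatible embeddings client at Capella Model Services.
embeddings = OpenAIEmbeddings(
    model="nvidia/llama-3.2-nv-embedqa-1b-v2",
    api_key=CAPELLA_AI_API_KEY,
    base_url=CAPELLA_AI_ENDPOINT,
    check_embedding_ctx_length=False,  # skip OpenAI-specific token counting for non-OpenAI models
)
```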

Step 6: Ingest Data

We load the BBC News dataset and ingest it into Couchbase. The CouchbaseSearchVectorStore automatically handles generating embeddings using our defined model and storing them.
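
One way this step might look; the Hugging Face dataset id (RealTimeData/bbc_news_alltime), its field names, and the number of articles ingested are assumptions:

```python
from datasets import load_dataset
from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore

# Load a slice of the BBC News dataset from Hugging Face (dataset id and fields are assumptions).
news = load_dataset("RealTimeData/bbc_news_alltime", "2024-12", split="train").select(range(500))

# The vector store calls the embedding model for each document and stores text,
# metadata, and vectors in the Couchbase collection.
vector_store = CouchbaseSearchVectorStore(
    cluster=cluster,
    bucket_name=BUCKET_NAME,
    scope_name=SCOPE_NAME,
    collection_name=COLLECTION_NAME,
    embedding=embeddings,
    index_name=INDEX_NAME,
)

texts = [article["content"] for article in news]
metadatas = [{"title": article["title"]} for article in news]
vector_store.add_texts(texts=texts, metadatas=metadatas)
```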

Step 7: Build the RAG Chain

Now we create the RAG pipeline. We initialize the LLM (again pointing to Capella) and connect it to our vector store retriever.
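
A sketch of the chain using LangChain Expression Language; the prompt wording and the retriever setting (k=4) are illustrative choices:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

# The LLM is served through the same OpenAI-compatible Capella endpoint.
llm = ChatOpenAI(
    model="mistralai/mistral-7b-instruct-v0.3",
    api_key=CAPELLA_AI_API_KEY,
    base_url=CAPELLA_AI_ENDPOINT,
    temperature=0,
)

prompt = ChatPromptTemplate.from_template(
    """You are a helpful assistant. Answer the question using only the context below.

Context:
{context}

Question: {question}"""
)

# Retrieve the 4 most similar articles from the vector store.
retriever = vector_store.as_retriever(search_kwargs={"k": 4})

def format_docs(docs):
    # Concatenate the retrieved articles into a single context string.
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
```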

Step 8: Run Queries

Let’s test our RAG.
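
For example, asking a question about the topic shown in the sample output below (the question string is illustrative):

```python
question = "What has Pep Guardiola said about Manchester City's recent form?"
answer = rag_chain.invoke(question)
print(f"Answer: {answer}")
```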

Example Output:

Answer: Pep Guardiola has expressed concern and frustration about Manchester City’s recent form. He stated, “I am not good enough. I am the boss… I have to find solutions.” He acknowledged the team’s defensive issues and lack of confidence.

Conclusion

In this tutorial, you learned how to:

  1. Vectorize data using Couchbase.
  2. Use Couchbase AI Services for embeddings and LLM.
  3. Implement RAG with Couchbase Vector Search.

Couchbase’s unified database platform lets you build powerful AI applications that generate high-quality, contextually aware content.

Author

Posted by Laurent Doguin

Laurent is a nerdy metal head who lives in Paris. He mostly writes code in Java and structured text in AsciiDoc, and often talks about data, reactive programming and other buzzwordy stuff. He is also a former Developer Advocate for Clever Cloud and Nuxeo where he devoted his time and expertise to helping those communities grow bigger and stronger. He now runs Developer Relations at Couchbase.
