In this tutorial, you will learn how to build a retrieval-augmented generation (RAG) application using Couchbase AI Services to store data, generate embeddings with an embedding model, and run LLM inference. We will create a RAG system that:
- Ingests news articles from the BBC News dataset.
- Generates vector embeddings using the NVIDIA NeMo Retriever model via Capella Model Services.
- Stores and indexes these vectors in Couchbase Capella.
- Performs semantic search to retrieve relevant context.
- Generates answers using the Mistral-7B LLM hosted on Capella.
You can find the notebook source code for this CodeLab here.
Why Couchbase AI Services?
Couchbase AI Services provide:
- LLM inference and embeddings API: Access popular LLMs (e.g., Llama 3) and embedding models directly through Capella, without managing external API keys or infrastructure.
- Unified platform: Keep the database, vectorization, search, and models in one place.
- Integrated vector search: Perform semantic search directly on your JSON data with millisecond latency.
Setting Up Couchbase AI Services
Create a Cluster in Capella
- Log into Couchbase Capella.
- Create a new cluster or use an existing one. Note that the cluster needs to run the latest version of Couchbase Server 8.0 and include the Data, Query, Index, and Search services.
- Create a bucket.
- Create a scope and collection for the data.
Enable AI Services
- Navigate to Capella’s AI Services section on the UI.
- Deploy the embeddings and LLM models.
- You need to launch an embedding model and an LLM for this demo, in the same region as the Capella cluster where the data will be stored.
- For this demo to work well, deploy an LLM with tool-calling capabilities, such as mistralai/mistral-7b-instruct-v0.3. For embeddings, you can choose a model like nvidia/llama-3.2-nv-embedqa-1b-v2.
- Write down the endpoint URL and generate API keys.
For more details on launching AI models, you can read the official documentation.
Prerequisites
Before we begin, ensure you have Python 3.10+ installed.
Step 1: Install Dependencies
We need the Couchbase SDK, LangChain integrations, and the datasets library.
```python
%pip install --quiet datasets==4.4.1 langchain-couchbase==1.0.0 langchain-openai==1.1.0
```
Step 2: Configuration & Connection
We’ll start by connecting to our Couchbase cluster. We also need to configure the endpoints for Capella Model Services.
Note: Capella Model Services are compatible with the OpenAI API format, so we can use the standard langchain-openai library by pointing it to our Capella endpoint.
```python
import getpass
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions
from datetime import timedelta

# Configuration
CB_CONNECTION_STRING = getpass.getpass("Couchbase Connection String: ")
CB_USERNAME = input("Database Username: ")
CB_PASSWORD = getpass.getpass("Database Password: ")
CB_BUCKET_NAME = input("Bucket Name: ")
SCOPE_NAME = "rag"
COLLECTION_NAME = "data"
INDEX_NAME = "vs-index"

# Model Services Config
CAPELLA_MODEL_SERVICES_ENDPOINT = getpass.getpass("Capella Model Services Endpoint: ")
LLM_MODEL_NAME = "mistralai/mistral-7b-instruct-v0.3"
LLM_API_KEY = getpass.getpass("LLM API Key: ")
EMBEDDING_MODEL_NAME = "nvidia/llama-3.2-nv-embedqa-1b-v2"
EMBEDDING_API_KEY = getpass.getpass("Embedding API Key: ")

# Connect to Cluster
auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
cluster = Cluster(CB_CONNECTION_STRING, ClusterOptions(auth))
cluster.wait_until_ready(timedelta(seconds=5))
print("Successfully connected to Couchbase")
```
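Because Capella Model Services speak the OpenAI API, you can optionally sanity-check the LLM endpoint directly with the openai Python client (installed as a dependency of langchain-openai) before wiring up LangChain. This is just a sketch that reuses the variables defined above:
```python
# Optional sanity check (sketch): call the Capella LLM endpoint directly
# through the OpenAI-compatible API before building the LangChain pipeline.
from openai import OpenAI

client = OpenAI(
    base_url=CAPELLA_MODEL_SERVICES_ENDPOINT,  # Capella Model Services endpoint
    api_key=LLM_API_KEY,
)

completion = client.chat.completions.create(
    model=LLM_MODEL_NAME,
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(completion.choices[0].message.content)
```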
Step 3: Set Up the Database Structure
We need to ensure our bucket, scope, and collection exist to store the news data.
```python
def setup_collection(cluster, bucket_name, scope_name, collection_name):
    bucket = cluster.bucket(bucket_name)
    manager = bucket.collections()

    # Create the scope if it does not exist
    if scope_name not in [s.name for s in manager.get_all_scopes()]:
        manager.create_scope(scope_name)

    # Create the collection if it does not exist
    # (create_collection(scope_name, collection_name) requires a recent Couchbase Python SDK)
    scopes = manager.get_all_scopes()
    existing_collections = [
        c.name for s in scopes if s.name == scope_name for c in s.collections
    ]
    if collection_name not in existing_collections:
        manager.create_collection(scope_name, collection_name)

    # Create a primary index so the collection can be queried with SQL++
    cluster.query(
        f"CREATE PRIMARY INDEX IF NOT EXISTS ON "
        f"`{bucket_name}`.`{scope_name}`.`{collection_name}`"
    ).execute()


setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)
```
Step 4: Load the Couchbase Vector Search Index
Semantic search requires an efficient way to retrieve relevant documents based on a user’s query. This is where Couchbase Vector Search, part of the Search Service (formerly known as Full-Text Search, or FTS), comes into play. In this step, we load the Vector Search Index definition from a JSON file, which specifies how the index should be structured: the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity.
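If you don’t already have a capella_index.json, the sketch below shows the general shape of such a definition, written as a Python dict and dumped to disk. Treat it as a hypothetical starting point, not an exact definition: the field names follow the langchain-couchbase defaults (embedding for the vector, text for the raw content), and the dims value must match the output dimension of your embedding model (2048 is the default for nvidia/llama-3.2-nv-embedqa-1b-v2, but verify against the model card and the Couchbase vector index documentation).
```python
import json

# Hypothetical sketch of a Couchbase Search (FTS) vector index definition.
# Consult the Couchbase vector search docs for the full set of options.
index_sketch = {
    "name": "vs-index",
    "type": "fulltext-index",
    "sourceType": "gocbcore",
    "sourceName": "your-bucket-name",
    "params": {
        "doc_config": {"mode": "scope.collection.type_field", "type_field": "type"},
        "mapping": {
            "default_mapping": {"enabled": False, "dynamic": True},
            "types": {
                "rag.data": {
                    "enabled": True,
                    "dynamic": True,
                    "properties": {
                        "embedding": {
                            "enabled": True,
                            "fields": [{
                                "name": "embedding",
                                "type": "vector",
                                "dims": 2048,              # must match the embedding model
                                "similarity": "dot_product",
                                "index": True,
                            }],
                        },
                        "text": {
                            "enabled": True,
                            "fields": [{"name": "text", "type": "text", "index": True, "store": True}],
                        },
                    },
                }
            },
        },
        "store": {"indexType": "scorch"},
    },
}

with open("capella_index.json", "w") as f:
    json.dump(index_sketch, f, indent=2)
```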
```python
import json
import logging

from couchbase.management.search import SearchIndex

# Local setup: specify the path to your index definition file
index_definition_path = "capella_index.json"

# If you are running in Google Colab, upload the index definition file instead:
# from google.colab import files
# print("Upload your index definition file")
# uploaded = files.upload()
# index_definition_path = list(uploaded.keys())[0]

try:
    with open(index_definition_path, "r") as file:
        index_definition = json.load(file)

    # Update the search index definition with user inputs
    index_definition["name"] = INDEX_NAME
    index_definition["sourceName"] = CB_BUCKET_NAME

    # Update the type mapping to point at our scope and collection
    old_type_key = next(iter(index_definition["params"]["mapping"]["types"].keys()))
    type_obj = index_definition["params"]["mapping"]["types"].pop(old_type_key)
    index_definition["params"]["mapping"]["types"][f"{SCOPE_NAME}.{COLLECTION_NAME}"] = type_obj
except Exception as e:
    raise ValueError(
        f"Error loading index definition from {index_definition_path}: {str(e)}"
    )

# Create the Vector Search index via the SDK
try:
    scope_index_manager = (
        cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes()
    )

    # Check if the index already exists
    existing_indexes = scope_index_manager.get_all_indexes()
    index_name = index_definition["name"]

    if index_name in [index.name for index in existing_indexes]:
        print(f"Index '{index_name}' found")
    else:
        print(f"Creating new index '{index_name}'...")

    # Create a SearchIndex object from the JSON definition
    search_index = SearchIndex.from_json(index_definition)

    # Upsert the index (create if it does not exist, update if it does)
    scope_index_manager.upsert_index(search_index)
    print(f"Index '{index_name}' successfully created/updated.")
except Exception as e:
    logging.error(f"Error creating or updating index: {e}")
```
Step 5: Initialize AI Models
Here is where it comes together: we initialize the embedding model using OpenAIEmbeddings but point it at Capella. Couchbase AI Services expose OpenAI-compatible endpoints, so the standard LangChain OpenAI package works out of the box and plugs directly into the LangChain Couchbase integration.
```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    openai_api_key=EMBEDDING_API_KEY,
    openai_api_base=CAPELLA_MODEL_SERVICES_ENDPOINT,  # Capella Endpoint
    model=EMBEDDING_MODEL_NAME,
    check_embedding_ctx_length=False,
    tiktoken_enabled=False,
)
```
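As an optional quick check, you can embed a short string and confirm that the vector dimension matches the dims value configured in the search index:
```python
# Optional check: the embedding dimension must match the vector index definition.
sample_vector = embeddings.embed_query("Couchbase Capella Model Services")
print(f"Embedding dimension: {len(sample_vector)}")
```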
Step 6: Ingest Data
We load the BBC News dataset and ingest it into Couchbase. The CouchbaseSearchVectorStore automatically generates embeddings with the model we configured and stores them in our collection.
```python
from datasets import load_dataset
from langchain_core.documents import Document
from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore

# Load Data
dataset = load_dataset('RealTimeData/bbc_news_alltime', '2024-12', split="train")
unique_articles = list(set(dataset["content"]))[:100]  # Limit for demo

# Initialize Vector Store
vector_store = CouchbaseSearchVectorStore(
    cluster=cluster,
    bucket_name=CB_BUCKET_NAME,
    scope_name=SCOPE_NAME,
    collection_name=COLLECTION_NAME,
    embedding=embeddings,
    index_name=INDEX_NAME,
)

# Ingest
documents = [Document(page_content=article) for article in unique_articles]
vector_store.add_documents(documents)
print(f"Ingested {len(documents)} documents")
```
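Before adding the LLM, it is worth verifying retrieval on its own. Here is a short, optional sketch using the vector store’s built-in similarity search:
```python
# Optional check: semantic search directly against the vector store,
# returning the 3 most similar articles for a sample query.
results = vector_store.similarity_search("Premier League football results", k=3)
for doc in results:
    print(doc.page_content[:120], "...")
```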
Step 7: Build the RAG Chain
Now we create the RAG pipeline. We initialize the LLM (again pointing to Capella) and connect it to our vector store retriever.
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# 1. Initialize LLM
llm = ChatOpenAI(
    openai_api_base=CAPELLA_MODEL_SERVICES_ENDPOINT,
    openai_api_key=LLM_API_KEY,
    model=LLM_MODEL_NAME,
    temperature=0,
)

# 2. Define Prompt
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# 3. Create Chain
rag_chain = (
    {"context": vector_store.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
```
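If you also want to show which articles grounded the answer, the same components can be rearranged so the chain returns the retrieved documents alongside the generated text. A sketch of that variant:
```python
# Sketch: a variant of the chain that returns both the answer and the
# retrieved source documents.
from langchain_core.runnables import RunnableParallel

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

answer_chain = (
    RunnablePassthrough.assign(context=lambda x: format_docs(x["context"]))
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain_with_sources = RunnableParallel(
    {"context": vector_store.as_retriever(), "question": RunnablePassthrough()}
).assign(answer=answer_chain)

# result = rag_chain_with_sources.invoke("Your question here")
# result["answer"] is the generated text, result["context"] the source Documents
```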
Step 8: Run Queries
Let’s test our RAG pipeline with a question about the ingested news articles.
```python
query = "What was Pep Guardiola's reaction to Manchester City's recent form?"
response = rag_chain.invoke(query)

print(f"Question: {query}")
print(f"Answer: {response}")
```
Example Output:
Answer: Pep Guardiola has expressed concern and frustration about Manchester City’s recent form. He stated, “I am not good enough. I am the boss… I have to find solutions.” He acknowledged the team’s defensive issues and lack of confidence.
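Since the chain is a standard LCEL runnable, you can also stream the answer token by token instead of waiting for the full response, which is handy for chat-style UIs:
```python
# Stream the answer as it is generated rather than waiting for the full string.
for chunk in rag_chain.stream(query):
    print(chunk, end="", flush=True)
```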
Conclusion
In this tutorial, you learned how to:
- Vectorize data using Couchbase.
- Use Couchbase AI Services for embeddings and LLM inference.
- Implement RAG with Couchbase Vector Search.
Couchbase’s unified database platform lets you build powerful AI applications that generate high-quality, contextually aware content.