In this tutorial, you will learn how to build a retrieval-augmented generation (RAG) application using Couchbase AI Services to store data, generate embeddings with an embedding model, and run LLM inference. We will create a RAG system that:
- Ingests news articles from the BBC News dataset.
- Generates vector embeddings using the NVIDIA NeMo Retriever model via Capella Model Services.
- Stores and indexes these vectors in Couchbase Capella.
- Performs semantic search to retrieve relevant context.
- Generates answers using the Mistral-7B LLM hosted on Capella.
You can find the notebook source code for this CodeLab here.
Why Couchbase AI Services?
Couchbase AI Services provide:
- LLM inference and embeddings API: Access popular LLMs (e.g., Llama 3) and embedding models directly through Capella, without managing external API keys or infrastructure.
- Unified platform: Keep the database, vectorization, search, and models in one place.
- Integrated vector search: Perform semantic search directly on your JSON data with millisecond latency.
Setting Up Couchbase AI Services
Create a Cluster in Capella
- Log into Couchbase Capella.
- Create a new cluster or use an existing one. Note that the cluster needs to run the latest version of Couchbase Server 8.0 and include the Data, Query, Index, and Search services.
- Create a bucket.
- Create a scope and collection for the data.
Enable AI Services
- Navigate to Capella’s AI Services section on the UI.
- Deploy the embeddings and LLM models.
- You need to launch an embedding model and an LLM for this demo, in the same region as the Capella cluster where the data will be stored.
- For this demo to work well, deploy an LLM with tool-calling capabilities, such as mistralai/mistral-7b-instruct-v0.3. For embeddings, you can choose a model like nvidia/llama-3.2-nv-embedqa-1b-v2.
- Write down the endpoint URL and generate API keys.
For more details on launching AI models, you can read the official documentation.
Prerequisites
Before we begin, ensure you have Python 3.10+ installed.
Step 1: Install Dependencies
We need the Couchbase SDK, LangChain integrations, and the datasets library.
```python
%pip install --quiet datasets==4.4.1 langchain-couchbase==1.0.0 langchain-openai==1.1.0
```
Step 2: Configuration & Connection
We’ll start by connecting to our Couchbase cluster. We also need to configure the endpoints for Capella Model Services.
Note: Capella Model Services are compatible with the OpenAI API format, so we can use the standard langchain-openai library by pointing it to our Capella endpoint.
```python
import getpass
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions
from datetime import timedelta

# Configuration
CB_CONNECTION_STRING = getpass.getpass("Couchbase Connection String: ")
CB_USERNAME = input("Database Username: ")
CB_PASSWORD = getpass.getpass("Database Password: ")
CB_BUCKET_NAME = input("Bucket Name: ")
SCOPE_NAME = "rag"
COLLECTION_NAME = "data"
INDEX_NAME = "vs-index"

# Model Services Config
CAPELLA_MODEL_SERVICES_ENDPOINT = getpass.getpass("Capella Model Services Endpoint: ")
LLM_MODEL_NAME = "mistralai/mistral-7b-instruct-v0.3"
LLM_API_KEY = getpass.getpass("LLM API Key: ")
EMBEDDING_MODEL_NAME = "nvidia/llama-3.2-nv-embedqa-1b-v2"
EMBEDDING_API_KEY = getpass.getpass("Embedding API Key: ")

# Connect to Cluster
auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
cluster = Cluster(CB_CONNECTION_STRING, ClusterOptions(auth))
cluster.wait_until_ready(timedelta(seconds=5))
print("Successfully connected to Couchbase")
```
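Because Capella Model Services speak the OpenAI API, you can optionally sanity-check the LLM endpoint directly with the openai Python client (installed as a dependency of langchain-openai) before wiring up LangChain. This is just a sketch that reuses the variables defined above:
```python
# Optional sanity check (sketch): call the Capella LLM endpoint directly
# through the OpenAI-compatible API before building the LangChain pipeline.
from openai import OpenAI

client = OpenAI(
    base_url=CAPELLA_MODEL_SERVICES_ENDPOINT,  # Capella Model Services endpoint
    api_key=LLM_API_KEY,
)

completion = client.chat.completions.create(
    model=LLM_MODEL_NAME,
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(completion.choices[0].message.content)
```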
Step 3: Set Up the Database Structure
We need to ensure our bucket, scope, and collection exist to store the news data.
```python
def setup_collection(cluster, bucket_name, scope_name, collection_name):
    bucket = cluster.bucket(bucket_name)
    manager = bucket.collections()

    # Create the scope if it does not exist
    if scope_name not in [s.name for s in manager.get_all_scopes()]:
        manager.create_scope(scope_name)

    # Create the collection if it does not exist
    # (create_collection(scope_name, collection_name) requires a recent Couchbase Python SDK)
    scopes = manager.get_all_scopes()
    existing_collections = [
        c.name for s in scopes if s.name == scope_name for c in s.collections
    ]
    if collection_name not in existing_collections:
        manager.create_collection(scope_name, collection_name)

    # Create a primary index so the collection can be queried with SQL++
    cluster.query(
        f"CREATE PRIMARY INDEX IF NOT EXISTS ON "
        f"`{bucket_name}`.`{scope_name}`.`{collection_name}`"
    ).execute()


setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)
```
Step 4: Load the Couchbase Vector Search Index
Semantic search requires an efficient way to retrieve relevant documents based on a user’s query. This is where Couchbase Vector Search, part of the Search Service (formerly known as Full-Text Search, or FTS), comes into play. In this step, we load the Vector Search Index definition from a JSON file, which specifies how the index should be structured: the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity.
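If you don’t already have a capella_index.json, the sketch below shows the general shape of such a definition, written as a Python dict and dumped to disk. Treat it as a hypothetical starting point, not an exact definition: the field names follow the langchain-couchbase defaults (embedding for the vector, text for the raw content), and the dims value must match the output dimension of your embedding model (2048 is the default for nvidia/llama-3.2-nv-embedqa-1b-v2, but verify against the model card and the Couchbase vector index documentation).
```python
import json

# Hypothetical sketch of a Couchbase Search (FTS) vector index definition.
# Consult the Couchbase vector search docs for the full set of options.
index_sketch = {
    "name": "vs-index",
    "type": "fulltext-index",
    "sourceType": "gocbcore",
    "sourceName": "your-bucket-name",
    "params": {
        "doc_config": {"mode": "scope.collection.type_field", "type_field": "type"},
        "mapping": {
            "default_mapping": {"enabled": False, "dynamic": True},
            "types": {
                "rag.data": {
                    "enabled": True,
                    "dynamic": True,
                    "properties": {
                        "embedding": {
                            "enabled": True,
                            "fields": [{
                                "name": "embedding",
                                "type": "vector",
                                "dims": 2048,              # must match the embedding model
                                "similarity": "dot_product",
                                "index": True,
                            }],
                        },
                        "text": {
                            "enabled": True,
                            "fields": [{"name": "text", "type": "text", "index": True, "store": True}],
                        },
                    },
                }
            },
        },
        "store": {"indexType": "scorch"},
    },
}

with open("capella_index.json", "w") as f:
    json.dump(index_sketch, f, indent=2)
```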
```python
import json
import logging

from couchbase.management.search import SearchIndex

# Local setup: specify the path to your index definition file
index_definition_path = "capella_index.json"

# If you are running in Google Colab, upload the index definition file instead:
# from google.colab import files
# print("Upload your index definition file")
# uploaded = files.upload()
# index_definition_path = list(uploaded.keys())[0]

try:
    with open(index_definition_path, "r") as file:
        index_definition = json.load(file)

    # Update the search index definition with user inputs
    index_definition["name"] = INDEX_NAME
    index_definition["sourceName"] = CB_BUCKET_NAME

    # Update the type mapping to point at our scope and collection
    old_type_key = next(iter(index_definition["params"]["mapping"]["types"].keys()))
    type_obj = index_definition["params"]["mapping"]["types"].pop(old_type_key)
    index_definition["params"]["mapping"]["types"][f"{SCOPE_NAME}.{COLLECTION_NAME}"] = type_obj
except Exception as e:
    raise ValueError(
        f"Error loading index definition from {index_definition_path}: {str(e)}"
    )

# Create the Vector Search index via the SDK
try:
    scope_index_manager = (
        cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes()
    )

    # Check if the index already exists
    existing_indexes = scope_index_manager.get_all_indexes()
    index_name = index_definition["name"]

    if index_name in [index.name for index in existing_indexes]:
        print(f"Index '{index_name}' found")
    else:
        print(f"Creating new index '{index_name}'...")

    # Create a SearchIndex object from the JSON definition
    search_index = SearchIndex.from_json(index_definition)

    # Upsert the index (create if it does not exist, update if it does)
    scope_index_manager.upsert_index(search_index)
    print(f"Index '{index_name}' successfully created/updated.")
except Exception as e:
    logging.error(f"Error creating or updating index: {e}")
```
Step 5: Initialize AI Models
Here is where it comes together: we initialize the embedding model using OpenAIEmbeddings but point it at Capella. Couchbase AI Services expose OpenAI-compatible endpoints, so the standard LangChain OpenAI package works out of the box and plugs directly into the LangChain Couchbase integration.
```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    openai_api_key=EMBEDDING_API_KEY,
    openai_api_base=CAPELLA_MODEL_SERVICES_ENDPOINT,  # Capella Endpoint
    model=EMBEDDING_MODEL_NAME,
    check_embedding_ctx_length=False,
    tiktoken_enabled=False,
)
```
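As an optional quick check, you can embed a short string and confirm that the vector dimension matches the dims value configured in the search index:
```python
# Optional check: the embedding dimension must match the vector index definition.
sample_vector = embeddings.embed_query("Couchbase Capella Model Services")
print(f"Embedding dimension: {len(sample_vector)}")
```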
Step 6: Ingest Data
We load the BBC News dataset and ingest it into Couchbase. The CouchbaseSearchVectorStore automatically generates embeddings with the model we configured and stores them in our collection.
```python
from datasets import load_dataset
from langchain_core.documents import Document
from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore

# Load Data
dataset = load_dataset('RealTimeData/bbc_news_alltime', '2024-12', split="train")
unique_articles = list(set(dataset["content"]))[:100]  # Limit for demo

# Initialize Vector Store
vector_store = CouchbaseSearchVectorStore(
    cluster=cluster,
    bucket_name=CB_BUCKET_NAME,
    scope_name=SCOPE_NAME,
    collection_name=COLLECTION_NAME,
    embedding=embeddings,
    index_name=INDEX_NAME,
)

# Ingest
documents = [Document(page_content=article) for article in unique_articles]
vector_store.add_documents(documents)
print(f"Ingested {len(documents)} documents")
```
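Before adding the LLM, it is worth verifying retrieval on its own. Here is a short, optional sketch using the vector store’s built-in similarity search:
```python
# Optional check: semantic search directly against the vector store,
# returning the 3 most similar articles for a sample query.
results = vector_store.similarity_search("Premier League football results", k=3)
for doc in results:
    print(doc.page_content[:120], "...")
```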
Step 7: Build the RAG Chain
Now we create the RAG pipeline. We initialize the LLM (again pointing to Capella) and connect it to our vector store retriever.
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# 1. Initialize LLM
llm = ChatOpenAI(
    openai_api_base=CAPELLA_MODEL_SERVICES_ENDPOINT,
    openai_api_key=LLM_API_KEY,
    model=LLM_MODEL_NAME,
    temperature=0,
)

# 2. Define Prompt
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# 3. Create Chain
rag_chain = (
    {"context": vector_store.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
```
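If you also want to show which articles grounded the answer, the same components can be rearranged so the chain returns the retrieved documents alongside the generated text. A sketch of that variant:
```python
# Sketch: a variant of the chain that returns both the answer and the
# retrieved source documents.
from langchain_core.runnables import RunnableParallel

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

answer_chain = (
    RunnablePassthrough.assign(context=lambda x: format_docs(x["context"]))
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain_with_sources = RunnableParallel(
    {"context": vector_store.as_retriever(), "question": RunnablePassthrough()}
).assign(answer=answer_chain)

# result = rag_chain_with_sources.invoke("Your question here")
# result["answer"] is the generated text, result["context"] the source Documents
```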
Step 8: Run Queries
Let’s test our RAG pipeline with a question about the ingested news articles.
```python
query = "What was Pep Guardiola's reaction to Manchester City's recent form?"
response = rag_chain.invoke(query)

print(f"Question: {query}")
print(f"Answer: {response}")
```
Example Output:
Answer: Pep Guardiola has expressed concern and frustration about Manchester City’s recent form. He stated, “I am not good enough. I am the boss… I have to find solutions.” He acknowledged the team’s defensive issues and lack of confidence.
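Since the chain is a standard LCEL runnable, you can also stream the answer token by token instead of waiting for the full response, which is handy for chat-style UIs:
```python
# Stream the answer as it is generated rather than waiting for the full string.
for chunk in rag_chain.stream(query):
    print(chunk, end="", flush=True)
```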
Conclusion
In this tutorial, you learned how to:
- Vectorize data using Couchbase.
- Use Couchbase AI Services for embeddings and LLM inference.
- Implement RAG with Couchbase Vector Search.
Couchbase’s unified database platform lets you build powerful AI applications that generate high-quality, contextually aware content.