
Migrate Your Existing Agents to Couchbase AI Services

A typical AI agent application in 2025 usually involves:

  • A cloud-hosted LLM
  • A vector database for retrieval
  • A separate operational database
  • Prompt and tool management systems
  • Observability and tracing frameworks
  • Guardrails

Each tool solves a problem. Collectively, however, they can create architectural sprawl, with unpredictable latency, rising operational costs, and governance blind spots. As a result, many AI agents never move beyond demos or internal prototypes because the complexity escalates too fast.

This post walks through how we migrated an existing AI agent application to Couchbase AI Services and the Agent Catalog, moving to a single production-ready AI platform. 

The Core Problem: Fragmentation Kills Production AI

It’s important to understand why agentic systems struggle in production. Most AI agents today are built from too many loosely coupled parts: prompts live in one system, vectors in another, conversations are logged inconsistently, and tools are invoked without clear traceability – all of which makes agent behavior difficult to debug. At the same time, sending enterprise data to third-party LLM endpoints introduces compliance and security risks. Finally, governance is usually treated as an afterthought; many frameworks emphasize what an agent can do, but fail to explain why it made a decision, which prompt or tool influenced it, or whether that decision should have been allowed at all. That is an unacceptable gap for real business workflows.

What Are Couchbase AI Services?

Building AI applications often involves juggling multiple services: a vector database for memory, an inference provider for LLMs (like OpenAI or Anthropic), and separate infrastructure for embedding models.

Couchbase AI Services streamlines this by providing a unified platform where your operational data, vector search, and AI models live together. It offers:

  • LLM inference and embeddings API: Access popular LLMs (like Llama 3) and embedding models directly within Couchbase Capella, with no external API keys, no extra infrastructure, and no data egress. Your application data stays inside Capella. Queries, vectors, and model inference all happen where the data lives. This enables secure, low-latency AI experiences while meeting privacy and compliance requirements. The key value: data and AI together, with sensitive information kept inside your system.
  • Unified platform: Maintain your database, vectorization, search, and model in a central location.
  • Integrated Vector Search: Perform semantic search directly on your JSON data with millisecond latency.

Why Is This Needed?

As we move from simple chatbots to agentic workflows – where AI models autonomously use tools – latency and setup complexity become major bottlenecks. Couchbase AI Services takes a platform-first approach. By co-locating your data and AI services, it reduces operational overhead and latency. In addition, tools like the Agent Catalog help manage hundreds of agent prompts and tools, while providing built-in logging and telemetry for agents. 

At this point, the question shifts from why a platform-first approach matters to how it works in practice.

So let’s explore how you can migrate an existing agentic application, and improve its performance, governance, and reliability along the way.

What the Current App Looks Like

The current application is an HR Sourcing Agent designed to automate the initial screening of candidates. The agent’s main job is to ingest raw resume files (PDFs), understand their content using an LLM, and turn that unstructured data into a queryable format enriched with semantic embeddings in Couchbase. HR professionals can then upload a new job description and retrieve the best-suited candidates using Couchbase vector search.

In its current state, the HR Sourcing App is a Python-based microservice that wraps an LLM with the Google ADK. It manually wires together model definitions, agent prompts, and execution pipelines. While functional, the architecture requires the developer to manage session state in memory, handle retry logic, clean raw model outputs, and maintain the integration between the LLM and the database manually. Also, there is no built-in telemetry for our agent. 

The app manually instantiates a model provider – in this case, a hosted open-source model (Qwen 2.5-72B via Nebius) accessed through the LiteLLM wrapper. It also has to spin up its own runtime environment for the agent: it initializes an InMemorySessionService to track the state of the conversation (even if short-lived) and a Runner to execute the user’s input (the resume text) against the agent pipeline.

Migrating the Agent Application to Couchbase AI Services

Now let’s dive into how to migrate the core logic of our agent to use Couchbase AI Services and the Agent Catalog. 

The new agent uses a LangChain ReAct agent to process job descriptions, performs intelligent candidate matching using vector search, and provides ranked candidate recommendations with explanations.

Prerequisites

Before we begin, ensure you have:

  • Python 3.10+ installed.

Install Dependencies

We’ll start by installing the necessary packages. This includes the agentc CLI for the catalog and the LangChain integration packages.
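A typical install looks like the following – the package names for the Agent Catalog CLI (agentc) and the LangChain integrations reflect the versions available at the time of writing, so verify them against the current docs:

```shell
# Agent Catalog CLI/SDK plus the LangChain and Capella integration packages
pip install agentc langchain langchain-openai langchain-couchbase
```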

Centralized Model Service (Couchbase AI Model Services Integration)

In the original adk_resume_agent.py, we had to manually instantiate LiteLLM, manage specific provider API keys (Nebius, OpenAI, etc.), and handle the connection logic inside our application code. We will migrate the code to use Couchbase. 

Couchbase AI Services provides OpenAI-compatible endpoints that are used by the agents. For the LLM and embeddings, we use the LangChain OpenAI package, which integrates directly with the LangChain Couchbase connector.

Enable AI Services

  1. Navigate to Capella’s AI Services section on the UI.
  2. Deploy the Embeddings and LLM models.
    • You need to launch an embedding and an LLM for this demo in the same region as the Capella cluster where the data will be stored.
    • Deploy an LLM that has tool-calling capabilities, such as mistralai/mistral-7b-instruct-v0.3. For embeddings, you can choose a model like nvidia/llama-3.2-nv-embedqa-1b-v2.
  3. Note the endpoint URL and generate API keys.

For more details on launching AI models, you can check the official documentation.

Implementing the Code Logic for LLM and Embedding Models

We need to configure the endpoints for Capella Model Services. Capella Model Services are compatible with the OpenAI API format, so we can use the standard langchain-openai library by pointing it to our Capella endpoint. We initialize the embedding model with OpenAIEmbeddings and the LLM with ChatOpenAI, but point it to Capella.


Instead of hardcoding model providers, the agent now connects to a unified Capella endpoint, which acts as an API gateway for both the LLM and the embedding model. 

Decoupling Prompts and Tools With Agent Catalog

The Agent Catalog is a powerful tool for managing the lifecycle of your agent’s capabilities. Instead of hardcoding prompts and tool definitions in your Python files, you manage them as versioned assets. You can centralize and reuse your tools across your development teams. You can also examine and monitor agent responses with the Agent Tracer. These features provide visibility, control, and traceability for agent development and deployment. Your teams can build agents with confidence, knowing they can be audited and managed effectively.  

Without the ability to back-trace agent behavior, it becomes impossible to automate the ongoing trust, validation, and corroboration of the autonomous decisions agents make. The Agent Catalog addresses this by evaluating both the agentic code and its conversation transcript with the LLM to assess the appropriateness of a pending decision or MCP tool lookup.

So let’s incorporate Agent Catalog in the project. 

Adding the Vector Search Tool 

We will start by adding our tool definition for the Agent Catalog. In this case we have the vector search tool. 

To add a new Python function as a tool for your agent, you can use the Agent Catalog command-line tool’s add command:

agentc add 

If you have an existing Python tool that you want to add to the Agent Catalog, add agentc to your imports and the @agentc.catalog.tool decorator to your tool definition. In our example, we define a Python function for performing vector search as our tool. 

Adding the Prompts 

In the original architecture, the agent’s instructions were buried inside the Python code as large string variables, making them difficult to version or update without a full deployment. With the Agent Catalog, we now define our “HR Recruiter” persona as a standalone, managed asset using prompts. Using a structured YAML definition (record_kind: prompt), we create the hr_recruiter_assistant. This definition doesn’t just hold the text; it encapsulates the entire behavior of the agent, strictly defining the ReAct pattern (Thought → Action → Observation) that guides the LLM to use the vector search tool effectively.
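As a sketch, such a prompt record might look like the following YAML – field names other than record_kind: prompt are illustrative, so check the Agent Catalog schema for the exact format:

```yaml
record_kind: prompt
name: hr_recruiter_assistant
description: An HR recruiter persona that ranks candidates for a job description.
content: >
  You are an experienced HR recruiter. For every job description, follow
  the ReAct pattern strictly: write a Thought, take an Action by calling
  the vector search tool, record the Observation, and repeat until you
  can return a ranked list of candidates with a short justification for each.
```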

Indexing and Publishing the Local Files

We use agentc to index our local files and publish them to Couchbase. This stores the metadata in the database, making it searchable and discoverable by the agent at runtime.

In our code, we initialize the Catalog and use catalog.find() to retrieve verified prompts and tools. We no longer hardcode prompts; instead, we fetch them.

Standardized Reasoning Engine (LangChain Integration)

The previous app used a custom SequentialAgent pipeline. While flexible, it meant we had to maintain our own execution loops, error handling, and retry logic for the agent’s reasoning steps.

By leveraging the Agent Catalog’s compatibility with LangChain, we switched to a standard ReAct (Reason + Act) agent architecture. We simply feed the tools and prompts fetched from the catalog directly into create_react_agent.

What’s the benefit? We get industry-standard reasoning loops – Thought → Action → Observation – out of the box. The agent can now autonomously decide to search for “React Developers,” analyze the results, and then perform a second search for “Frontend Engineers” if the first yields few results – something the linear ADK pipeline struggled with.

Built-in Observability (Agent Tracing)

In the previous agent application, observability was limited to print() statements. There was no way to “replay” an agent’s session to understand why it rejected a specific candidate.

Agent Catalog provides tracing: you can query traces with SQL++, leverage the performance of Couchbase, and inspect the details of prompts and tools on the same platform.

We can add Transactional Observability using catalog.Span(). We wrap the execution logic in a context manager that logs every thought, action, and result back to Couchbase. We can now view a full “trace” of the recruitment session in the Capella UI, showing exactly how the LLM processed a candidate’s resume. 

Conclusion

AI agents fail in production not because LLMs lack capability, but because agentic systems can become too complex. By adopting a platform-first approach with Couchbase AI Services and the Agent Catalog, we transformed a complex agent into a governed, scalable agentic system. 

If you’re building AI agents today, the real question isn’t which LLM to use – it’s how you’ll run agents safely, observably, and at scale. Couchbase AI Services are built for exactly that.


Author

Posted by Laurent Doguin

Laurent is a nerdy metal head who lives in Paris. He mostly writes code in Java and structured text in AsciiDoc, and often talks about data, reactive programming and other buzzwordy stuff. He is also a former Developer Advocate for Clever Cloud and Nuxeo where he devoted his time and expertise to helping those communities grow bigger and stronger. He now runs Developer Relations at Couchbase.

