AI chatbots have become an essential tool for businesses and organizations. But most chatbot solutions depend on cloud-based models that introduce latency, API limitations, and perhaps most importantly, privacy concerns. What if you could run an AI chatbot entirely on your machine and still retain conversation history with a fully featured data platform?
In this post, we’ll walk through setting up a self-hosted AI chatbot using Docker Model Runner, a brand new Docker feature that lets you pull and run AI models locally for inference, and Couchbase Capella to store, retrieve and search through conversations. The result is a fast, private and flexible chatbot that you control.
Ready to get started? Let’s get going!
Setting up Docker Model Runner
First, make sure your Docker Desktop and CLI are up to date so that the Model Runner feature is available to you. You can check by running docker model status from your terminal. If it succeeds, you will see a success message stating that Docker Model Runner is running. If you don’t, you first need to install the latest version of Docker. After you update Docker, run the command again and it should work.
Once you’ve done so, use Docker Model Runner to pull the Llama 3.2 model and make it available locally:
docker model pull ai/llama3.2
You can verify that the Llama 3.2 model downloaded successfully by running docker model list; you will see the model available for use:
{"object":"list","data":[{"id":"ai/llama3.3","object":"model","created":1741794281,"owned_by":"docker"}]}
Want to test it out? Opening the model in interactive mode is easy: just run docker model run ai/llama3.2 from the command line and you will enter interactive mode:
Interactive chat mode started. Type ‘/bye’ to exit.
>
Now that you have Llama 3.2 downloaded and ready to use, it’s time to build a straightforward backend application that leverages the model for a self-hosted AI chatbot.
Creating the chatbot
The application you will create will accomplish the following tasks:
- Run Llama 3.2 locally via the docker model run CLI command
- Send user messages as prompts to the model
- Store chat history in Couchbase Capella
- Retrieve previous chats
Your application will be fully featured and ready to use immediately from your console as a robust AI-powered chatbot. The code we build together here will also give you a foundation to refactor for whatever needs you have. Perhaps you want to turn it into the backend for a web application; a few modifications are all it would take, as we’ll sketch later in this post.
The application requires a few dependencies, so from the project directory run npm install couchbase readline-sync dotenv. We use the Couchbase Node.js SDK to interact with our Couchbase Capella data store, readline-sync to let the application interact with the user from the terminal, and dotenv to load credentials from environment variables.
Make sure you have set up a bucket in Capella to store the chat data, and have your Capella credentials at hand. As always, do not commit your credentials to version control. Use environment variables for local development to keep them secure.
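For example, you could keep a .env file next to your script (and excluded from version control via .gitignore). The values below are placeholders; swap in your own Capella connection string and database credentials. The variable names match what the code reads later in this post:

COUCHBASE_URL=couchbases://your-capella-endpoint.cloud.couchbase.com
COUCHBASE_USERNAME=your-database-username
COUCHBASE_PASSWORD=your-database-password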
We are building four functions in our application:
- askAI to encapsulate the process of feeding the message to our locally running AI model
- storeChat to send the chat history to Capella
- fetchChatHistory to retrieve previous chats
- main to act as the primary interface for the application
Let’s start with the main function that wraps all the rest. It creates a loop the user can exit at any time, offering a continuous chat experience:
async function main() {
  const { cluster, collection } = await connectToCouchbase();

  console.log("Self-Hosted AI Chatbot (Llama 3.2 + Capella)");
  console.log("Type your message below. Type 'history' to view past chats or 'exit' to quit.\n");

  while (true) {
    const userMessage = readlineSync.question("> ");

    if (userMessage.toLowerCase() === "exit") {
      console.log("Goodbye!");
      break;
    }

    if (userMessage.toLowerCase() === "history") {
      const history = await fetchChatHistory(cluster);
      console.log("\n📜 Chat History:");
      history.forEach((chat) => {
        console.log(`🧑 ${chat.user}\n🤖 ${chat.response}\n`);
      });
      continue;
    }

    console.log("🤖 Thinking...");
    const aiResponse = await askAI(userMessage);
    console.log(`🤖 ${aiResponse}\n`);

    await storeChat(collection, userMessage, aiResponse);
  }
}
As you can see, we have introduced functionality built on top of Capella’s data store: the ability to retrieve previous chat history from within the chat itself. This can help a user regain context whenever they start a new session.
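If you also want the model itself to pick up where a previous session left off, one option is to prepend recent history to the prompt before handing it to the model. The askAIWithContext helper below is a hypothetical sketch, not part of the application we build in this post; it relies on the fetchChatHistory and askAI functions we define next:

// Hypothetical helper: give the model recent conversation context.
// Assumes fetchChatHistory and askAI are defined as shown later in this post.
async function askAIWithContext(cluster, userMessage) {
  const history = await fetchChatHistory(cluster, 3); // last 3 stored exchanges
  const context = history
    .reverse() // oldest first, so the conversation reads in order
    .map((chat) => `User: ${chat.user}\nAssistant: ${chat.response}`)
    .join("\n");
  const prompt = `${context}\nUser: ${userMessage}\nAssistant:`;
  return askAI(prompt);
}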
Now that we have the main function, let’s create the supporting functions it invokes, starting with the askAI function:
async function askAI(prompt) {
  return new Promise((resolve, reject) => {
    exec(
      `docker model run ai/llama3.2 "${prompt}"`,
      (error, stdout, stderr) => {
        if (error) {
          console.error(`Error running model: ${error.message}`);
          return reject(error);
        }
        if (stderr) {
          console.error(`Docker stderr: ${stderr}`);
        }
        resolve(stdout.trim()); // Return the AI response
      }
    );
  });
}
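One caveat with this approach: because the prompt is interpolated into a shell command, a message containing double quotes or other shell metacharacters can break the command. If that matters for your use case, a safer variant (a sketch, assuming the docker CLI is on your PATH) passes the prompt as a separate argument via execFile so no shell parsing happens at all:

const { execFile } = require("child_process");

// Alternative askAI: no shell is involved, so the prompt cannot break the command.
async function askAI(prompt) {
  return new Promise((resolve, reject) => {
    execFile(
      "docker",
      ["model", "run", "ai/llama3.2", prompt],
      (error, stdout, stderr) => {
        if (error) {
          console.error(`Error running model: ${error.message}`);
          return reject(error);
        }
        if (stderr) {
          console.error(`Docker stderr: ${stderr}`);
        }
        resolve(stdout.trim());
      }
    );
  });
}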
Next, the storeChat function:
async function storeChat(collection, userMessage, aiResponse) {
  const chatDoc = {
    user: userMessage,
    response: aiResponse,
    timestamp: new Date().toISOString(),
  };

  await collection.upsert(`chat_${Date.now()}`, chatDoc);
}
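Each exchange is written as its own document, keyed by the millisecond timestamp at which it was stored. A stored document would look roughly like this in your chatbot bucket (the key and values here are only illustrative):

// Document key: chat_1741794281000 (illustrative)
{
  "user": "the user's message",
  "response": "the model's reply",
  "timestamp": "2025-03-12T15:04:41.000Z"
}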
Finally, the fetchChatHistory function:
async function fetchChatHistory(cluster, limit = 5) {
  const query = `
    SELECT user, response, timestamp
    FROM \`chatbot\`
    ORDER BY timestamp DESC
    LIMIT ${limit};
  `;

  const result = await cluster.query(query);
  return result.rows;
}
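Note that this query will only succeed if the chatbot bucket has an index to serve it. If you haven’t created one yet, you can add a primary index from the Capella UI, or run a one-off statement like the sketch below. A primary index is fine for a demo; for production workloads you would typically create a narrower secondary index instead.

// One-time setup: create a primary index so the history query can run.
// IF NOT EXISTS makes it safe to run repeatedly.
await cluster.query("CREATE PRIMARY INDEX IF NOT EXISTS ON `chatbot`;");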
Once you have finished with the functions, make sure to add the require statements at the top of the file and create a connection to your Couchbase Capella cluster:
const { exec } = require("child_process");
const readlineSync = require("readline-sync");
const couchbase = require("couchbase");
require("dotenv").config();

// Read Capella credentials from environment variables (see the .env example above)
const COUCHBASE_URL = process.env.COUCHBASE_URL;
const COUCHBASE_USERNAME = process.env.COUCHBASE_USERNAME;
const COUCHBASE_PASSWORD = process.env.COUCHBASE_PASSWORD;

async function connectToCouchbase() {
  try {
    const cluster = await couchbase.connect(COUCHBASE_URL, {
      username: COUCHBASE_USERNAME,
      password: COUCHBASE_PASSWORD,
    });

    const bucket = cluster.bucket("chatbot");
    const collection = bucket.defaultCollection();

    console.log("Connected to Couchbase Capella");
    return { cluster, collection };
  } catch (err) {
    console.error("Failed to connect to Couchbase:", err);
    process.exit(1);
  }
}
Lastly, don’t forget to call the main function at the end of your script by adding main(); on the last line.
Once finished, you have a fully working AI chatbot hosted on your own machine, preserving your privacy while leveraging Capella for storage and retrieval.
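Earlier we mentioned that a few modifications would turn this into the backend for a web application. As a rough sketch of what that could look like, assuming you also install Express (npm install express) and that the route and port below are arbitrary choices, you could expose the same askAI and storeChat functions behind an HTTP endpoint instead of a terminal loop:

const express = require("express"); // hypothetical extra dependency

async function startServer() {
  const { collection } = await connectToCouchbase();
  const app = express();
  app.use(express.json());

  // POST /chat with a JSON body like {"message": "..."} returns the model's reply
  app.post("/chat", async (req, res) => {
    try {
      const userMessage = req.body.message;
      const aiResponse = await askAI(userMessage);
      await storeChat(collection, userMessage, aiResponse);
      res.json({ response: aiResponse });
    } catch (err) {
      res.status(500).json({ error: "Model call failed" });
    }
  });

  app.listen(3000, () => console.log("Chatbot API listening on port 3000"));
}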
Using your self-hosted AI chatbot
Your very own chatbot is ready to use! Every query you send will be processed locally on your machine using the Llama 3.2 model. Nothing is sent to any remote AI provider.
If you’re ready to give it a spin, do so by running the following:
node index.js # or whatever you named your file
Once you run it, you will see the following:
Connected to Couchbase Capella
Self-Hosted AI Chatbot (Llama 3.2 + Capella)
Type your message below. Type 'history' to view past chats or 'exit' to quit.
>
Go ahead and start asking questions and interacting with it. Here’s a brief example of what you can expect to see:
Connected to Couchbase Capella
🚀 Self-Hosted AI Chatbot (Llama 3.2 + Capella)
Type your message below. Type 'history' to view past chats or 'exit' to quit.

> Should I pack a raincoat for Barcelona for a trip there at the end of March? Answer only with "yes" or "no".
🤖 Thinking...
🤖 Yes.

> Tell me why I should pack a raincoat for Barcelona at the end of March. Limit your answer to 15 words or less.
🤖 Thinking...
🤖 You may need a raincoat as March weather in Barcelona can be unpredictable and rainy, up to 10C.

> exit
👋 Goodbye!
By using the scalability and security of Couchbase Capella along with Docker Model Runner’s privacy-first approach to running AI models locally, you can build dynamic AI applications that prioritize user privacy. Running models on your own machine also gives you faster inference, full control over prompt customization, the ability to store additional metadata, and the flexibility to fine-tune your chatbot’s behavior, all within your own environment.
This combination of Capella and Docker Model Runner gives you speed, control and the privacy needed to build AI applications without relying on external APIs. Whether you’re creating a chatbot, analyzing data or running AI-powered workflows, this setup ensures that anything you build will be efficient, scalable and fully under your control.
The only question is: What will you build?
- Connect with our developer community and show us what you are building!