
Build Once, Run Everywhere: Docker for Your RAG Chatbot

  • Writer: Vasileios Iosifidis
  • May 29
  • 4 min read

Updated: Jun 17

In the previous post, I built a RAG system that lets me chat with my videos and audio files like they’re my knowledge assistants. Whisper takes care of the transcription, contextual chunking makes everything sound smarter than it is, the QWEN LLM combines the fetched information to answer queries, and the Streamlit UI ties it all together in one simple app. So far, so good. But here’s the thing.


What happens when you want to run this on another machine? Or deploy it somewhere? Or—God forbid—come back to it in six months and realize half your Python packages are broken, Whisper throws errors, and your LLM refuses to cooperate?


Yeah, that’s where Docker comes in.


📈 Discover how real businesses use AI to create value. Join the newsletter for practical use cases and strategic insights.

Docker is a software wrapper for your project. It packages everything — your code, your dependencies, your weird local setup — into a container that just works. You can run it anywhere. No “missing ffmpeg” errors. No “this only works on my laptop” excuses. Just clean, repeatable, production-friendly execution.


In this follow-up post, I’ll show you how to wrap the entire RAG system I built in the last post into a Docker container and run it as a self-contained app. And yes, I’ll go step-by-step, with a few helpful code snippets sprinkled along the way. You’ll end up with a RAG system that runs with a single command and feels way more professional (and stable).


Project Structure


Before we start throwing commands into the terminal, let’s pause for a second and look at what the project structure actually needs to contain. Here’s a simplified sketch of my project layout (reconstructed from the files described below; the root folder name is just illustrative):
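
rag-as-a-service/
├── app/
│   ├── main_app.py        # Streamlit UI entry point
│   ├── coordinator.py     # query processing: retrieval + LLM calls via Ollama
│   ├── indices/           # pre-built chunk indices from the previous post
│   └── requirements.txt   # Python dependencies
└── Dockerfile             # the container recipe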

Depending on how modular you like your projects, this might be split into even more files. But the general idea is: keep the core logic inside the app/ folder, keep the config clean, and don’t let the main script become a spaghetti monster.


In this structure, the RAG indices generated in the previous post are already in place; the app only needs to load them and handle the LLM queries through the Streamlit interface.


  • main_app.py launches the Streamlit UI.

  • coordinator.py is responsible for processing the queries: it triggers the retrieval (fetching the relevant information) and the LLM querying (through Ollama API calls). A minimal sketch of this flow appears just below the list.

  • indices/ contains the chunked information that was extracted from the file corpus in the previous post.

  • requirements.txt lists all the libraries the project needs in order to run.

  • Dockerfile contains the instructions that package the project as a containerized app.


That’s what I am wrapping into a Docker container.
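
To make the coordinator’s role a bit more concrete, here’s a minimal sketch of that query flow. It assumes a retriever object that searches the loaded indices and an Ollama server hosting the QWEN model; the names and the model tag are illustrative, not the actual code from the post.

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's generate endpoint

def answer_query(query: str, retriever) -> str:
    # 1. Retrieval: fetch the most relevant chunks from the pre-built indices
    chunks = retriever.retrieve(query)  # assumed retriever interface
    context = "\n\n".join(chunk.text for chunk in chunks)

    # 2. Generation: hand the query plus the retrieved context to QWEN via Ollama
    payload = {
        "model": "qwen2.5",  # illustrative model tag
        "prompt": f"Context:\n{context}\n\nQuestion: {query}",
        "stream": False,
    }
    response = requests.post(OLLAMA_URL, json=payload, timeout=120)
    response.raise_for_status()
    return response.json()["response"]

The Streamlit app then only has to call answer_query() and render the result.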


Writing the Dockerfile


Alright, now that my project structure is in place, it’s time to write the real MVP of this post: the Dockerfile.


Think of it as a recipe for your environment. Everything your RAG system needs goes in here: Python version, vector DB, similarity matching, LLM querying, Streamlit.


Let’s break it down:

# 1. Use an official Python base image
FROM python:3.10-slim

# 2. Set working directory
WORKDIR /app

# 3. Copy requirements and install dependencies
COPY app/requirements.txt .
RUN pip install --upgrade pip && pip install --no-cache-dir -r requirements.txt

# 4. Copy rest of the app (including indices)
COPY app/ .

# 5. Expose Streamlit and Ollama ports
EXPOSE 8501
EXPOSE 11434

# 6. Set environment variable
ENV STREAMLIT_SERVER_PORT=8501

# 7. Command to run Streamlit app
CMD ["streamlit", "run", "main_app.py"]

Let’s talk about a few key lines here:


  • WORKDIR /app and COPY app/ . make sure your code actually ends up in the container (you’d be surprised how easy it is to forget this)

  • exposing the ports is essential for interaction with the hosting environment (my local PC in this case)

  • The last command line boots up the Streamlit interface


By the end of this step, you’ll have a fully portable image that bundles everything from your transcription pipeline to your LLM magic. No setup scripts, no fiddling with paths, just: build it once and run it anywhere.


Running the Container


You’ve got your Dockerfile. You’ve got your code. Now it’s time to breathe life into this thing. Navigate to your project root and run:


docker build -t rag-as-a-service .

That’s it. Docker will read the Dockerfile, grab the base image, install the dependencies, copy your code, and bundle everything into a neat image called rag-as-a-service. If it’s your first time running this, go grab a coffee. After that, it’ll cache most steps and finish in seconds.


Now let’s run it:

docker run -p 8501:8501 rag-as-a-service

This maps port 8501 inside the container (where Streamlit runs) to port 8501 on your machine. Once it starts up, you’ll see output roughly like this (the exact wording depends on your Streamlit version):
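
  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://<your-local-ip>:8501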

The other exposed port (11434) is for the Ollama service, which runs locally on the host machine and handles the LLM calls coming from the Docker container.
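
One caveat worth knowing: inside a container, localhost points to the container itself, not to your machine. If the app can’t reach Ollama, a common fix (my assumption here, not something from the original setup) is to give the container an alias for the host:

docker run -p 8501:8501 --add-host=host.docker.internal:host-gateway rag-as-a-service

With that flag in place, the coordinator can call Ollama at http://host.docker.internal:11434 instead of http://localhost:11434.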


Now I can access it through the local network, directly from my phone!


Final Thoughts: From One-Liner to Real-World Service


And just like that, your RAG system is now... a containerized, ready-to-ship product. You’ve taken something that used to live only in your dev environment — cobbled together with virtualenvs, weird dependency trees, and “don’t touch anything” vibes — and turned it into a self-contained, portable, production-ready service.


Whether you want to run it locally, deploy it on a VPS, or hand it off to a teammate without needing to explain how ffmpeg works, Docker’s got your back. You can now chat with your own videos and audio files — anywhere, anytime, and in a setup that works for you and/or your users.


📥 Want practical AI use cases? Subscribe to stay informed.
