Chapter 1 of 2

Get the tech stack up and running

Install Docker, pull the required images, and launch the full local stack with a single command.

What you'll build

By the end of this chapter you'll have all four services running locally inside Docker containers:

  • Ollama — local LLM inference
  • Qdrant — vector database for semantic search
  • Langflow — visual local pipeline builder
  • Postgres — relational store used by Langflow


Step 1 — Install Docker

Linux (Ubuntu / Debian)

bash
# Remove any old versions
sudo apt-get remove -y docker docker-engine docker.io containerd runc
 
# Add Docker's GPG key and repo
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
  | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
 
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" \
  | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
 
# Install Docker Engine + Compose plugin
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io \
  docker-buildx-plugin docker-compose-plugin
 
# Run Docker without sudo (log out and back in for this to apply everywhere)
sudo usermod -aG docker $USER
newgrp docker   # applies the new group to the current shell only

Verify:

bash
docker --version
docker compose version
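
Beyond version checks, a quick end-to-end test confirms the daemon is running and that your user can reach it without sudo. A guarded sketch (the fallback messages are illustrative):

```shell
# Smoke test: run a throwaway container as the current user.
# Prints a hint instead of failing if Docker isn't usable yet.
if command -v docker >/dev/null 2>&1; then
  docker run --rm hello-world \
    || echo "could not start a container - is the daemon running and your user in the docker group?"
else
  echo "docker not found - complete the install steps above first"
fi
```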

Step 2 — Create the project folder

bash
mkdir local-pipeline && cd local-pipeline

Step 3 — Write the docker-compose.yml

Linux supports both CPU-only and NVIDIA GPU acceleration. Choose the variant that matches your machine.

CPU only

yaml
services:
  # ── Ollama (CPU) ────────────────────────────────────
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped
 
  # ── Qdrant ─────────────────────────────────────────
  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - qdrant_data:/qdrant/storage
    restart: unless-stopped
 
  # ── Postgres ────────────────────────────────────────
  postgres:
    image: postgres:16-alpine
    container_name: langflow_postgres
    environment:
      POSTGRES_USER: langflow
      POSTGRES_PASSWORD: langflow
      POSTGRES_DB: langflow
    # Lets dependent services wait until Postgres accepts connections
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U langflow -d langflow"]
      interval: 5s
      timeout: 3s
      retries: 5
    volumes:
      - postgres_data:/var/lib/postgresql/data
    restart: unless-stopped
 
  # ── Langflow ────────────────────────────────────────
  langflow:
    image: langflowai/langflow:latest
    container_name: langflow
    ports:
      - "7860:7860"
    environment:
      LANGFLOW_DATABASE_URL: postgresql://langflow:langflow@postgres:5432/langflow
      LANGFLOW_SUPERUSER: admin
      LANGFLOW_SUPERUSER_PASSWORD: changeme
    # Start only after Postgres reports healthy
    depends_on:
      postgres:
        condition: service_healthy
    volumes:
      - langflow_data:/app/langflow
    restart: unless-stopped
 
volumes:
  ollama_data:
  qdrant_data:
  postgres_data:
  langflow_data:
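
Whichever variant you choose, it is worth validating the file before starting anything. `docker compose config` parses and resolves the file without running containers (the fallback messages here are illustrative):

```shell
# Validate docker-compose.yml in the current directory without
# starting any containers.
if command -v docker >/dev/null 2>&1; then
  docker compose config --quiet \
    && echo "docker-compose.yml is valid" \
    || echo "compose file is missing or has errors"
else
  echo "docker not found - install it first (Step 1)"
fi
```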

NVIDIA GPU

First, install the NVIDIA Container Toolkit:

bash
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
 
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
 
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Verify Docker can see your GPU:

bash
docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi

Then use this docker-compose.yml — identical to the CPU version except for the deploy block on Ollama:

yaml
services:
  # ── Ollama (NVIDIA GPU) ─────────────────────────────
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
 
  # ── Qdrant ─────────────────────────────────────────
  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - qdrant_data:/qdrant/storage
    restart: unless-stopped
 
  # ── Postgres ────────────────────────────────────────
  postgres:
    image: postgres:16-alpine
    container_name: langflow_postgres
    environment:
      POSTGRES_USER: langflow
      POSTGRES_PASSWORD: langflow
      POSTGRES_DB: langflow
    # Lets dependent services wait until Postgres accepts connections
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U langflow -d langflow"]
      interval: 5s
      timeout: 3s
      retries: 5
    volumes:
      - postgres_data:/var/lib/postgresql/data
    restart: unless-stopped
 
  # ── Langflow ────────────────────────────────────────
  langflow:
    image: langflowai/langflow:latest
    container_name: langflow
    ports:
      - "7860:7860"
    environment:
      LANGFLOW_DATABASE_URL: postgresql://langflow:langflow@postgres:5432/langflow
      LANGFLOW_SUPERUSER: admin
      LANGFLOW_SUPERUSER_PASSWORD: changeme
    # Start only after Postgres reports healthy
    depends_on:
      postgres:
        condition: service_healthy
    volumes:
      - langflow_data:/app/langflow
    restart: unless-stopped
 
volumes:
  ollama_data:
  qdrant_data:
  postgres_data:
  langflow_data:

A 4 GB VRAM card (e.g. an RTX 3050) handles llama3.2:3b. Use 8 GB+ for llama3.1:8b (Llama 3.2 only ships in 1B and 3B text variants). With a GPU, inference is roughly 10–20× faster than on CPU.
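
If you're not sure how much VRAM your card has, you can query the driver directly (assumes the NVIDIA driver is installed on the host; the fallback message is illustrative):

```shell
# Show GPU model and total VRAM; fall back to a hint if no NVIDIA driver.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
  echo "nvidia-smi not found - use the CPU compose file"
fi
```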

Change LANGFLOW_SUPERUSER_PASSWORD before exposing Langflow outside localhost.
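
One way to avoid a hard-coded password is Compose's variable interpolation: Compose automatically loads variables from a `.env` file next to docker-compose.yml. A sketch (the password value is a placeholder):

```shell
# Write the secret to .env; keep this file out of version control.
cat > .env <<'EOF'
LANGFLOW_SUPERUSER_PASSWORD=replace-with-a-long-random-password
EOF
```

Then change the compose line to `LANGFLOW_SUPERUSER_PASSWORD: ${LANGFLOW_SUPERUSER_PASSWORD:-changeme}` — the `:-changeme` fallback keeps the stack bootable if the variable is unset.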


Step 4 — Start everything

bash
docker compose up -d

Docker pulls all images on first run (~3–4 GB), so expect a wait. Subsequent starts take only a few seconds.

Check every container is running:

bash
docker compose ps

Expected output:

text
NAME                 IMAGE                         STATUS
langflow             langflowai/langflow:latest    Up
langflow_postgres    postgres:16-alpine            Up (healthy)
ollama               ollama/ollama:latest          Up
qdrant               qdrant/qdrant:latest          Up

Step 5 — Pull your first LLM

Ollama is running but has no models yet:

bash
docker exec -it ollama ollama pull llama3.2:3b

Verify:

bash
docker exec -it ollama ollama list

text
NAME              ID            SIZE    MODIFIED
llama3.2:3b       ...           2.0 GB  a few seconds ago

llama3.2:3b works on CPU with 8 GB RAM. With an NVIDIA GPU, try llama3.1:8b (requires about 8 GB of VRAM) for noticeably better quality.


Step 6 — Verify all services

Open each URL in your browser:

| Service | URL | What you should see |
|---|---|---|
| Ollama API | http://localhost:11434 | Ollama is running |
| Qdrant Dashboard | http://localhost:6333/dashboard | Collections UI |
| Langflow | http://localhost:7860 | Login page |
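
The same check can be scripted. Each endpoint should return an HTTP status code; `000` means the service isn't reachable yet:

```shell
# Print the HTTP status code for each service endpoint.
for url in \
  http://localhost:11434 \
  http://localhost:6333/dashboard \
  http://localhost:7860; do
  code=$(curl -s -o /dev/null --max-time 3 -w '%{http_code}' "$url" || true)
  echo "$url -> $code"
done
```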


Quick test — chat with your LLM

bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "In one sentence, what is a vector database?",
  "stream": false
}'

You'll get a JSON response with the model's answer in the response field.
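
To pull just the answer out of that JSON, pipe it through a one-line parser. A sketch using Python's standard library (the sample reply below is illustrative; in practice, pipe the curl output straight in):

```shell
# Extract the "response" field from an Ollama /api/generate reply.
sample='{"model":"llama3.2:3b","response":"A vector database stores and searches embeddings.","done":true}'
echo "$sample" | python3 -c 'import json, sys; print(json.load(sys.stdin)["response"])'
```

If you have jq installed, `curl ... | jq -r .response` does the same.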


Summary

You now have a complete local stack running in Docker:

  • Ollama serving a local LLM on localhost:11434
  • Qdrant ready for vector embeddings on localhost:6333
  • Langflow's visual editor available on localhost:7860

In the next chapter, we'll log into Langflow and build your first local pipeline.