AI Stack Setup in 30 Minutes: Ollama + Open WebUI + Docker | AI Engineering Wiki

📋 At a glance

In this tutorial you set up a local AI stack in 30 minutes: Ollama as the LLM backend, Open WebUI as the chat interface, and Docker as the container runtime. At the end you have a fully functional ChatGPT clone running on your own hardware.

Prerequisites

Item	Minimum	Recommended
GPU	8 GB VRAM (7B models)	24 GB VRAM / RTX 3090 (up to 34B models)
RAM	16 GB	32 GB
Disk	50 GB free	200 GB NVMe SSD
Operating system	Windows 10, macOS, Ubuntu 22.04+	Ubuntu 24.04 LTS
Docker	Docker Desktop (Win/Mac) or Docker Engine (Linux)	Docker Engine + NVIDIA Container Toolkit

ℹ️ No GPU? That works too

Ollama also runs on the CPU, just considerably slower. A 7B model on a modern CPU (i7/Ryzen 7) delivers roughly 5-10 tok/s. That is enough for testing; for productive use you need a GPU.
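
To get a feel for what 5-10 tok/s means in practice, here is a minimal sketch that turns a token rate into a waiting time. The function name seconds_for_answer and the 300-token answer length are assumptions chosen for illustration; only the token rates come from the text above.

```shell
# seconds_for_answer TOKENS TOK_PER_S: whole seconds (rounded up) needed to
# generate TOKENS tokens at a steady rate of TOK_PER_S tokens per second.
seconds_for_answer() {
  local tokens="$1" rate="$2"
  echo $(( (tokens + rate - 1) / rate ))
}

# A 300-token answer (hypothetical length) at the CPU rates quoted above:
echo "at 5 tok/s:  $(seconds_for_answer 300 5) s"
echo "at 10 tok/s: $(seconds_for_answer 300 10) s"
```

Under these assumptions a medium-length answer takes between half a minute and a minute on the CPU, which is workable for a test but slow for daily use.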

Step 1: Install Ollama (5 minutes)

Ollama installation: one command on Linux, an installer on Windows/Mac.

Linux

# One-command installation
curl -fsSL https://ollama.com/install.sh | sh

# Check that the installation worked
ollama --version

Windows / macOS

# Download from https://ollama.com/download
# Run the installer
# Ollama then runs as a background service

Step 2: Download your first model (5-10 minutes)

ollama pull downloads the model and stores it locally.

Download and test a model

# Recommended for getting started: Llama 3.1 (8B)
# (llama3.3 is published only as a 70B model, which does not fit on a 24 GB GPU)
ollama pull llama3.1:8b

# Test it right away
ollama run llama3.1:8b

# List installed models
ollama list

Model	Size	VRAM	Strength	Command
Llama 3.1 (8B)	~4.9 GB	~6 GB	Fast all-rounder	ollama pull llama3.1:8b
Mistral Small 3.1 (24B)	14 GB	~16 GB	Strong German, outperforms GPT-4o Mini	ollama pull mistral-small3.1
Qwen3 14B	9 GB	~10 GB	Good reasoning, 100+ languages	ollama pull qwen3:14b
DeepSeek R1 14B	9 GB	~10 GB	Strong chain-of-thought reasoning	ollama pull deepseek-r1:14b

⚠️ Check VRAM

If the model does not fit into VRAM, Ollama offloads part of it to the CPU, which is 5-10x slower. To check, run nvidia-smi (Linux/Windows); it shows the free VRAM. On a 24 GB GPU the limit is 34B models in Q4 quantization. A 70B model does NOT fit in 24 GB.
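
The VRAM check can also be scripted. The following is a rough sketch, not an exact fit test: the function names fits_in_vram, free_vram_mib, and report_vram_fit are hypothetical, the VRAM requirement is whatever approximate figure you take from the model table, and it treats 1 GB as 1024 MiB. It reads the free memory of the first GPU through nvidia-smi's CSV query mode.

```shell
# fits_in_vram NEEDED_GB FREE_MIB: succeed if a model's approximate VRAM need
# (whole GB) fits into the free VRAM given in MiB. Rough estimate only.
fits_in_vram() {
  local needed_gb="$1" free_mib="$2"
  [ $(( needed_gb * 1024 )) -le "$free_mib" ]
}

# free_vram_mib: free memory of the first GPU in MiB, as reported by nvidia-smi.
free_vram_mib() {
  nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits | head -n 1
}

# report_vram_fit NEEDED_GB: print whether a model with that VRAM need fits right now.
report_vram_fit() {
  local needed_gb="$1" free
  free="$(free_vram_mib)"
  if fits_in_vram "$needed_gb" "$free"; then
    echo "~${needed_gb} GB model fits (${free} MiB free)"
  else
    echo "~${needed_gb} GB model would spill over to the CPU (${free} MiB free)"
  fi
}
```

For example, report_vram_fit 16 checks the ~16 GB figure listed for Mistral Small 3.1 against the currently free VRAM. Because other processes also occupy GPU memory, run it shortly before pulling a large model.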

ollama run starts an interactive chat session in the terminal.

Step 3: Start Open WebUI (5 minutes)

Terminal chat is fine for testing, but for everyday use you want a web interface. Open WebUI looks like ChatGPT, but it runs locally and connects to your Ollama instance.

Docker Compose (recommended)

# Create docker-compose.yml
cat > docker-compose.yml << 'EOF'
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    volumes:
      - open-webui-data:/app/backend/data
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped

volumes:
  open-webui-data:
EOF

# Start
docker compose up -d

# Open in the browser: http://localhost:3000
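
The container may need a few seconds after docker compose up -d before the web interface answers. A minimal sketch of a generic retry helper follows; the name wait_for and the retry counts in the usage note are assumptions, not part of Docker or Open WebUI.

```shell
# wait_for TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES times,
# sleeping DELAY seconds between attempts. Returns 0 on success, 1 otherwise.
wait_for() {
  local tries="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$tries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Only sleep if another attempt follows.
    if [ "$i" -lt "$tries" ]; then
      sleep "$delay"
    fi
    i=$(( i + 1 ))
  done
  return 1
}
```

A possible call is wait_for 30 2 curl -fsS -o /dev/null http://localhost:3000 && echo "Open WebUI is up", which polls the port from the compose file for up to about a minute.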

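ℹ️ Linux: host-gateway

On Linux the container needs extra_hosts: host.docker.internal:host-gateway to reach the Ollama service on the host. On Windows and Mac (Docker Desktop), host.docker.internal is available automatically. If Open WebUI finds no models on Linux, note that Ollama listens only on 127.0.0.1 by default; setting OLLAMA_HOST=0.0.0.0 for the Ollama service makes it reachable from the container.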

Open the browser

Navigate to http://localhost:3000

Create an account

The first user automatically becomes admin. Choose any email and password; everything stays local.

Select a model and chat

Open WebUI automatically detects all models installed in Ollama. Pick the model at the top and start chatting.

Step 4: Verification (5 minutes)

Check that everything is running correctly:

# Is the Ollama API reachable?
curl http://localhost:11434/api/tags
# Expected response: JSON listing your models

# Is Open WebUI running?
curl -I http://localhost:3000
# Expected response: HTTP 200

# Is the GPU being used?
nvidia-smi
# While a model is loaded, an ollama process should appear under Processes

# Docker container status
docker compose ps
# open-webui should be "Up"
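
The manual checks can be bundled into one pass/fail report. This is a sketch under assumptions: check and run_stack_checks are hypothetical helper names, the ports and the service name open-webui come from the compose file in step 3, and the container check has to run in the directory that contains docker-compose.yml.

```shell
# check LABEL CMD...: run CMD quietly, print OK or FAIL with the label,
# and return non-zero if CMD failed.
check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'OK    %s\n' "$label"
  else
    printf 'FAIL  %s\n' "$label"
    return 1
  fi
}

# run_stack_checks: run all three checks and return non-zero if any failed.
run_stack_checks() {
  local failed=0
  check "Ollama API on port 11434" \
    curl -fsS http://localhost:11434/api/tags || failed=1
  check "Open WebUI on port 3000" \
    curl -fsS -o /dev/null http://localhost:3000 || failed=1
  check "open-webui container running" \
    sh -c 'docker compose ps --status running --services | grep -qx open-webui' || failed=1
  return "$failed"
}
```

Calling run_stack_checks prints one line per component, and its exit status can gate a follow-up step in a script.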

💡 Ollama REST API

Ollama provides a full REST API on port 11434. You can call it from any program, script, or workflow: curl http://localhost:11434/api/chat -d '{"model":"llama3.1:8b","messages":[{"role":"user","content":"Hallo"}]}' (use a model you have already pulled, and keep the single quotes so the shell passes the JSON through unchanged). Ideal for integration with n8n, Python scripts, or your own tools.
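
For scripts, it helps to build the JSON body in one place instead of typing it inline. The sketch below assumes the hypothetical helper names json_escape, build_chat_payload, and ask_ollama; it escapes only backslashes and double quotes (not newlines or other control characters) and sets "stream": false, an Ollama API option that returns one JSON object instead of a stream of chunks.

```shell
# json_escape TEXT: escape backslashes and double quotes for a JSON string.
# Sketch only: control characters such as newlines are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_chat_payload MODEL PROMPT: print the JSON body for POST /api/chat.
build_chat_payload() {
  local model="$1" prompt="$2"
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"stream":false}' \
    "$(json_escape "$model")" "$(json_escape "$prompt")"
}

# ask_ollama MODEL PROMPT: send one chat message to the local Ollama server.
ask_ollama() {
  curl -fsS http://localhost:11434/api/chat \
    -H 'Content-Type: application/json' \
    -d "$(build_chat_payload "$1" "$2")"
}
```

Pass the model name exactly as ollama list prints it, for example ask_ollama "<model>" "Hallo". The reply is a JSON object whose message.content field holds the answer.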

Next steps

What	Why	Wiki article
Learn Docker basics	Understand what happens under the hood	/tools/docker-grundlagen
Test several models	Different strengths for different tasks	/tools/model-selection
Build n8n workflows	Embed the LLM in automated processes	/tools/n8n-für-anfänger
Set up monitoring	Monitor GPU utilization and container health	/tools/grafana-monitoring
Review security	Local does not automatically mean secure	/security/self-hosted-sicherheit

💡 Complete guide

This tutorial covers the quick start. For more comprehensive guidance on hardware recommendations, network setup, backup strategy, and production hardening, see the articles Self-Hosted Security and Backup Strategy.

ℹ️ See also

n8n offers an official Self-hosted AI Starter Kit, a good starting point if you only need n8n + Ollama.

Key takeaways

✓ About 30 minutes: install Ollama (5 min), download a model (10 min), start Open WebUI (5 min), verify (5 min).
✓ Minimum: 8 GB VRAM for 7B models. Recommended: RTX 3090 (24 GB) for models up to 34B.
✓ Open WebUI is a full-featured ChatGPT-style interface: local, no cloud, no monthly fees.
✓ The Ollama REST API (port 11434) enables integration into scripts, n8n workflows, and your own applications.
✓ No GPU? It also runs on the CPU, just slower (~5-10 tok/s instead of 40-112 tok/s).

Sources

Ollama: local LLM runtime
Open WebUI: self-hosted ChatGPT-style interface
Docker Compose Documentation: container orchestration
LocalAIMaster: Best GPUs for AI (GPU inference benchmarks, tok/s)
n8n Self-hosted AI Starter Kit: official starter kit for local AI with n8n, Ollama, and Qdrant
n8n AI Tutorial: official tutorial on building an AI workflow with n8n
n8n AI Starter Kit Documentation: official documentation for the Self-hosted AI Starter Kit
DataCamp: Local AI with Docker, n8n, Qdrant, and Ollama (step-by-step tutorial for local AI infrastructure)