AI Stack Setup in 30 Minutes

Ollama + Open WebUI + Docker: Your own ChatGPT clone, running locally on your hardware. No cloud, no API keys, no monthly costs.

Reading time: 10 min · Last updated: March 2026
📋 At a Glance

In this tutorial, you set up a local AI stack in 30 minutes: Ollama as LLM backend, Open WebUI as chat interface, Docker as container runtime. At the end, you have a fully functional ChatGPT clone running on your own hardware.

Prerequisites

| What | Minimum | Recommended |
|------|---------|-------------|
| GPU | 8 GB VRAM (7B models) | 24 GB VRAM / RTX 3090 (up to 34B models) |
| RAM | 16 GB | 32 GB |
| Storage | 50 GB free | 200 GB NVMe SSD |
| OS | Windows 10, macOS, Ubuntu 22.04+ | Ubuntu 24.04 LTS |
| Docker | Docker Desktop (Win/Mac) or Docker Engine (Linux) | Docker Engine + NVIDIA Container Toolkit |
ℹ️ No GPU? Still works

Ollama also runs on CPU, just significantly slower. A 7B model on a modern CPU (i7/Ryzen 7) delivers about 5-10 tok/s. Fine for testing, but for production use you need a GPU.
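
To measure throughput on your own hardware, ollama run accepts a --verbose flag that prints timing statistics after each response (this assumes Ollama and a model are already installed, see Steps 1 and 2):

# Print timing stats after each reply; look for the "eval rate" line (tokens/s)
ollama run llama3.1:8b --verbose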

Step 1: Install Ollama (5 Minutes)

Ollama installation: one command on Linux, an installer on Windows/Mac.

Linux / macOS

# One-command installation
curl -fsSL https://ollama.com/install.sh | sh

# Verify it works
ollama --version

Windows

# Download from https://ollama.com/download
# Run installer
# Ollama runs as background service
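
Once the installer finishes, the background service listens on port 11434 and answers a plain HTTP GET (curl ships with Windows 10+; a browser works too):

# Should print "Ollama is running"
curl http://localhost:11434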

Step 2: Download Your First Model (5-10 Minutes)

ollama pull downloads a model and stores it locally.

Download and test a model

# Recommended starter: Llama 3.1 (8B)
ollama pull llama3.1:8b

# Test it directly (exit with /bye or Ctrl+D)
ollama run llama3.1:8b

# Show installed models
ollama list

| Model | Size | VRAM | Strength | Command |
|-------|------|------|----------|---------|
| Llama 3.1 (8B) | 4.7 GB | ~5 GB | Fast all-rounder | ollama pull llama3.1:8b |
| Mistral Small 3.1 (24B) | 14 GB | ~16 GB | Strong German, beats GPT-4o Mini | ollama pull mistral-small3.1 |
| Qwen3 14B | 9 GB | ~10 GB | Good reasoning, 100+ languages | ollama pull qwen3:14b |
| DeepSeek R1 14B | 9 GB | ~10 GB | Strong chain-of-thought reasoning | ollama pull deepseek-r1:14b |
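
Models weigh 5-15 GB each, so disk fills up quickly. Two standard Ollama commands help with housekeeping (llama3.1:8b here is just the example model from above):

# Inspect an installed model (architecture, parameter count, quantization, context length)
ollama show llama3.1:8b

# Delete a model you no longer need
ollama rm llama3.1:8b
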
⚠️ Check VRAM

If the model does not fit in VRAM, Ollama falls back to CPU, which is 5-10x slower. nvidia-smi (Linux/Windows) shows free VRAM; see the checks below. Rule of thumb: a 24 GB GPU handles at most 34B models in Q4 quantization. 70B does NOT fit on 24 GB.
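
Two quick checks on a standard NVIDIA setup: the --query-gpu flags show memory without the full nvidia-smi table, and ollama ps shows whether a loaded model actually landed on the GPU:

# Free vs. used VRAM at a glance
nvidia-smi --query-gpu=memory.total,memory.used,memory.free --format=csv

# While a model is loaded: the PROCESSOR column shows e.g. "100% GPU" or "100% CPU"
ollama ps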

ollama run starts an interactive chat session in the terminal.

Step 3: Start Open WebUI (5 Minutes)

Terminal chat works for testing, but for daily use you want a web interface. Open WebUI looks like ChatGPT but runs locally and connects to your Ollama instance.

Docker Compose (recommended)

# Create docker-compose.yml
cat > docker-compose.yml << 'EOF'
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    volumes:
      - open-webui-data:/app/backend/data
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped

volumes:
  open-webui-data:
EOF

# Start
docker compose up -d

# Open browser: http://localhost:3000
ℹ️ Linux: host-gateway

On Linux, the container needs the extra_hosts entry host.docker.internal:host-gateway to reach the Ollama service on the host. On Windows and macOS, host.docker.internal is available automatically.
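
If you prefer a single container over Compose, a roughly equivalent docker run (same image, port mapping, volume, and host-gateway entry as the Compose file above) looks like this:

docker run -d \
  --name open-webui \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui-data:/app/backend/data \
  --add-host=host.docker.internal:host-gateway \
  --restart unless-stopped \
  ghcr.io/open-webui/open-webui:main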

1. Open your browser: navigate to http://localhost:3000.

2. Create an account: the first user automatically becomes admin. Email and password are freely chosen; everything stays local.

3. Select a model and start chatting: Open WebUI automatically detects all models installed in Ollama. Select the model at the top and start chatting.

Step 4: Verification (5 Minutes)

Check that everything is running correctly:

# Ollama API reachable?
curl http://localhost:11434/api/tags
# Expected: JSON with your models

# Open WebUI running?
curl -I http://localhost:3000
# Expected: HTTP 200

# GPU being used?
nvidia-smi
# "ollama" should appear under Processes

# Docker container status
docker compose ps
# open-webui should show "Up"
💡 Ollama REST API

Ollama provides a full REST API on port 11434. You can call it from any program, script, or workflow; perfect for integration with n8n, Python scripts, or custom tools. The example below shows a complete chat request.
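
A copy-pasteable version of the call: the JSON must be single-quoted in the shell, and "stream": false returns one JSON object instead of a stream of chunks (swap llama3.1:8b for whatever model you pulled):

# Chat completion via the Ollama REST API
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1:8b",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}'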

Next Steps

| What | Why | Wiki Article |
|------|-----|--------------|
| Learn Docker basics | Understand what happens under the hood | /en/tools/docker-grundlagen |
| Test multiple models | Different strengths for different tasks | /en/tools/model-selection |
| Build n8n workflows | Integrate LLMs into automated processes | /en/tools/n8n-für-anfaenger |
| Set up monitoring | Monitor GPU utilization and container health | /en/tools/grafana-monitoring |
| Check security | Local does not automatically mean secure | /en/security/self-hosted-sicherheit |
💡 Complete Guide

This tutorial covers the quick start. For a comprehensive guide with hardware recommendations, network setup, backup strategy, and production hardening, our Local AI Stack Playbook walks you through the entire process.

Key Takeaways

  • ✓ 30 minutes: install Ollama (5 min), download a model (10 min), start Open WebUI (5 min), verify (5 min).
  • ✓ Minimum: 8 GB VRAM for 7B models. Recommended: RTX 3090 (24 GB) for up to 34B models.
  • ✓ Open WebUI is a full-featured ChatGPT-style interface: local, no cloud, no monthly costs.
  • ✓ The Ollama REST API (port 11434) enables integration into scripts, n8n workflows, and custom applications.
  • ✓ No GPU? Works on CPU too, just slower (~5-10 tok/s instead of 40-112 tok/s).

