Self-Hosted AI Agent: Complete Setup with Docker & Monitoring
Run your AI agents on your own hardware with full privacy, zero vendor lock-in, and a monitoring stack that alerts you before anything breaks. This guide covers Docker Compose, health checks, watchdog scripts, and a hardware comparison so you can pick the right self-hosted setup for your needs.
Why Self-Host Your AI Agents
Managed AI agent platforms are convenient until they are not. They read your prompts, throttle your usage, change pricing without warning, and shut down without notice. Self-hosting puts you back in control.
When you self-host, your agent configs, conversation history, and API keys never leave your network. You decide when to update, how many agents to run, and which AI provider to use. There is no monthly platform fee. The only ongoing costs are electricity for your hardware and the AI API calls your agents make.
Architecture Overview
A self-hosted AI agent stack has three layers: the agent server that runs your agents, the Docker runtime that isolates and manages them, and the monitoring stack that watches everything.
┌─────────────────────────────────────────────────┐
│                  Your Network                   │
│                                                 │
│  ┌───────────────────────────────────────────┐  │
│  │           Docker Compose Stack            │  │
│  │                                           │  │
│  │  ┌──────────────┐   ┌─────────────────┐   │  │
│  │  │   OpenClaw   │   │   Prometheus    │   │  │
│  │  │   Gateway    │   │    (metrics)    │   │  │
│  │  │    :18789    │   │      :9090      │   │  │
│  │  └──────┬───────┘   └────────┬────────┘   │  │
│  │         │                    │            │  │
│  │  ┌──────┴───────┐   ┌────────┴────────┐   │  │
│  │  │    Agent     │   │     Grafana     │   │  │
│  │  │  Workspaces  │   │   (dashboards)  │   │  │
│  │  │  (volumes)   │   │      :3000      │   │  │
│  │  └──────────────┘   └─────────────────┘   │  │
│  │                                           │  │
│  │   ┌───────────────────────────────────┐   │  │
│  │   │   Watchdog (health + alerting)    │   │  │
│  │   └───────────────────────────────────┘   │  │
│  └───────────────────────────────────────────┘  │
│                                                 │
│    Telegram / Slack / Discord  ←──→  AI APIs    │
└─────────────────────────────────────────────────┘

OpenClaw Gateway is the core. It runs your agents, manages sessions, routes messages to integrations (Telegram, Slack, Discord), and handles API calls to AI providers.
Prometheus scrapes metrics from the gateway every 15 seconds: active sessions, messages processed, API latency, error rates, and container resource usage.
Grafana visualizes those metrics in real-time dashboards. You can see agent activity, response times, and system health at a glance.
Watchdog is a lightweight script that pings the gateway health endpoint every 60 seconds and sends you an alert (via Telegram or email) if something goes wrong.
Docker Setup: Dockerfile and docker-compose.yml
Docker gives you reproducible deployments, easy rollbacks, and process isolation. One command brings up your entire agent stack. One command tears it down. Here is the complete setup.
Dockerfile
FROM node:22-slim

# node:22-slim does not ship curl, which the HEALTHCHECK below needs
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# Install OpenClaw globally
RUN npm install -g openclaw

# Create app directory
WORKDIR /app

# Copy agent configurations
COPY agents/ /app/agents/

# Copy entrypoint script
COPY entrypoint.sh /app/entrypoint.sh
RUN chmod +x /app/entrypoint.sh

# Expose gateway port
EXPOSE 18789

# Health check endpoint
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
    CMD curl -f http://localhost:18789/health || exit 1

ENTRYPOINT ["/app/entrypoint.sh"]

Entrypoint Script
#!/bin/bash
# entrypoint.sh

# Initialize OpenClaw if first run
if [ ! -f /data/.initialized ]; then
    openclaw init
    touch /data/.initialized
fi

# Register agents from /app/agents/
for agent_dir in /app/agents/*/; do
    agent_name=$(basename "$agent_dir")
    if [ -f "$agent_dir/SOUL.md" ]; then
        openclaw agents add "$agent_name" \
            --workspace "$agent_dir" --non-interactive 2>/dev/null || true
    fi
done

# Configure AI provider from environment variable
if [ -n "$ANTHROPIC_API_KEY" ]; then
    openclaw models auth paste-token \
        --provider anthropic --token "$ANTHROPIC_API_KEY"
fi

# Start the gateway
exec openclaw gateway start

docker-compose.yml (Full Stack)
version: "3.8"

services:
  openclaw:
    build: .
    container_name: openclaw-gateway
    restart: unless-stopped
    ports:
      - "18789:18789"
    volumes:
      - openclaw-data:/data
      - ./agents:/app/agents:ro
    environment:
      - NODE_ENV=production
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:18789/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

  prometheus:
    image: prom/prometheus:latest
    container_name: openclaw-prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.retention.time=30d"

  grafana:
    image: grafana/grafana:latest
    container_name: openclaw-grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD:-changeme}
      - GF_USERS_ALLOW_SIGN_UP=false

  watchdog:
    build:
      context: .
      dockerfile: Dockerfile.watchdog
    container_name: openclaw-watchdog
    restart: unless-stopped
    environment:
      - HEALTH_URL=http://openclaw:18789/health
      - CHECK_INTERVAL=60
      - ALERT_TELEGRAM_TOKEN=${TELEGRAM_BOT_TOKEN}
      - ALERT_CHAT_ID=${ALERT_CHAT_ID}
    depends_on:
      openclaw:
        condition: service_healthy

volumes:
  openclaw-data:
  prometheus-data:
  grafana-data:

Environment File (.env)
# .env - keep this file out of version control
ANTHROPIC_API_KEY=sk-ant-xxxxxxxxxxxxxxxxxxxxx
TELEGRAM_BOT_TOKEN=123456:ABC-DEF-your-bot-token
ALERT_CHAT_ID=your-telegram-chat-id
GRAFANA_PASSWORD=your-secure-password

Important: Never commit your .env file to git. Add it to .gitignore. The docker-compose.yml references these variables with the ${VAR} syntax, so Docker Compose reads them automatically from the .env file in the same directory.
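A quick way to make sure of that, run from the project root:

```shell
# Add .env to .gitignore (creates the file if it does not exist yet)
echo ".env" >> .gitignore

# Verify: grep -qx exits 0 only if an exact ".env" line is present
grep -qx ".env" .gitignore && echo ".env is ignored"
```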
Bring It Up
# Build and start everything
docker compose up -d --build
# Check container status
docker compose ps
# View gateway logs
docker compose logs -f openclaw
# Verify health endpoint
curl http://localhost:18789/health

Monitoring and Watchdog Setup
Self-hosted means self-monitored. You need to know when your agents go down before your users do. This section covers the Prometheus config, a standalone watchdog script, and Telegram alerting.
Prometheus Configuration
# monitoring/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "openclaw"
    static_configs:
      - targets: ["openclaw:18789"]
    metrics_path: /metrics
    scrape_interval: 15s

  - job_name: "node-exporter"
    static_configs:
      - targets: ["host.docker.internal:9100"]

Watchdog Script
The watchdog runs as its own container. It pings the gateway health endpoint every 60 seconds. If the health check fails three times in a row, it sends a Telegram alert. You can also run this script directly on the host without Docker.
#!/bin/bash
# watchdog.sh - Agent health monitor with Telegram alerting

HEALTH_URL="${HEALTH_URL:-http://localhost:18789/health}"
CHECK_INTERVAL="${CHECK_INTERVAL:-60}"
FAIL_THRESHOLD=3
TELEGRAM_TOKEN="${ALERT_TELEGRAM_TOKEN}"
CHAT_ID="${ALERT_CHAT_ID}"

fail_count=0
last_alert_time=0
ALERT_COOLDOWN=300  # 5 minutes between repeated alerts

send_alert() {
    local message="$1"
    if [ -n "$TELEGRAM_TOKEN" ] && [ -n "$CHAT_ID" ]; then
        curl -s -X POST \
            "https://api.telegram.org/bot${TELEGRAM_TOKEN}/sendMessage" \
            -d "chat_id=${CHAT_ID}" \
            -d "text=${message}" \
            -d "parse_mode=HTML" > /dev/null
    fi
    echo "[ALERT] $message"
}

echo "[watchdog] Monitoring $HEALTH_URL every ${CHECK_INTERVAL}s"

while true; do
    http_code=$(curl -s -o /dev/null -w "%{http_code}" \
        --max-time 5 "$HEALTH_URL" 2>/dev/null)
    if [ "$http_code" = "200" ]; then
        if [ $fail_count -ge $FAIL_THRESHOLD ]; then
            send_alert "OpenClaw gateway recovered. Status: OK"
        fi
        fail_count=0
    else
        fail_count=$((fail_count + 1))
        echo "[watchdog] Health check failed ($http_code) - count $fail_count/$FAIL_THRESHOLD"
        if [ $fail_count -ge $FAIL_THRESHOLD ]; then
            now=$(date +%s)
            diff=$((now - last_alert_time))
            if [ $diff -ge $ALERT_COOLDOWN ]; then
                send_alert "OpenClaw gateway is DOWN. Host: $(hostname). Status: $http_code. Failed checks: $fail_count. Time: $(date -u '+%Y-%m-%d %H:%M UTC')"
                last_alert_time=$now
            fi
        fi
    fi
    sleep "$CHECK_INTERVAL"
done

Watchdog Dockerfile
# Dockerfile.watchdog
FROM alpine:3.19
RUN apk add --no-cache bash curl
COPY watchdog.sh /watchdog.sh
RUN chmod +x /watchdog.sh
CMD ["/watchdog.sh"]

What the monitoring stack gives you
Docker health checks mark the gateway container as unhealthy when the /health endpoint stops responding, and the restart: unless-stopped policy restarts it automatically if the process crashes or exits. Note that Docker alone does not restart a container that is merely unhealthy; catching a hung-but-running process is the watchdog's job.
Prometheus metrics track messages per minute, API response times, error rates, and container CPU/memory usage over time.
Grafana dashboards let you visualize trends and set alert thresholds directly in the UI (e.g., alert if error rate exceeds 5%).
Telegram watchdog sends you a push notification on your phone the moment something fails. Five-minute cooldown prevents alert spam.
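For the error-rate alert threshold mentioned above, the Grafana alert condition is a PromQL expression over the gateway's Prometheus metrics. The metric names below are illustrative, not confirmed; check the gateway's /metrics output for the real names before using them:

```promql
# Alert when more than 5% of messages errored over the last 5 minutes
# (openclaw_errors_total and openclaw_messages_total are hypothetical names)
rate(openclaw_errors_total[5m]) / rate(openclaw_messages_total[5m]) > 0.05
```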
Hardware Comparison: Mac Mini vs VPS vs Raspberry Pi
The best hardware depends on how many agents you plan to run, your budget, and whether you want physical control over the machine. All three options work with the same Docker setup.
| Feature | Raspberry Pi 5 | Cloud VPS | Mac Mini (M-series) |
|---|---|---|---|
| Upfront cost | $60 | $0 | $599+ |
| Monthly cost | ~$4 (power) | $6-24/month | ~$8 (power) |
| Agents supported | 3-5 | 5-20 (depends on plan) | 20-50+ |
| RAM | 8 GB | 2-8 GB | 16-24 GB |
| Docker support | Yes (ARM64) | Yes (x86_64) | Yes (ARM64) |
| Data privacy | Full (local) | Depends on provider | Full (local) |
| Power draw | 5-8W | N/A | 10-40W |
| Best for | Budget homelab | Quick start / remote | Power users / teams |
For most solo developers and hobbyists, a Raspberry Pi 5 is the sweet spot. It runs 3-5 agents comfortably, uses almost no electricity, and fits in a desk drawer. If you are running a larger team of agents or need local model inference, the Mac Mini is worth the investment. A cloud VPS makes sense when you need geographic redundancy or do not want to manage physical hardware.
Security: Network Isolation, API Keys, and Permissions
Self-hosting does not make your stack secure by default. Follow these practices to keep your agent stack locked down.
Use Docker networks for isolation. By default, docker-compose creates an internal network. Only expose the ports you actually need. The gateway port (18789) can stay internal if you only access agents through Telegram or Slack. Prometheus and Grafana should be behind a reverse proxy with authentication if exposed to the internet.
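One concrete option is to publish the monitoring ports on the loopback interface only, so Prometheus and Grafana are reachable from the Docker host (for example through an SSH tunnel or a local reverse proxy) but invisible to the rest of the network. A compose excerpt sketching this:

```yaml
# docker-compose.yml excerpt: publish monitoring ports on loopback only
services:
  prometheus:
    ports:
      - "127.0.0.1:9090:9090"   # only processes on the Docker host can reach it
  grafana:
    ports:
      - "127.0.0.1:3000:3000"
```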
Store API keys in .env, never in images. The docker-compose.yml references environment variables from your .env file. Never bake API keys into Dockerfiles or commit them to git. For extra security, use Docker secrets or a vault solution like HashiCorp Vault.
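Docker Compose's file-based secrets are a step up from plain environment variables: the value never appears in `docker inspect` output. A sketch, assuming the entrypoint is adapted to read the key from /run/secrets/ (this guide's entrypoint reads it from the environment, so that adjustment is not shown here):

```yaml
# Sketch: file-based Compose secrets instead of environment variables.
# The container sees the value at /run/secrets/anthropic_api_key at runtime.
services:
  openclaw:
    secrets:
      - anthropic_api_key

secrets:
  anthropic_api_key:
    file: ./secrets/anthropic_api_key.txt   # chmod 600, outside version control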
Run containers as non-root. Add a USER node directive to your Dockerfile after the npm install step. This prevents a container breakout from having root access to your host system.
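Applied to this guide's Dockerfile, the change is small: give the unprivileged node user (which the node base image already ships) ownership of the app directory, then switch to it before the entrypoint runs:

```dockerfile
# After the npm install and COPY steps, drop root privileges.
# The mounted /data volume must also be writable by this user.
RUN chown -R node:node /app
USER node
```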
Set agent permission boundaries. In each agent’s SOUL.md, explicitly define what the agent can and cannot do. If an agent should only read data, say so. If it should never execute shell commands, enforce it. The SOUL.md acts as both a personality config and a permission boundary.
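A hypothetical excerpt showing what such a boundary section might look like for a read-only metrics agent (the exact headings are up to you; OpenClaw reads SOUL.md as free-form instructions):

```markdown
## Permissions

- You MAY read dashboards, logs, and metrics endpoints.
- You MUST NOT execute shell commands or modify any file.
- You MUST NOT call external APIs other than the configured AI provider.
```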
Enable automatic security updates. Keep your Docker images updated. Use watchtower to automatically pull and redeploy updated base images on a schedule. For the host OS, enable unattended-upgrades (Debian/Ubuntu) or configure automatic updates for your platform.
# Add Watchtower for automatic image updates
# Append to your docker-compose.yml under services:
  watchtower:
    image: containrrr/watchtower
    container_name: watchtower
    restart: unless-stopped
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - WATCHTOWER_CLEANUP=true
      - WATCHTOWER_SCHEDULE=0 0 4 * * *  # Check daily at 4 AM
      - WATCHTOWER_NOTIFICATIONS=shoutrrr
      - WATCHTOWER_NOTIFICATION_URL=telegram://${TELEGRAM_BOT_TOKEN}@telegram?chats=${ALERT_CHAT_ID}

Skip the Config Writing: Generate Deploy-Ready Agent Packages
Writing Dockerfiles, SOUL.md configs, monitoring scripts, and entrypoint files from scratch takes time. The CrewClaw generator builds all of this for you in seconds.
SOUL.md Configs
Pick from 20+ agent roles. DevOps, Writer, Support, SEO, Metrics, and more. Each template is battle-tested with clear rules and tone.
Docker Package
Full deploy kit with Dockerfile, docker-compose.yml, entrypoint script, .env template, and monitoring configs included.
Monitoring Built In
Health checks, restart policies, and watchdog scripts come pre-configured. Works out of the box on any Docker host.
Generate your agent config, download the package, copy it to your server, and run docker compose up -d. That is the entire deploy process.
Frequently Asked Questions
Do I need a GPU to self-host AI agents with Docker?
No. The AI agent framework (OpenClaw) acts as an orchestrator. It sends prompts to cloud AI providers like Anthropic Claude or OpenAI and handles the responses. The actual model inference happens on their servers. Your Docker host only needs enough CPU and RAM to run the gateway, manage sessions, and execute integrations. A dual-core machine with 2 GB RAM is sufficient for most setups.
How do I update my Dockerized AI agents without downtime?
Use a rolling update strategy with docker compose. Run 'docker compose pull' to fetch the latest images, then 'docker compose up -d' to recreate only the containers that changed. Docker Compose handles the transition gracefully. If you need zero-downtime deploys, run two instances behind a reverse proxy (like Traefik or Caddy) and drain one at a time. For most homelab setups, the 2-3 second restart window is perfectly acceptable.
What happens to agent conversations when Docker containers restart?
Conversations persist across restarts because session data is stored on a Docker volume. The docker-compose.yml in this guide mounts the named volume openclaw-data at the container's /data directory. When a container restarts (manually or after a health check failure), it picks up all existing sessions exactly where they left off. No messages are lost.
Can I run self-hosted AI agents on a NAS like Synology or Unraid?
Yes. Any device that supports Docker can run OpenClaw agents. Synology NAS models with Docker support (DS220+, DS920+, and newer) work well. Unraid has native Docker support and makes volume management straightforward. TrueNAS Scale also supports Docker containers. The docker-compose.yml from this guide works on all of these platforms without modification.
How much bandwidth do self-hosted AI agents use?
Very little. Each agent message is a small JSON payload sent to the AI provider API (typically 1-10 KB per request). Even a busy agent handling 500 messages per day uses less than 50 MB of bandwidth monthly. The monitoring stack (Prometheus + Grafana) adds minimal overhead since it only scrapes local metrics. You do not need a high-speed connection. A stable 5 Mbps link is more than enough.
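The arithmetic behind that estimate, assuming an average payload of about 3 KB per request:

```shell
# 500 messages/day x ~3 KB/request x 30 days, converted to MB (integer division)
msgs_per_day=500
kb_per_msg=3
days=30
echo "$(( msgs_per_day * kb_per_msg * days / 1024 )) MB/month"   # prints "43 MB/month"
```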
Deploy Your AI Agent Stack in 5 Minutes
Use the CrewClaw generator to build a complete agent package with Docker configs, monitoring, and SOUL.md templates. Pick your agents, download the deploy kit, and run docker compose up.