Self-Hosted AI Agent: Complete Setup with Docker & Monitoring
Run your AI agents on your own hardware with full privacy, zero vendor lock-in, and a monitoring stack that alerts you before anything breaks. This guide covers Docker Compose, health checks, watchdog scripts, and a hardware comparison so you can pick the right self-hosted setup for your needs.
Why Self-Host Your AI Agents
Managed AI agent platforms are convenient until they are not. They read your prompts, throttle your usage, change pricing without warning, and shut down without notice. Self-hosting puts you back in control.
When you self-host, your agent configs, conversation history, and API keys never leave your network. You decide when to update, how many agents to run, and which AI provider to use. There is no monthly platform fee. The only ongoing costs are electricity for your hardware and the AI API calls your agents make.
Architecture Overview
A self-hosted AI agent stack has three layers: the agent server that runs your agents, the Docker runtime that isolates and manages them, and the monitoring stack that watches everything.
┌─────────────────────────────────────────────────┐
│                  Your Network                   │
│                                                 │
│  ┌───────────────────────────────────────────┐  │
│  │           Docker Compose Stack            │  │
│  │                                           │  │
│  │  ┌──────────────┐   ┌─────────────────┐   │  │
│  │  │   OpenClaw   │   │   Prometheus    │   │  │
│  │  │   Gateway    │   │    (metrics)    │   │  │
│  │  │    :18789    │   │      :9090      │   │  │
│  │  └──────┬───────┘   └────────┬────────┘   │  │
│  │         │                    │            │  │
│  │  ┌──────┴───────┐   ┌────────┴────────┐   │  │
│  │  │    Agent     │   │     Grafana     │   │  │
│  │  │  Workspaces  │   │   (dashboards)  │   │  │
│  │  │  (volumes)   │   │      :3000      │   │  │
│  │  └──────────────┘   └─────────────────┘   │  │
│  │                                           │  │
│  │   ┌───────────────────────────────────┐   │  │
│  │   │   Watchdog (health + alerting)    │   │  │
│  │   └───────────────────────────────────┘   │  │
│  └───────────────────────────────────────────┘  │
│                                                 │
│    Telegram / Slack / Discord  ←──→  AI APIs    │
└─────────────────────────────────────────────────┘

OpenClaw Gateway is the core. It runs your agents, manages sessions, routes messages to integrations (Telegram, Slack, Discord), and handles API calls to AI providers.
Prometheus scrapes metrics from the gateway every 15 seconds: active sessions, messages processed, API latency, error rates, and container resource usage.
Grafana visualizes those metrics in real-time dashboards. You can see agent activity, response times, and system health at a glance.
Watchdog is a lightweight script that pings the gateway health endpoint every 60 seconds and sends you an alert (via Telegram or email) if something goes wrong.
Docker Setup: Dockerfile and docker-compose.yml
Docker gives you reproducible deployments, easy rollbacks, and process isolation. One command brings up your entire agent stack. One command tears it down. Here is the complete setup.
Dockerfile
FROM node:22-slim

# node:22-slim does not ship curl, which the HEALTHCHECK below needs
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# Install OpenClaw globally
RUN npm install -g openclaw

# Create app directory
WORKDIR /app

# Copy agent configurations
COPY agents/ /app/agents/

# Copy entrypoint script
COPY entrypoint.sh /app/entrypoint.sh
RUN chmod +x /app/entrypoint.sh

# Expose gateway port
EXPOSE 18789

# Health check endpoint
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
    CMD curl -f http://localhost:18789/health || exit 1

ENTRYPOINT ["/app/entrypoint.sh"]

Entrypoint Script
#!/bin/bash
# entrypoint.sh

# Initialize OpenClaw if first run
if [ ! -f /data/.initialized ]; then
    openclaw init
    touch /data/.initialized
fi

# Register agents from /app/agents/
for agent_dir in /app/agents/*/; do
    agent_name=$(basename "$agent_dir")
    if [ -f "$agent_dir/SOUL.md" ]; then
        openclaw agents add "$agent_name" \
            --workspace "$agent_dir" --non-interactive 2>/dev/null || true
    fi
done

# Configure AI provider from environment variable
if [ -n "$ANTHROPIC_API_KEY" ]; then
    openclaw models auth paste-token \
        --provider anthropic --token "$ANTHROPIC_API_KEY"
fi

# Start the gateway
exec openclaw gateway start

docker-compose.yml (Full Stack)
version: "3.8"

services:
  openclaw:
    build: .
    container_name: openclaw-gateway
    restart: unless-stopped
    ports:
      - "18789:18789"
    volumes:
      - openclaw-data:/data
      - ./agents:/app/agents:ro
    environment:
      - NODE_ENV=production
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:18789/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

  prometheus:
    image: prom/prometheus:latest
    container_name: openclaw-prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.retention.time=30d"

  grafana:
    image: grafana/grafana:latest
    container_name: openclaw-grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD:-changeme}
      - GF_USERS_ALLOW_SIGN_UP=false

  watchdog:
    build:
      context: .
      dockerfile: Dockerfile.watchdog
    container_name: openclaw-watchdog
    restart: unless-stopped
    environment:
      - HEALTH_URL=http://openclaw:18789/health
      - CHECK_INTERVAL=60
      - ALERT_TELEGRAM_TOKEN=${TELEGRAM_BOT_TOKEN}
      - ALERT_CHAT_ID=${ALERT_CHAT_ID}
    depends_on:
      openclaw:
        condition: service_healthy

volumes:
  openclaw-data:
  prometheus-data:
  grafana-data:

Environment File (.env)
# .env - keep this file out of version control
ANTHROPIC_API_KEY=sk-ant-xxxxxxxxxxxxxxxxxxxxx
TELEGRAM_BOT_TOKEN=123456:ABC-DEF-your-bot-token
ALERT_CHAT_ID=your-telegram-chat-id
GRAFANA_PASSWORD=your-secure-password

Important: Never commit your .env file to git. Add it to .gitignore. The docker-compose.yml references these variables with the ${VAR} syntax, so Docker Compose reads them automatically from the .env file in the same directory.
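A quick way to make sure of that, run from the project root:

```shell
# Add .env to .gitignore (creates the file if it does not exist yet)
echo ".env" >> .gitignore

# Verify: grep -qx exits 0 only if an exact ".env" line is present
grep -qx ".env" .gitignore && echo ".env is ignored"
```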
Bring It Up
# Build and start everything
docker compose up -d --build
# Check container status
docker compose ps
# View gateway logs
docker compose logs -f openclaw
# Verify health endpoint
curl http://localhost:18789/health

Monitoring and Watchdog Setup
Self-hosted means self-monitored. You need to know when your agents go down before your users do. This section covers the Prometheus config, a standalone watchdog script, and Telegram alerting.
Prometheus Configuration
# monitoring/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "openclaw"
    static_configs:
      - targets: ["openclaw:18789"]
    metrics_path: /metrics
    scrape_interval: 15s

  - job_name: "node-exporter"
    static_configs:
      - targets: ["host.docker.internal:9100"]

Watchdog Script
The watchdog runs as its own container. It pings the gateway health endpoint every 60 seconds. If the health check fails three times in a row, it sends a Telegram alert. You can also run this script directly on the host without Docker.
#!/bin/bash
# watchdog.sh - Agent health monitor with Telegram alerting

HEALTH_URL="${HEALTH_URL:-http://localhost:18789/health}"
CHECK_INTERVAL="${CHECK_INTERVAL:-60}"
FAIL_THRESHOLD=3
TELEGRAM_TOKEN="${ALERT_TELEGRAM_TOKEN}"
CHAT_ID="${ALERT_CHAT_ID}"

fail_count=0
last_alert_time=0
ALERT_COOLDOWN=300  # 5 minutes between repeated alerts

send_alert() {
    local message="$1"
    if [ -n "$TELEGRAM_TOKEN" ] && [ -n "$CHAT_ID" ]; then
        curl -s -X POST \
            "https://api.telegram.org/bot${TELEGRAM_TOKEN}/sendMessage" \
            -d "chat_id=${CHAT_ID}" \
            -d "text=${message}" \
            -d "parse_mode=HTML" > /dev/null
    fi
    echo "[ALERT] $message"
}

echo "[watchdog] Monitoring $HEALTH_URL every ${CHECK_INTERVAL}s"

while true; do
    http_code=$(curl -s -o /dev/null -w "%{http_code}" \
        --max-time 5 "$HEALTH_URL" 2>/dev/null)
    if [ "$http_code" = "200" ]; then
        if [ $fail_count -ge $FAIL_THRESHOLD ]; then
            send_alert "OpenClaw gateway recovered. Status: OK"
        fi
        fail_count=0
    else
        fail_count=$((fail_count + 1))
        echo "[watchdog] Health check failed ($http_code) - count $fail_count/$FAIL_THRESHOLD"
        if [ $fail_count -ge $FAIL_THRESHOLD ]; then
            now=$(date +%s)
            diff=$((now - last_alert_time))
            if [ $diff -ge $ALERT_COOLDOWN ]; then
                send_alert "OpenClaw gateway is DOWN. Host: $(hostname). Status: $http_code. Failed checks: $fail_count. Time: $(date -u '+%Y-%m-%d %H:%M UTC')"
                last_alert_time=$now
            fi
        fi
    fi
    sleep "$CHECK_INTERVAL"
done

Watchdog Dockerfile
# Dockerfile.watchdog
FROM alpine:3.19
RUN apk add --no-cache bash curl
COPY watchdog.sh /watchdog.sh
RUN chmod +x /watchdog.sh
CMD ["/watchdog.sh"]

What the monitoring stack gives you
Docker health checks mark the gateway container as unhealthy when the /health endpoint stops responding, and the restart: unless-stopped policy restarts it automatically if the process crashes or exits. Note that Docker alone does not restart a container that is merely unhealthy; catching a hung-but-running process is the watchdog's job.
Prometheus metrics track messages per minute, API response times, error rates, and container CPU/memory usage over time.
Grafana dashboards let you visualize trends and set alert thresholds directly in the UI (e.g., alert if error rate exceeds 5%).
Telegram watchdog sends you a push notification on your phone the moment something fails. Five-minute cooldown prevents alert spam.
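For the error-rate alert threshold mentioned above, the Grafana alert condition is a PromQL expression over the gateway's Prometheus metrics. The metric names below are illustrative, not confirmed; check the gateway's /metrics output for the real names before using them:

```promql
# Alert when more than 5% of messages errored over the last 5 minutes
# (openclaw_errors_total and openclaw_messages_total are hypothetical names)
rate(openclaw_errors_total[5m]) / rate(openclaw_messages_total[5m]) > 0.05
```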
Hardware Comparison: Mac Mini vs VPS vs Raspberry Pi
The best hardware depends on how many agents you plan to run, your budget, and whether you want physical control over the machine. All three options work with the same Docker setup.
| Feature | Raspberry Pi 5 | Cloud VPS | Mac Mini (M-series) |
|---|---|---|---|
| Upfront cost | $60 | $0 | $599+ |
| Monthly cost | ~$4 (power) | $6-24/month | ~$8 (power) |
| Agents supported | 3-5 | 5-20 (depends on plan) | 20-50+ |
| RAM | 8 GB | 2-8 GB | 16-24 GB |
| Docker support | Yes (ARM64) | Yes (x86_64) | Yes (ARM64) |
| Data privacy | Full (local) | Depends on provider | Full (local) |
| Power draw | 5-8W | N/A | 10-40W |
| Best for | Budget homelab | Quick start / remote | Power users / teams |
For most solo developers and hobbyists, a Raspberry Pi 5 is the sweet spot. It runs 3-5 agents comfortably, uses almost no electricity, and fits in a desk drawer. If you are running a larger team of agents or need local model inference, the Mac Mini is worth the investment. A cloud VPS makes sense when you need geographic redundancy or do not want to manage physical hardware.
Security: Network Isolation, API Keys, and Permissions
Self-hosting does not make your stack secure by default. Follow these practices to keep your agent stack locked down.
Use Docker networks for isolation. By default, docker-compose creates an internal network. Only expose the ports you actually need. The gateway port (18789) can stay internal if you only access agents through Telegram or Slack. Prometheus and Grafana should be behind a reverse proxy with authentication if exposed to the internet.
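One concrete option is to publish the monitoring ports on the loopback interface only, so Prometheus and Grafana are reachable from the Docker host (for example through an SSH tunnel or a local reverse proxy) but invisible to the rest of the network. A compose excerpt sketching this:

```yaml
# docker-compose.yml excerpt: publish monitoring ports on loopback only
services:
  prometheus:
    ports:
      - "127.0.0.1:9090:9090"   # only processes on the Docker host can reach it
  grafana:
    ports:
      - "127.0.0.1:3000:3000"
```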
Store API keys in .env, never in images. The docker-compose.yml references environment variables from your .env file. Never bake API keys into Dockerfiles or commit them to git. For extra security, use Docker secrets or a vault solution like HashiCorp Vault.
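Docker Compose's file-based secrets are a step up from plain environment variables: the value never appears in `docker inspect` output. A sketch, assuming the entrypoint is adapted to read the key from /run/secrets/ (this guide's entrypoint reads it from the environment, so that adjustment is not shown here):

```yaml
# Sketch: file-based Compose secrets instead of environment variables.
# The container sees the value at /run/secrets/anthropic_api_key at runtime.
services:
  openclaw:
    secrets:
      - anthropic_api_key

secrets:
  anthropic_api_key:
    file: ./secrets/anthropic_api_key.txt   # chmod 600, outside version control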
Run containers as non-root. Add a USER node directive to your Dockerfile after the npm install step. This prevents a container breakout from having root access to your host system.
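Applied to this guide's Dockerfile, the change is small: give the unprivileged node user (which the node base image already ships) ownership of the app directory, then switch to it before the entrypoint runs:

```dockerfile
# After the npm install and COPY steps, drop root privileges.
# The mounted /data volume must also be writable by this user.
RUN chown -R node:node /app
USER node
```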
Set agent permission boundaries. In each agent’s SOUL.md, explicitly define what the agent can and cannot do. If an agent should only read data, say so. If it should never execute shell commands, enforce it. The SOUL.md acts as both a personality config and a permission boundary.
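A hypothetical excerpt showing what such a boundary section might look like for a read-only metrics agent (the exact headings are up to you; OpenClaw reads SOUL.md as free-form instructions):

```markdown
## Permissions

- You MAY read dashboards, logs, and metrics endpoints.
- You MUST NOT execute shell commands or modify any file.
- You MUST NOT call external APIs other than the configured AI provider.
```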
Enable automatic security updates. Keep your Docker images updated. Use watchtower to automatically pull and redeploy updated base images on a schedule. For the host OS, enable unattended-upgrades (Debian/Ubuntu) or configure automatic updates for your platform.
# Add Watchtower for automatic image updates
# Append to your docker-compose.yml under services:
  watchtower:
    image: containrrr/watchtower
    container_name: watchtower
    restart: unless-stopped
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - WATCHTOWER_CLEANUP=true
      - WATCHTOWER_SCHEDULE=0 0 4 * * *  # Check daily at 4 AM
      - WATCHTOWER_NOTIFICATIONS=shoutrrr
      - WATCHTOWER_NOTIFICATION_URL=telegram://${TELEGRAM_BOT_TOKEN}@telegram?chats=${ALERT_CHAT_ID}

Skip the Config Writing: Generate Deploy-Ready Agent Packages
Writing Dockerfiles, SOUL.md configs, monitoring scripts, and entrypoint files from scratch takes time. The CrewClaw generator builds all of this for you in seconds.
SOUL.md Configs
Pick from 20+ agent roles. DevOps, Writer, Support, SEO, Metrics, and more. Each template is battle-tested with clear rules and tone.
Docker Package
Full deploy kit with Dockerfile, docker-compose.yml, entrypoint script, .env template, and monitoring configs included.
Monitoring Built In
Health checks, restart policies, and watchdog scripts come pre-configured. Works out of the box on any Docker host.
Generate your agent config, download the package, copy it to your server, and run docker compose up -d. That is the entire deploy process.
Frequently Asked Questions
Do I need a GPU to self-host AI agents with Docker?
No. The AI agent framework (OpenClaw) acts as an orchestrator. It sends prompts to cloud AI providers like Anthropic Claude or OpenAI and handles the responses. The actual model inference happens on their servers. Your Docker host only needs enough CPU and RAM to run the gateway, manage sessions, and execute integrations. A dual-core machine with 2 GB RAM is sufficient for most setups.
How do I update my Dockerized AI agents without downtime?
Use a rolling update strategy with docker compose. Run 'docker compose pull' to fetch the latest images, then 'docker compose up -d' to recreate only the containers that changed. Docker Compose handles the transition gracefully. If you need zero-downtime deploys, run two instances behind a reverse proxy (like Traefik or Caddy) and drain one at a time. For most homelab setups, the 2-3 second restart window is perfectly acceptable.
What happens to agent conversations when Docker containers restart?
Conversations persist across restarts because session data is stored on a Docker volume. The docker-compose.yml in this guide mounts the named volume openclaw-data at the container's /data directory. When a container restarts (manually or after a health check failure), it picks up all existing sessions exactly where they left off. No messages are lost.
Can I run self-hosted AI agents on a NAS like Synology or Unraid?
Yes. Any device that supports Docker can run OpenClaw agents. Synology NAS models with Docker support (DS220+, DS920+, and newer) work well. Unraid has native Docker support and makes volume management straightforward. TrueNAS Scale also supports Docker containers. The docker-compose.yml from this guide works on all of these platforms without modification.
How much bandwidth do self-hosted AI agents use?
Very little. Each agent message is a small JSON payload sent to the AI provider API (typically 1-10 KB per request). Even a busy agent handling 500 messages per day uses less than 50 MB of bandwidth monthly. The monitoring stack (Prometheus + Grafana) adds minimal overhead since it only scrapes local metrics. You do not need a high-speed connection. A stable 5 Mbps link is more than enough.
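The arithmetic behind that estimate, assuming an average payload of about 3 KB per request:

```shell
# 500 messages/day x ~3 KB/request x 30 days, converted to MB (integer division)
msgs_per_day=500
kb_per_msg=3
days=30
echo "$(( msgs_per_day * kb_per_msg * days / 1024 )) MB/month"   # prints "43 MB/month"
```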
Deploy Your AI Agent Stack in 5 Minutes
Use the CrewClaw generator to build a complete agent package with Docker configs, monitoring, and SOUL.md templates. Pick your agents, download the deploy kit, and run docker compose up.