OpenClaw Local Setup: Stop Herding Tiny Computer People
There is a Reddit post that keeps haunting me. Someone built a custom PC with dual RTX 3060 GPUs, 96GB of system RAM, running Unraid with Docker. They spent an entire weekend trying to get a 3-agent OpenClaw team running locally. By Sunday night, they had one agent kind of working, two models that kept crashing, and a Docker networking setup that made no sense to anyone in the comments. This guide exists so you do not become that person.
Why Local Setup Is Harder Than It Looks
Running a single LLM locally with Ollama is straightforward. You install it, pull a model, and start chatting. Running a multi-agent system locally is an entirely different challenge. You are not just running one model. You need a gateway process that routes messages between agents, each agent potentially running a different model, a messaging layer for communication, and enough hardware headroom to handle concurrent inference requests without everything grinding to a halt.
The Reddit user with dual RTX 3060s ran into every single one of these problems simultaneously. Their 3060s each had 12GB VRAM, which is enough for one medium-sized model per GPU. But the OpenClaw gateway was trying to load the same model twice because of a misconfigured routing table. Docker was isolating the Ollama instance from the gateway, so API calls were timing out. And 96GB of system RAM was being eaten alive by model spillover because the quantization settings were wrong.
None of these are unsolvable problems. But when you hit all of them at once, on a Saturday night, with Docker logs scrolling faster than you can read, it feels impossible. The fix is to approach local setup in stages instead of trying to get everything running at once.
Step 1: Start Without Docker
This is the most important piece of advice in this entire guide. Do not start with Docker. Docker adds a networking layer, volume mount complexity, and process isolation that will obscure every other problem you encounter. Get OpenClaw running on your bare operating system first. Once everything works, then containerize.
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a small model to test with
ollama pull qwen2.5:7b

# Verify Ollama is running and accessible
curl http://localhost:11434/api/tags

# Install OpenClaw
npm install -g openclaw

# Create your first agent directory
mkdir -p ~/.openclaw/agents/test-agent
cat > ~/.openclaw/agents/test-agent/SOUL.md << 'EOF'
# Agent: Test Agent
# Model: qwen2.5:7b
# Provider: ollama
You are a test agent. Respond briefly to confirm you are working.
EOF

# Start the gateway
openclaw gateway start

# Test the agent
openclaw agent --agent test-agent --message "Are you working?"
```

If this works, congratulations. You have a functioning local OpenClaw agent. If it does not, you now know the problem is in your base setup, not in Docker networking. Fix it here before adding any complexity.
Step 2: Add a Second Agent and Test Communication
Multi-agent setups fail at the communication layer more often than anywhere else. Before you add five agents, add one more. Create a simple PM agent that delegates to your test agent.
```bash
# Create PM agent
mkdir -p ~/.openclaw/agents/pm
cat > ~/.openclaw/agents/pm/SOUL.md << 'EOF'
# Agent: PM
# Model: qwen2.5:7b
# Provider: ollama
You are a project manager. When given a task, delegate it to @test-agent
and report back with their response.
EOF

# Create AGENTS.md for team coordination
cat > ~/.openclaw/AGENTS.md << 'EOF'
## Team
- @pm: Project manager, coordinates tasks
- @test-agent: General worker, handles delegated tasks

## Workflow
1. @pm receives tasks and delegates to @test-agent
2. @test-agent completes work and reports to @pm
3. @pm summarizes and reports back
EOF

# Test delegation
openclaw agent --agent pm --message "Ask test-agent to summarize what 2+2 equals"
```

If delegation works, your gateway routing is correct and both agents can communicate. If it fails, check that the gateway is running and that both agent directories are properly structured. The most common issue here is an agent name mismatch between AGENTS.md and the directory names.
Step 3: Now Containerize (If You Want To)
With a working bare-metal setup, moving to Docker becomes much simpler because you already know what a working configuration looks like. The key Docker considerations for OpenClaw are:
Network mode: host
The simplest option. Use --network host for both the Ollama container and the OpenClaw gateway container. This avoids all inter-container networking issues at the cost of port isolation. For a local dev setup, this is almost always the right choice.
GPU passthrough
If you are running on Linux with NVIDIA GPUs, you need nvidia-container-toolkit installed and --gpus all passed to the Ollama container. On Unraid, this requires the NVIDIA Driver plugin and manual device mapping in the container template.
Volume mounts
Mount your ~/.openclaw directory into the gateway container so your agent configurations persist. Mount the Ollama models directory (~/.ollama/models) so you do not re-download models every time the container restarts.
Environment variables
Set OLLAMA_HOST=http://localhost:11434 in the gateway container if using host networking. If using a Docker bridge network, replace localhost with the Ollama container name.
```yaml
version: "3.8"
services:
  ollama:
    image: ollama/ollama:latest
    network_mode: host
    volumes:
      - ~/.ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  openclaw-gateway:
    image: openclaw/gateway:latest
    network_mode: host
    volumes:
      - ~/.openclaw:/root/.openclaw
    environment:
      - OLLAMA_HOST=http://localhost:11434
    depends_on:
      - ollama
```

Hardware Guide: What Actually Works
The hardware question comes up constantly on Reddit, and the answers are all over the place. Someone recommends an RTX 4090, someone else says a Mac Mini M4 is enough, and a third person insists you need a server rack. Here is the honest breakdown.
| Setup | Price Range | Max Model Size | Concurrent Agents | Best For |
|---|---|---|---|---|
| Mac Mini M4 (16GB) | $599 | Qwen 7B - 13B | 1-2 | Single agent, light tasks |
| Mac Mini M4 Pro (24GB) | $999 | Qwen 27B (Q4) | 2-3 | Multi-agent, moderate tasks |
| PC + RTX 3060 12GB | $800-1200 | Qwen 27B (Q4) | 1-2 | Budget GPU setup |
| PC + RTX 4090 24GB | $2000-2800 | Llama 70B (Q4), Qwen 72B | 2-3 | Power user, fast inference |
| VPS (Hetzner, etc.) | $30-80/mo | Varies by plan | 1-5 | Always-on, no local hardware |
The Mac Mini M4 Pro at $999 is the best value for most people running OpenClaw locally. The unified memory architecture means nearly the full 24GB is available for model inference, unlike PC setups where system RAM and VRAM are separate pools. The dual RTX 3060 setup from our Reddit friend is decent hardware on paper, but splitting VRAM across two GPUs creates more complexity than a single large memory pool.
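The unified-pool advantage can be sanity-checked with a back-of-the-envelope estimate. This sketch is illustrative, and the 20% overhead margin is my assumption rather than a benchmark:

```python
def max_model_params_b(memory_gb: float, quant_bits: int = 4, overhead: float = 1.2) -> float:
    """Rough upper bound on model size (billions of parameters) that fits in a
    single memory pool at a given quantization level. The 1.2 factor is an
    assumed margin for KV cache and runtime overhead."""
    bytes_per_param = quant_bits / 8 * overhead
    return memory_gb * 1e9 / bytes_per_param / 1e9

# One 24GB unified pool vs. two isolated 12GB GPUs: the largest single model
# each can hold at Q4 differs by 2x, even though total memory is equal.
print(max_model_params_b(24))  # ~40B parameters
print(max_model_params_b(12))  # ~20B parameters per GPU
```

The point is not the exact numbers but the shape of the constraint: a model must fit in one contiguous pool, so 2x12GB is not equivalent to 1x24GB.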
For always-on setups where agents need to be available 24/7, a VPS makes more sense than running a local machine. Hetzner dedicated servers with GPU access start around $50/month and can run mid-size models continuously without worrying about power costs or hardware maintenance.
Model Selection for Local Agents
Not all local models are created equal for agent work. A model that scores well on benchmarks might be terrible at following the structured instructions that OpenClaw agents require. Here is what actually works based on community testing.
Qwen 2.5 7B
The minimum viable model for agent tasks. Fast inference on almost any hardware, handles simple formatting, routing, and data extraction well. Falls apart on complex reasoning or multi-step planning. Good for social media agents, data formatters, and message routers.
Qwen 2.5 27B
The sweet spot for most local agent setups. Fits in 12-16GB VRAM with 4-bit quantization. Handles coordination, analysis, and moderate writing tasks. This is what most Reddit users running local OpenClaw teams end up settling on after trying bigger models that were too slow.
Llama 3.3 70B (Q4)
The local equivalent of a mid-tier cloud model. Requires 24GB+ VRAM or 48GB+ unified memory on Mac. Inference is slower but quality is noticeably better for PM agents and writer agents that need nuanced output. Run this on your most important agent and use smaller models for the rest.
DeepSeek V3 (Q4)
Strong coding and reasoning capabilities. Works well for developer-focused agent teams. The 671B total parameter count means even aggressively quantized builds need well over 100GB of memory to run usably, far beyond typical consumer hardware. Most people running DeepSeek for OpenClaw use the API instead of running it locally.
```
# PM Agent - needs best reasoning
# Model: llama3.3:70b-instruct-q4_K_M
# Provider: ollama

# SEO Analyst - moderate reasoning, lots of data
# Model: qwen2.5:27b-instruct-q4_K_M
# Provider: ollama

# Social Media - simple formatting, speed matters
# Model: qwen2.5:7b-instruct
# Provider: ollama
```

The Hybrid Approach: Local + Cloud
Here is the reality that most "run everything locally" guides will not tell you: a hybrid setup with local models for simple tasks and cloud APIs for complex reasoning is almost always better than trying to run everything locally. It is cheaper than pure cloud, faster than pure local, and more reliable than either.
The idea is straightforward. Your PM agent handles coordination and complex decision-making, so it runs on Claude Haiku or Gemini Flash through a cloud API. At around $0.25 per million input tokens for Haiku, that works out to pennies per day for a PM agent. Your worker agents handle repetitive tasks like formatting, data extraction, and simple analysis, so they run on local Qwen 27B through Ollama. Zero API cost.
```
# ~/.openclaw/agents/orion/SOUL.md (PM - cloud)
# Model: claude-haiku
# Provider: anthropic
# API_KEY: sk-ant-...
You are the project coordinator. Delegate tasks to local agents.

# ~/.openclaw/agents/radar/SOUL.md (SEO - local)
# Model: qwen2.5:27b-instruct-q4_K_M
# Provider: ollama
You analyze SEO data and generate keyword reports.

# ~/.openclaw/agents/pulse/SOUL.md (Social - local)
# Model: qwen2.5:7b-instruct
# Provider: ollama
You format content for social media platforms.
```

This setup gives you cloud-quality coordination with near-zero running costs. The PM makes smart decisions using a model that excels at reasoning, while the worker agents crank through tasks on free local hardware. Monthly cost for this hybrid team: roughly $0.50 to $2.00, compared to $8 to $15 for running everything on cloud APIs.
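The hybrid math is easy to check yourself. This sketch assumes an illustrative coordination volume of 0.2M tokens per day for the PM, which is my placeholder, not a measurement:

```python
def monthly_api_cost(tokens_per_day_millions: float, usd_per_million_tokens: float,
                     days: int = 30) -> float:
    """Monthly API spend for one cloud-hosted agent; Ollama-backed agents add $0."""
    return tokens_per_day_millions * usd_per_million_tokens * days

# PM on a $0.25/M-token model pushing ~0.2M tokens of coordination traffic per day:
print(monthly_api_cost(0.2, 0.25))  # ~$1.50/month; the local workers cost nothing
```

Plug in your own traffic numbers; the conclusion holds as long as the heavy, repetitive token volume stays on the local models.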
Common Mistakes That Waste Your Weekend
Based on hundreds of Reddit threads and community Discord messages, here are the mistakes that trip up almost everyone on their first local setup attempt.
Trying to run everything at once
You install Ollama, Docker, the gateway, five agents, and a Telegram bot all at the same time. When something breaks, you have no idea which layer failed. Start with one agent on bare metal. Add complexity one piece at a time.
Wrong model size for your hardware
A 70B model on 12GB VRAM will technically run, but inference takes 30 seconds per response because most of the model is spilling to system RAM. Your agents will time out, retry, and create a cascade of failures. Use a model that fits entirely in your available VRAM.
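You can see why the timeouts cascade with a quick estimate. The token rates here are illustrative placeholders, not benchmarks; the point is the order-of-magnitude collapse when weights spill to system RAM:

```python
def response_time_s(output_tokens: int, fits_in_vram: bool,
                    gpu_tps: float = 40.0, spill_tps: float = 2.0) -> float:
    """Rough generation time for one response. Decode throughput drops by an
    order of magnitude or more when model weights spill out of VRAM."""
    return output_tokens / (gpu_tps if fits_in_vram else spill_tps)

print(response_time_s(200, fits_in_vram=True))   # 5.0s -> agents stay responsive
print(response_time_s(200, fits_in_vram=False))  # 100.0s -> past most request timeouts
```

A 100-second response blows through typical client timeouts, triggering the retry loop described above.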
Docker bridge networking with Ollama
The default Docker bridge network assigns internal IPs that change on restart. Your gateway config points to the old IP, Ollama is unreachable, and every agent hangs. Use host networking or set up a fixed Docker network with static IPs.
Forgetting to set keep_alive in Ollama
By default, Ollama unloads models after 5 minutes of inactivity. When your agent sends a message after 6 minutes, Ollama has to reload the entire model into VRAM, adding 10 to 30 seconds of latency. Set OLLAMA_KEEP_ALIVE=-1 to keep models loaded permanently, or set it to a longer duration like 30m.
Running all agents on the same large model
If all 5 agents use the same 27B model, Ollama keeps one copy in memory but each concurrent request queues behind the previous one. Sequential inference on a single model instance means your 5-agent team is effectively single-threaded. Use different model sizes for different roles so Ollama can serve lighter requests faster.
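Assuming requests really are served one at a time, as the queueing behavior above describes, the worst-case wait is simple to quantify:

```python
def worst_case_wait_s(queued_requests: int, seconds_per_response: float) -> float:
    """With a single loaded model serving one request at a time, the last agent
    in the queue waits behind every response ahead of it."""
    return queued_requests * seconds_per_response

print(worst_case_wait_s(5, 8.0))  # 40.0 -> the fifth agent waits ~40s for its turn
```

Mixing a 7B model for light agents alongside the 27B breaks this single queue into smaller, faster ones.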
Ignoring power consumption
A PC with an RTX 4090 draws 300-450W under load. Running it 24/7 costs $25 to $40/month in electricity depending on your rates. At that point, a Hetzner VPS at $30/month with better uptime and no noise might be the smarter choice.
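The electricity math behind that comparison, using an assumed average draw and a typical US rate (both inputs are yours to adjust):

```python
def monthly_power_cost(watts: float, usd_per_kwh: float,
                       hours_per_day: float = 24.0, days: int = 30) -> float:
    """Electricity cost of running inference hardware around the clock."""
    kwh = watts / 1000.0 * hours_per_day * days
    return kwh * usd_per_kwh

print(monthly_power_cost(400, 0.12))  # ~$34.56/month at 400W average draw
```

At European rates of $0.30/kWh or more, the same machine costs over $85/month, which makes the VPS comparison even more lopsided.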
When Local Makes Sense (and When It Does Not)
Local OpenClaw is the right choice when you need data privacy (nothing leaves your machine), you already have capable hardware sitting idle, or you want to experiment without worrying about API costs accumulating. Developers who process sensitive client data, researchers working with proprietary datasets, and hobbyists who enjoy the tinkering process all benefit from local setups.
Local is the wrong choice when you need agents available 24/7 and do not want to babysit hardware, when your tasks require frontier model quality that local models cannot match, or when the time you spend debugging Docker networking and model configs is worth more than the $5 to $10/month you would spend on cloud APIs.
Be honest with yourself about which category you fall into. The Reddit user with dual 3060s spent an entire weekend on setup. If they bill their time at even $30/hour, that weekend cost $480 in lost productivity. A cloud-based OpenClaw team would have been running in 15 minutes for $8/month. Sometimes the cheapest option is the one that lets you start building immediately.
Frequently Asked Questions
What is the minimum hardware to run OpenClaw locally?
You need at least 16GB of RAM and a reasonably modern CPU to run a single agent with a small model like Qwen 7B through Ollama. For practical multi-agent setups, 32GB RAM is the comfortable minimum. A dedicated GPU is not strictly required for small models, but it makes a massive difference for anything above 13B parameters. Mac users benefit from unified memory, meaning a Mac Mini M4 with 24GB can run models that would require a discrete GPU on a PC.
Can I run OpenClaw with Docker on Unraid?
Yes, but Docker networking is where most people get stuck. The OpenClaw gateway needs to communicate with Ollama, and both need to be on the same Docker network or use host networking. On Unraid specifically, the custom Docker network setup requires manual bridge configuration. Start without Docker first, get everything working on bare metal, then containerize once you have confirmed the agents and models work correctly.
Should I use Ollama or a cloud API for OpenClaw agents?
It depends on the agent role. Local Ollama models work well for simple, repetitive tasks like data formatting, basic analysis, and message routing. Cloud APIs are better for complex reasoning, creative writing, and coordination tasks. The best approach is hybrid: run your simple agents on local Ollama and your PM or writer agent on a cloud API like Claude Haiku or Gemini Flash. This keeps costs near zero while maintaining quality where it matters.
Which local model should I use for OpenClaw agents?
For agents that need decent reasoning on consumer hardware, Qwen 2.5 27B is the sweet spot. It fits in 16GB VRAM and handles coordination and analysis tasks well. If you have 24GB or more VRAM (or a Mac with 32GB unified memory), Llama 3.3 70B quantized to 4-bit gives you near-cloud quality for most agent tasks. For lightweight agents doing simple formatting or routing, Qwen 7B or Llama 3.2 8B are fast and use minimal resources.
Why is my local OpenClaw setup so slow?
The most common cause is running a model that is too large for your available VRAM. When a model does not fit entirely in GPU memory, it spills to system RAM, which is 10 to 50 times slower. Check your GPU utilization while running inference. If VRAM usage is maxed and system RAM is also being used, switch to a smaller model or a more aggressive quantization. Another common cause is running multiple agents simultaneously, each loading the same model into separate memory spaces. Use Ollama with keep_alive to share a single model instance across agents.
Can I run all 5 OpenClaw agents locally on a single machine?
Technically yes, but practically it requires careful planning. Five agents all running inference simultaneously will overwhelm most consumer hardware. The solution is sequential processing: only one agent runs inference at a time while others wait. The OpenClaw gateway handles this naturally if you configure agents with staggered heartbeat intervals. A Mac Studio M4 Max with 64GB or a PC with 64GB RAM and an RTX 4090 can handle 2 to 3 concurrent agents on smaller models.
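Why staggered heartbeats help is easy to show with a toy schedule. This is an illustrative sketch of the scheduling idea, not OpenClaw's actual implementation, and the function name is mine:

```python
def heartbeat_ticks(interval_s: int, offset_s: int, horizon_s: int) -> list[int]:
    """Seconds at which one agent wakes up and sends an inference request."""
    return list(range(offset_s, horizon_s, interval_s))

# Five agents on a 60s heartbeat, offsets staggered 12s apart: no two agents
# ever request inference in the same second, so a single local model serves
# them one at a time instead of building a queue.
ticks = [t for i in range(5) for t in heartbeat_ticks(60, 12 * i, 600)]
print(len(ticks) == len(set(ticks)))  # True -> no collisions over the 10-minute window
```

With identical offsets, all five requests would land simultaneously every 60 seconds, and four of them would sit in the queue.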
Or skip the setup entirely
190+ pre-configured agent templates with tested configs, Docker setup, and deploy packages. Every template works out of the box with both local and cloud models.