OpenClaw Local Setup: Stop Herding Tiny Computer People
There is a Reddit post that keeps haunting me. Someone built a custom PC with dual RTX 3060 GPUs, 96GB of system RAM, running Unraid with Docker. They spent an entire weekend trying to get a 3-agent OpenClaw team running locally. By Sunday night, they had one agent kind of working, two models that kept crashing, and a Docker networking setup that made no sense to anyone in the comments. This guide exists so you do not become that person.
Why Local Setup Is Harder Than It Looks
Running a single LLM locally with Ollama is straightforward. You install it, pull a model, and start chatting. Running a multi-agent system locally is an entirely different challenge. You are not just running one model. You need a gateway process that routes messages between agents, each agent potentially running a different model, a messaging layer for communication, and enough hardware headroom to handle concurrent inference requests without everything grinding to a halt.
The Reddit user with dual RTX 3060s ran into every single one of these problems simultaneously. Their 3060s each had 12GB VRAM, which is enough for one medium-sized model per GPU. But the OpenClaw gateway was trying to load the same model twice because of a misconfigured routing table. Docker was isolating the Ollama instance from the gateway, so API calls were timing out. And 96GB of system RAM was being eaten alive by model spillover because the quantization settings were wrong.
None of these are unsolvable problems. But when you hit all of them at once, on a Saturday night, with Docker logs scrolling faster than you can read, it feels impossible. The fix is to approach local setup in stages instead of trying to get everything running at once.
Step 1: Start Without Docker
This is the most important piece of advice in this entire guide. Do not start with Docker. Docker adds a networking layer, volume mount complexity, and process isolation that will obscure every other problem you encounter. Get OpenClaw running on your bare operating system first. Once everything works, then containerize.
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a small model to test with
ollama pull qwen2.5:7b

# Verify Ollama is running and accessible
curl http://localhost:11434/api/tags

# Install OpenClaw
npm install -g openclaw

# Create your first agent directory
mkdir -p ~/.openclaw/agents/test-agent
cat > ~/.openclaw/agents/test-agent/SOUL.md << 'EOF'
# Agent: Test Agent
# Model: qwen2.5:7b
# Provider: ollama
You are a test agent. Respond briefly to confirm you are working.
EOF

# Start the gateway
openclaw gateway start

# Test the agent
openclaw agent --agent test-agent --message "Are you working?"
```

If this works, congratulations. You have a functioning local OpenClaw agent. If it does not, you now know the problem is in your base setup, not in Docker networking. Fix it here before adding any complexity.
Step 2: Add a Second Agent and Test Communication
Multi-agent setups fail at the communication layer more often than anywhere else. Before you add five agents, add one more. Create a simple PM agent that delegates to your test agent.
```bash
# Create PM agent
mkdir -p ~/.openclaw/agents/pm
cat > ~/.openclaw/agents/pm/SOUL.md << 'EOF'
# Agent: PM
# Model: qwen2.5:7b
# Provider: ollama
You are a project manager. When given a task, delegate it to @test-agent
and report back with their response.
EOF

# Create AGENTS.md for team coordination
cat > ~/.openclaw/AGENTS.md << 'EOF'
## Team
- @pm: Project manager, coordinates tasks
- @test-agent: General worker, handles delegated tasks

## Workflow
1. @pm receives tasks and delegates to @test-agent
2. @test-agent completes work and reports to @pm
3. @pm summarizes and reports back
EOF

# Test delegation
openclaw agent --agent pm --message "Ask test-agent to summarize what 2+2 equals"
```

If delegation works, your gateway routing is correct and both agents can communicate. If it fails, check that the gateway is running and that both agent directories are properly structured. The most common issue here is an agent name mismatch between AGENTS.md and the directory names.
Step 3: Now Containerize (If You Want To)
With a working bare-metal setup, moving to Docker becomes much simpler because you already know what a working configuration looks like. The key Docker considerations for OpenClaw are:
Network mode: host
The simplest option. Use --network host for both the Ollama container and the OpenClaw gateway container. This avoids all inter-container networking issues at the cost of port isolation. For a local dev setup, this is almost always the right choice.
GPU passthrough
If you are running on Linux with NVIDIA GPUs, you need nvidia-container-toolkit installed and --gpus all passed to the Ollama container. On Unraid, this requires the NVIDIA Driver plugin and manual device mapping in the container template.
Volume mounts
Mount your ~/.openclaw directory into the gateway container so your agent configurations persist. Mount the Ollama models directory (~/.ollama/models) so you do not re-download models every time the container restarts.
Environment variables
Set OLLAMA_HOST=http://localhost:11434 in the gateway container if using host networking. If using a Docker bridge network, replace localhost with the Ollama container name.
```yaml
version: "3.8"
services:
  ollama:
    image: ollama/ollama:latest
    network_mode: host
    volumes:
      - ~/.ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  openclaw-gateway:
    image: openclaw/gateway:latest
    network_mode: host
    volumes:
      - ~/.openclaw:/root/.openclaw
    environment:
      - OLLAMA_HOST=http://localhost:11434
    depends_on:
      - ollama
```

Hardware Guide: What Actually Works
The hardware question comes up constantly on Reddit, and the answers are all over the place. Someone recommends an RTX 4090, someone else says a Mac Mini M4 is enough, and a third person insists you need a server rack. Here is the honest breakdown.
| Setup | Price Range | Max Model Size | Concurrent Agents | Best For |
|---|---|---|---|---|
| Mac Mini M4 (16GB) | $599 | Qwen 7B - 13B | 1-2 | Single agent, light tasks |
| Mac Mini M4 Pro (24GB) | $999 | Qwen 27B (Q4) | 2-3 | Multi-agent, moderate tasks |
| PC + RTX 3060 12GB | $800-1200 | Qwen 27B (Q4) | 1-2 | Budget GPU setup |
| PC + RTX 4090 24GB | $2000-2800 | Llama 70B (Q4), Qwen 72B | 2-3 | Power user, fast inference |
| VPS (Hetzner, etc.) | $30-80/mo | Varies by plan | 1-5 | Always-on, no local hardware |
The Mac Mini M4 Pro at $999 is the best value for most people running OpenClaw locally. The unified memory architecture means nearly the full 24GB is available for model inference, unlike PC setups where system RAM and VRAM are separate pools. The dual RTX 3060 setup from our Reddit friend is decent hardware on paper, but splitting VRAM across two GPUs creates more complexity than a single large memory pool.
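The unified-pool advantage can be sanity-checked with a back-of-the-envelope estimate. This sketch is illustrative, and the 20% overhead margin is my assumption rather than a benchmark:

```python
def max_model_params_b(memory_gb: float, quant_bits: int = 4, overhead: float = 1.2) -> float:
    """Rough upper bound on model size (billions of parameters) that fits in a
    single memory pool at a given quantization level. The 1.2 factor is an
    assumed margin for KV cache and runtime overhead."""
    bytes_per_param = quant_bits / 8 * overhead
    return memory_gb * 1e9 / bytes_per_param / 1e9

# One 24GB unified pool vs. two isolated 12GB GPUs: the largest single model
# each can hold at Q4 differs by 2x, even though total memory is equal.
print(max_model_params_b(24))  # ~40B parameters
print(max_model_params_b(12))  # ~20B parameters per GPU
```

The point is not the exact numbers but the shape of the constraint: a model must fit in one contiguous pool, so 2x12GB is not equivalent to 1x24GB.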
For always-on setups where agents need to be available 24/7, a VPS makes more sense than running a local machine. Hetzner dedicated servers with GPU access start around $50/month and can run mid-size models continuously without worrying about power costs or hardware maintenance.
Model Selection for Local Agents
Not all local models are created equal for agent work. A model that scores well on benchmarks might be terrible at following the structured instructions that OpenClaw agents require. Here is what actually works based on community testing.
Qwen 2.5 7B
The minimum viable model for agent tasks. Fast inference on almost any hardware, handles simple formatting, routing, and data extraction well. Falls apart on complex reasoning or multi-step planning. Good for social media agents, data formatters, and message routers.
Qwen 2.5 27B
The sweet spot for most local agent setups. Fits in 12-16GB VRAM with 4-bit quantization. Handles coordination, analysis, and moderate writing tasks. This is what most Reddit users running local OpenClaw teams end up settling on after trying bigger models that were too slow.
Llama 3.3 70B (Q4)
The local equivalent of a mid-tier cloud model. Requires 24GB+ VRAM or 48GB+ unified memory on Mac. Inference is slower but quality is noticeably better for PM agents and writer agents that need nuanced output. Run this on your most important agent and use smaller models for the rest.
DeepSeek V3 (Q4)
Strong coding and reasoning capabilities. Works well for developer-focused agent teams. The 671B total parameter count means even aggressively quantized builds need well over 100GB of memory to run usably, far beyond typical consumer hardware. Most people running DeepSeek for OpenClaw use the API instead of running it locally.
```
# PM Agent - needs best reasoning
# Model: llama3.3:70b-instruct-q4_K_M
# Provider: ollama

# SEO Analyst - moderate reasoning, lots of data
# Model: qwen2.5:27b-instruct-q4_K_M
# Provider: ollama

# Social Media - simple formatting, speed matters
# Model: qwen2.5:7b-instruct
# Provider: ollama
```

The Hybrid Approach: Local + Cloud
Here is the reality that most "run everything locally" guides will not tell you: a hybrid setup with local models for simple tasks and cloud APIs for complex reasoning is almost always better than trying to run everything locally. It is cheaper than pure cloud, faster than pure local, and more reliable than either.
The idea is straightforward. Your PM agent handles coordination and complex decision-making, so it runs on Claude Haiku or Gemini Flash through a cloud API. At around $0.25 per million input tokens for Haiku, that works out to pennies per day for a PM agent. Your worker agents handle repetitive tasks like formatting, data extraction, and simple analysis, so they run on local Qwen 27B through Ollama. Zero API cost.
```
# ~/.openclaw/agents/orion/SOUL.md (PM - cloud)
# Model: claude-haiku
# Provider: anthropic
# API_KEY: sk-ant-...
You are the project coordinator. Delegate tasks to local agents.

# ~/.openclaw/agents/radar/SOUL.md (SEO - local)
# Model: qwen2.5:27b-instruct-q4_K_M
# Provider: ollama
You analyze SEO data and generate keyword reports.

# ~/.openclaw/agents/pulse/SOUL.md (Social - local)
# Model: qwen2.5:7b-instruct
# Provider: ollama
You format content for social media platforms.
```

This setup gives you cloud-quality coordination with near-zero running costs. The PM makes smart decisions using a model that excels at reasoning, while the worker agents crank through tasks on free local hardware. Monthly cost for this hybrid team: roughly $0.50 to $2.00, compared to $8 to $15 for running everything on cloud APIs.
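The hybrid math is easy to check yourself. This sketch assumes an illustrative coordination volume of 0.2M tokens per day for the PM, which is my placeholder, not a measurement:

```python
def monthly_api_cost(tokens_per_day_millions: float, usd_per_million_tokens: float,
                     days: int = 30) -> float:
    """Monthly API spend for one cloud-hosted agent; Ollama-backed agents add $0."""
    return tokens_per_day_millions * usd_per_million_tokens * days

# PM on a $0.25/M-token model pushing ~0.2M tokens of coordination traffic per day:
print(monthly_api_cost(0.2, 0.25))  # ~$1.50/month; the local workers cost nothing
```

Plug in your own traffic numbers; the conclusion holds as long as the heavy, repetitive token volume stays on the local models.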
Common Mistakes That Waste Your Weekend
Based on hundreds of Reddit threads and community Discord messages, here are the mistakes that trip up almost everyone on their first local setup attempt.
Trying to run everything at once
You install Ollama, Docker, the gateway, five agents, and a Telegram bot all at the same time. When something breaks, you have no idea which layer failed. Start with one agent on bare metal. Add complexity one piece at a time.
Wrong model size for your hardware
A 70B model on 12GB VRAM will technically run, but inference takes 30 seconds per response because most of the model is spilling to system RAM. Your agents will time out, retry, and create a cascade of failures. Use a model that fits entirely in your available VRAM.
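You can see why the timeouts cascade with a quick estimate. The token rates here are illustrative placeholders, not benchmarks; the point is the order-of-magnitude collapse when weights spill to system RAM:

```python
def response_time_s(output_tokens: int, fits_in_vram: bool,
                    gpu_tps: float = 40.0, spill_tps: float = 2.0) -> float:
    """Rough generation time for one response. Decode throughput drops by an
    order of magnitude or more when model weights spill out of VRAM."""
    return output_tokens / (gpu_tps if fits_in_vram else spill_tps)

print(response_time_s(200, fits_in_vram=True))   # 5.0s -> agents stay responsive
print(response_time_s(200, fits_in_vram=False))  # 100.0s -> past most request timeouts
```

A 100-second response blows through typical client timeouts, triggering the retry loop described above.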
Docker bridge networking with Ollama
The default Docker bridge network assigns internal IPs that change on restart. Your gateway config points to the old IP, Ollama is unreachable, and every agent hangs. Use host networking or set up a fixed Docker network with static IPs.
Forgetting to set keep_alive in Ollama
By default, Ollama unloads models after 5 minutes of inactivity. When your agent sends a message after 6 minutes, Ollama has to reload the entire model into VRAM, adding 10 to 30 seconds of latency. Set OLLAMA_KEEP_ALIVE=-1 to keep models loaded permanently, or set it to a longer duration like 30m.
Running all agents on the same large model
If all 5 agents use the same 27B model, Ollama keeps one copy in memory but each concurrent request queues behind the previous one. Sequential inference on a single model instance means your 5-agent team is effectively single-threaded. Use different model sizes for different roles so Ollama can serve lighter requests faster.
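Assuming requests really are served one at a time, as the queueing behavior above describes, the worst-case wait is simple to quantify:

```python
def worst_case_wait_s(queued_requests: int, seconds_per_response: float) -> float:
    """With a single loaded model serving one request at a time, the last agent
    in the queue waits behind every response ahead of it."""
    return queued_requests * seconds_per_response

print(worst_case_wait_s(5, 8.0))  # 40.0 -> the fifth agent waits ~40s for its turn
```

Mixing a 7B model for light agents alongside the 27B breaks this single queue into smaller, faster ones.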
Ignoring power consumption
A PC with an RTX 4090 draws 300-450W under load. Running it 24/7 costs $25 to $40/month in electricity depending on your rates. At that point, a Hetzner VPS at $30/month with better uptime and no noise might be the smarter choice.
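The electricity math behind that comparison, using an assumed average draw and a typical US rate (both inputs are yours to adjust):

```python
def monthly_power_cost(watts: float, usd_per_kwh: float,
                       hours_per_day: float = 24.0, days: int = 30) -> float:
    """Electricity cost of running inference hardware around the clock."""
    kwh = watts / 1000.0 * hours_per_day * days
    return kwh * usd_per_kwh

print(monthly_power_cost(400, 0.12))  # ~$34.56/month at 400W average draw
```

At European rates of $0.30/kWh or more, the same machine costs over $85/month, which makes the VPS comparison even more lopsided.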
When Local Makes Sense (and When It Does Not)
Local OpenClaw is the right choice when you need data privacy (nothing leaves your machine), you already have capable hardware sitting idle, or you want to experiment without worrying about API costs accumulating. Developers who process sensitive client data, researchers working with proprietary datasets, and hobbyists who enjoy the tinkering process all benefit from local setups.
Local is the wrong choice when you need agents available 24/7 and do not want to babysit hardware, when your tasks require frontier model quality that local models cannot match, or when the time you spend debugging Docker networking and model configs is worth more than the $5 to $10/month you would spend on cloud APIs.
Be honest with yourself about which category you fall into. The Reddit user with dual 3060s spent an entire weekend on setup. If they bill their time at even $30/hour, that weekend cost $480 in lost productivity. A cloud-based OpenClaw team would have been running in 15 minutes for $8/month. Sometimes the cheapest option is the one that lets you start building immediately.
Frequently Asked Questions
What is the minimum hardware to run OpenClaw locally?
You need at least 16GB of RAM and a reasonably modern CPU to run a single agent with a small model like Qwen 7B through Ollama. For practical multi-agent setups, 32GB RAM is the comfortable minimum. A dedicated GPU is not strictly required for small models, but it makes a massive difference for anything above 13B parameters. Mac users benefit from unified memory, meaning a Mac Mini M4 with 24GB can run models that would require a discrete GPU on a PC.
Can I run OpenClaw with Docker on Unraid?
Yes, but Docker networking is where most people get stuck. The OpenClaw gateway needs to communicate with Ollama, and both need to be on the same Docker network or use host networking. On Unraid specifically, the custom Docker network setup requires manual bridge configuration. Start without Docker first, get everything working on bare metal, then containerize once you have confirmed the agents and models work correctly.
Should I use Ollama or a cloud API for OpenClaw agents?
It depends on the agent role. Local Ollama models work well for simple, repetitive tasks like data formatting, basic analysis, and message routing. Cloud APIs are better for complex reasoning, creative writing, and coordination tasks. The best approach is hybrid: run your simple agents on local Ollama and your PM or writer agent on a cloud API like Claude Haiku or Gemini Flash. This keeps costs near zero while maintaining quality where it matters.
Which local model should I use for OpenClaw agents?
For agents that need decent reasoning on consumer hardware, Qwen 2.5 27B is the sweet spot. It fits in 16GB VRAM and handles coordination and analysis tasks well. If you have 24GB or more VRAM (or a Mac with 32GB unified memory), Llama 3.3 70B quantized to 4-bit gives you near-cloud quality for most agent tasks. For lightweight agents doing simple formatting or routing, Qwen 7B or Llama 3.2 8B are fast and use minimal resources.
Why is my local OpenClaw setup so slow?
The most common cause is running a model that is too large for your available VRAM. When a model does not fit entirely in GPU memory, it spills to system RAM, which is 10 to 50 times slower. Check your GPU utilization while running inference. If VRAM usage is maxed and system RAM is also being used, switch to a smaller model or a more aggressive quantization. Another common cause is running multiple agents simultaneously, each loading the same model into separate memory spaces. Use Ollama with keep_alive to share a single model instance across agents.
Can I run all 5 OpenClaw agents locally on a single machine?
Technically yes, but practically it requires careful planning. Five agents all running inference simultaneously will overwhelm most consumer hardware. The solution is sequential processing: only one agent runs inference at a time while others wait. The OpenClaw gateway handles this naturally if you configure agents with staggered heartbeat intervals. A Mac Studio M4 Max with 64GB or a PC with 64GB RAM and an RTX 4090 can handle 2 to 3 concurrent agents on smaller models.
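Why staggered heartbeats help is easy to show with a toy schedule. This is an illustrative sketch of the scheduling idea, not OpenClaw's actual implementation, and the function name is mine:

```python
def heartbeat_ticks(interval_s: int, offset_s: int, horizon_s: int) -> list[int]:
    """Seconds at which one agent wakes up and sends an inference request."""
    return list(range(offset_s, horizon_s, interval_s))

# Five agents on a 60s heartbeat, offsets staggered 12s apart: no two agents
# ever request inference in the same second, so a single local model serves
# them one at a time instead of building a queue.
ticks = [t for i in range(5) for t in heartbeat_ticks(60, 12 * i, 600)]
print(len(ticks) == len(set(ticks)))  # True -> no collisions over the 10-minute window
```

With identical offsets, all five requests would land simultaneously every 60 seconds, and four of them would sit in the queue.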
Or skip the setup entirely
190+ pre-configured agent templates with tested configs, Docker setup, and deploy packages. Every template works out of the box with both local and cloud models.