OpenClaw + Ollama: Run AI Agents for Free with Local Models
Run OpenClaw agents without paying for API calls. This guide walks you through installing Ollama, choosing the right local model, configuring OpenClaw, and building a hybrid setup that balances cost and quality.
What Is Ollama?
Ollama is a lightweight runtime that lets you run large language models on your own hardware. It supports models like Llama 3.2, Mistral, CodeLlama, Phi-3, and dozens more. You download a model once, and it runs entirely on your machine through a local API endpoint.
Think of it as Docker for LLMs. One command pulls a model, another starts serving it. OpenClaw connects to Ollama the same way it connects to Anthropic or OpenAI, but everything stays local and costs nothing to run.
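That local endpoint is plain HTTP, so anything can talk to it directly. As a minimal sketch, here is a one-shot call to Ollama's /api/generate endpoint in Python (the endpoint and payload shape come from Ollama's HTTP API; the helper names are our own):

```python
import json
import urllib.request

def build_generate_request(prompt, model="llama3.2"):
    # Payload shape for Ollama's /api/generate endpoint;
    # stream=False asks for one JSON object instead of a chunk stream.
    return {"model": model, "prompt": prompt, "stream": False}

def ollama_generate(prompt, model="llama3.2", host="http://localhost:11434"):
    body = json.dumps(build_generate_request(prompt, model)).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With `ollama serve` running: print(ollama_generate("Say hello in one word."))
```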
Why Run AI Agents Locally?
There are three strong reasons to run OpenClaw agents on local models instead of cloud APIs.
Privacy
Your data never leaves your machine. No prompts are sent to external servers. This matters for agents that handle proprietary code, internal documents, or customer data.
Zero API Cost
Cloud APIs charge per token. An active agent processing hundreds of messages per day can cost $50-400/month. Ollama runs on hardware you already own for $0/month in API fees.
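The savings are easy to sanity-check with back-of-envelope math. A quick sketch (the message volume and per-token price below are illustrative assumptions, not real quotes):

```python
def monthly_api_cost(msgs_per_day, tokens_per_msg, usd_per_million_tokens):
    # Assumes a 30-day month and counts prompt + completion tokens together.
    tokens_per_month = msgs_per_day * tokens_per_msg * 30
    return tokens_per_month / 1_000_000 * usd_per_million_tokens

# 300 messages/day at ~2,000 tokens each, priced at $15 per million tokens:
print(monthly_api_cost(300, 2_000, 15))  # 270.0
```

That lands squarely in the $50-400/month range for a busy agent, all of which goes to zero on local hardware.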
Offline Capability
Once a model is downloaded, your agents work without internet. No rate limits, no outages, no dependency on external services. Your agents run 24/7 regardless of connectivity.
Step 1: Install Ollama
Ollama is available for macOS, Linux, and Windows. On macOS and Linux it installs with a single command; on Windows it ships as a downloadable installer.
# macOS (using Homebrew)
brew install ollama
# Linux
curl -fsSL https://ollama.ai/install.sh | sh
# Windows
# Download the installer from ollama.com/download
# Start the Ollama service
ollama serve
Ollama runs as a background service on port 11434 by default. Keep it running while your OpenClaw agents are active.
Step 2: Pull a Model
Download one or more models that your agents will use. Each model is downloaded once and cached locally.
# General purpose (recommended starting point)
ollama pull llama3.2
# Fast and efficient
ollama pull mistral
# Optimized for code tasks
ollama pull codellama
# Lightweight, runs on 8GB RAM
ollama pull phi3
# Verify your models
ollama list
Download sizes vary: Phi-3 Mini is about 2.3 GB, Mistral 7B is about 4 GB, and Llama 3.2 8B is about 4.7 GB.
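If disk space is tight, it is worth totaling the footprint before pulling several models. A small sketch using the approximate sizes listed above:

```python
# Approximate download sizes in GB, taken from the figures in this guide.
SIZES_GB = {"phi3": 2.3, "mistral": 4.1, "codellama": 3.8, "llama3.2": 4.7}

def total_download_gb(models):
    return round(sum(SIZES_GB[m] for m in models), 1)

print(total_download_gb(["llama3.2", "phi3"]))  # 7.0
```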
Step 3: Configure OpenClaw to Use Ollama
Point OpenClaw to your local Ollama instance. No API key is needed since everything runs on localhost.
# Initialize OpenClaw (skip if already set up)
npx openclaw init
# Configure Ollama as the model provider
openclaw models auth paste-token --provider ollama
# When prompted for a token, just press Enter (no key needed)
# OpenClaw automatically detects Ollama on localhost:11434
If you changed the Ollama port or are running it on a different machine on your network, update the endpoint in your OpenClaw configuration to point to the correct address.
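As a sketch, that endpoint override might look like the following. The key names here are illustrative, not canonical; check your OpenClaw version's configuration reference for the exact schema:

```json
{
  "models": {
    "provider": "ollama",
    "endpoint": "http://192.168.1.50:11434"
  }
}
```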
Step 4: Test the Connection
Create a simple agent and send it a message to verify everything works.
# Create a test agent workspace
mkdir -p agents/test-agent
# Create a minimal SOUL.md
cat > agents/test-agent/SOUL.md << 'EOF'
# Test Agent
## Identity
You are a helpful assistant for testing local model connections.
## Rules
- Keep responses concise
- Confirm which model you are running on when asked
EOF
# Register the agent
openclaw agents add test-agent --workspace ./agents/test-agent --non-interactive
# Send a test message
openclaw agent --agent test-agent --message "Hello! Confirm you are running locally."
# Or start the gateway for web access
openclaw gateway start
# Visit http://localhost:18789
The first message may take a few seconds as Ollama loads the model into memory. Subsequent messages are much faster. On Apple Silicon Macs, expect 20-40 tokens per second with Llama 3.2 8B.
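If the test message fails, confirm Ollama itself is up and has your model before debugging OpenClaw. Ollama's /api/tags endpoint lists installed models; here is a small Python sketch (the endpoint is Ollama's, the helper names are ours):

```python
import json
import urllib.request

def parse_models(payload):
    # Extract model names from an /api/tags response body.
    return [m["model"] for m in payload.get("models", [])]

def installed_models(host="http://localhost:11434"):
    with urllib.request.urlopen(f"{host}/api/tags") as resp:
        return parse_models(json.loads(resp.read()))

# With `ollama serve` running: print(installed_models())
```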
Best Models for OpenClaw Agents
Not all models are equal. Here is what works best for different agent use cases.
| Model | Size | RAM | Best For |
|---|---|---|---|
| Phi-3 Mini | 2.3 GB | 8 GB | Lightweight tasks, runs on minimal hardware |
| Mistral 7B | 4.1 GB | 16 GB | Fast inference, efficient for high-volume tasks |
| CodeLlama 7B | 3.8 GB | 16 GB | Development agents, code generation and review |
| Llama 3.2 8B | 4.7 GB | 16 GB | Recommended - best quality/speed balance |
| Llama 3.2 70B | 40 GB | 64 GB | Near cloud-quality reasoning |
Tip: Start with Llama 3.2 8B. It handles content writing, research summaries, and code review well. Switch to a specialized model only if you need it for a specific use case.
Performance Comparison: Local vs Cloud
Local and cloud models have different strengths. Here is how they compare across the factors that matter most.
| Factor | Local (Ollama) | Cloud (Claude, GPT-4) |
|---|---|---|
| Speed | Slower generation, but zero network latency | Faster generation, but adds network round-trip |
| Quality | Good for routine tasks, weaker on complex reasoning | Best for creative writing and nuanced analysis |
| Cost | $0/month | $50-400/month |
| Privacy | Data stays on your machine | Data sent to provider servers |
| Availability | Works offline, no rate limits | Depends on internet and provider uptime |
The Hybrid Approach: Best of Both Worlds
You do not have to choose one or the other. OpenClaw supports per-agent model configuration, which means you can run different agents on different providers in the same team.
The smartest setup is to use local models for routine, high-volume tasks and cloud models for work that demands top-tier output.
# Example hybrid team setup:
# Heartbeat agent (runs every 5 min, checks system health)
# → Use Ollama/Llama 3.2 (high volume, simple task, $0 cost)
# Research agent (summarizes articles, extracts data)
# → Use Ollama/Mistral (routine processing, no API cost)
# Content writer (creates blog posts, marketing copy)
# → Use Claude Sonnet (complex creative work, quality matters)
# Code reviewer (analyzes PRs, suggests improvements)
# → Use Ollama/CodeLlama (code-specific, runs locally)
Each agent's SOUL.md can specify which model provider and model to use independently. This means your heartbeat agent running 288 times per day costs nothing, while your content writer uses Claude only when it has actual work to do.
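As a sketch, a per-agent model pin might look like front matter at the top of the agent's SOUL.md. The field names below are illustrative assumptions; consult your OpenClaw version's documentation for the real schema:

```markdown
---
model:
  provider: ollama
  name: mistral
---
# Research Agent
## Identity
You summarize articles and extract structured data.
```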
Hardware Requirements
What you need depends on which model you plan to run. Here are the practical minimums.
Minimum: 8 GB RAM
Runs Phi-3 Mini and Gemma 2B. Suitable for simple classification, Q&A, and lightweight agent tasks. Most laptops from the last 5 years meet this requirement.
Recommended: 16 GB+ RAM
Runs Llama 3.2 8B, Mistral 7B, and CodeLlama 7B comfortably. This is the sweet spot for most OpenClaw setups. Apple Silicon Macs with 16 GB unified memory work exceptionally well since the GPU shares system RAM.
Power User: 32-64 GB RAM or Dedicated GPU
Required for large models like Llama 3.2 70B or Mixtral 8x7B. An NVIDIA GPU with 24+ GB VRAM dramatically speeds up inference. At this level, local quality approaches cloud model output.
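A rough way to estimate the RAM a quantized model needs: weights take the parameter count times bits-per-weight divided by 8, plus headroom for the KV cache and runtime. This back-of-envelope sketch (4-bit quantization and the 1.2x overhead factor are assumptions) lines up with the table above:

```python
def model_ram_gb(params_billion, bits_per_weight=4, overhead=1.2):
    # Weight bytes = params * bits / 8; ~20% extra for KV cache and runtime.
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return round(weight_bytes * overhead / 1e9, 1)

print(model_ram_gb(8))   # 4.8  -> comfortable on a 16 GB machine
print(model_ram_gb(70))  # 42.0 -> needs the 64 GB tier
```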
Frequently Asked Questions
Can I run OpenClaw completely offline with Ollama?
Yes. Once Ollama has downloaded a model, both Ollama and OpenClaw run entirely on your machine with no internet connection required. This makes it ideal for air-gapped environments, sensitive data processing, and situations where you cannot send data to external APIs. The only time you need internet is for the initial model download and OpenClaw installation.
Which Ollama model works best with OpenClaw agents?
For most agent tasks, Llama 3.2 8B offers the best balance of quality and speed. It handles content writing, research summaries, and code review well on machines with 16 GB RAM. For development-focused agents, CodeLlama is purpose-built for code generation and review. If your machine has limited RAM (8 GB), Phi-3 Mini runs well and still produces useful output for simple tasks.
How much RAM do I need to run Ollama with OpenClaw?
A minimum of 8 GB RAM is needed for small models like Phi-3 Mini or Gemma 2B. For the recommended Llama 3.2 8B model, 16 GB RAM is ideal. Larger models like Llama 3.2 70B or Mixtral 8x7B need 32-64 GB RAM or a dedicated GPU. OpenClaw itself uses minimal resources. The memory requirement is almost entirely driven by the Ollama model size.
Is local Ollama as good as Claude or GPT-4 for OpenClaw agents?
For simple tasks like summarization, classification, and structured data extraction, local models perform comparably. For complex reasoning, creative writing, and nuanced instruction following, cloud models like Claude Sonnet and GPT-4 still outperform most local alternatives. The practical approach is to use Ollama for routine tasks and heartbeats while reserving cloud models for creative and complex work. This hybrid strategy gives you the best of both worlds.
Get 103 SOUL.md Templates Optimized for Any Model
Works with Ollama, Claude, OpenAI, and any provider OpenClaw supports. Each template includes pre-configured agent behavior, rules, and integrations ready to deploy.
Deploy a Ready-Made AI Agent
Skip the setup. Pick a template and deploy in 60 seconds.