How to Cut OpenClaw API Costs to $0.02 Per Query
Most people running OpenClaw agents overspend by 10x because they use one model for everything. This guide shows you how to route decisions to cheap models and reserve expensive models for generation. In the 5-agent comparison later in this guide, the average cost per query drops from $0.020 to $0.0024, and a $443/month bill becomes $54/month.
The $100/Night Mistake Most People Make
The default OpenClaw setup uses one model for everything. You install it, set your API key to Claude Sonnet, and every agent in your system uses Sonnet for every single call. Routing decisions, health checks, intent parsing, content generation, log analysis. All Sonnet. All expensive.
This is like hiring a senior engineer to answer the phone. The work gets done, but you are burning money on tasks that a junior employee handles just as well. A 5-agent team running Sonnet for everything can easily hit $3 to $5 per hour. Leave it running overnight and you wake up to a $100 bill for work that could have cost $2.
The Model Routing Strategy
The fix is simple: use cheap models for cheap tasks and expensive models only when you need them. In practice, this means splitting your agent work into two categories.
Routing / Decision Tasks (80% of calls)
Choosing which agent handles a message. Parsing user intent. Yes/no health checks. Log scanning. Status updates. These tasks are structured, predictable, and do not need advanced reasoning.
Use: Haiku ($0.002/query) or GPT-4o Mini ($0.003/query)
Generation Tasks (20% of calls)
Writing blog posts. Generating detailed reports. Complex multi-step analysis. Code generation. Creative work. These tasks require deeper reasoning and produce longer outputs.
Use: Sonnet ($0.02/query) or GPT-4o ($0.03/query)
When 80% of your calls cost $0.002 instead of $0.02, your average cost per query drops from $0.02 to $0.006. That is a 70% reduction before you even touch exit conditions or heartbeat optimization.
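The blended figure above is a weighted average, and a few lines of Python make the arithmetic easy to rerun with your own split. This is a minimal sketch: `blended_cost` is a hypothetical helper name, and the per-query prices are the approximate figures quoted in this guide, not live provider rates.

```python
def blended_cost(cheap_share: float, cheap_cost: float, expensive_cost: float) -> float:
    """Average cost per query when `cheap_share` of calls go to the cheap model."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    return cheap_share * cheap_cost + (1.0 - cheap_share) * expensive_cost


# Figures from this guide: 80% of calls on Haiku, 20% on Sonnet.
ALL_SONNET = 0.02
average = blended_cost(0.80, 0.002, ALL_SONNET)
reduction = 1.0 - average / ALL_SONNET

print(f"average cost per query: ${average:.4f}")
print(f"reduction vs all-Sonnet: {reduction:.0%}")
```

Unrounded, the average comes out at $0.0056 per query, a 72% reduction, which the text above rounds to $0.006 and 70%.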
Setting Different Models Per Agent in SOUL.md
OpenClaw lets you specify the model for each agent directly in its SOUL.md file. The router agent gets a cheap model. The writer agent gets an expensive one. Here is the config for a cost-optimized 3-agent team.
Router Agent (Haiku, $0.002/query)
```markdown
# ~/agents/router/SOUL.md
# Router Agent
## Model
provider: anthropic
model: claude-3-5-haiku-20241022
## Identity
You are a message router. Your only job is to read incoming
messages and decide which agent should handle them.
## Rules
- Respond with ONLY the agent name: writer, devops, or support
- If unclear, respond with "support" as the default
- Never generate long responses
- Never engage in conversation
## Routing Table
- Writing requests, blog, content, social media -> writer
- Server issues, deployments, monitoring, errors -> devops
- Questions, help, general inquiries -> support
```
Writer Agent (Sonnet, $0.02/query)
```markdown
# ~/agents/writer/SOUL.md
# Content Writer Agent
## Model
provider: anthropic
model: claude-sonnet-4-20250514
## Identity
You are a skilled content writer who creates blog posts,
social media updates, and marketing copy.
## Rules
- Match the brand tone defined below
- Use short paragraphs (2-3 sentences max)
- Include specific numbers and examples
- No filler phrases or corporate jargon
## Tone
Direct, practical, slightly informal. Write like a founder
talking to another founder.
```
DevOps Monitor (Haiku, $0.002/query)
```markdown
# ~/agents/devops/SOUL.md
# DevOps Monitor Agent
## Model
provider: anthropic
model: claude-3-5-haiku-20241022
## Identity
You monitor server health and report issues.
## Rules
- Check: CPU, memory, disk, response time
- Only alert when thresholds are exceeded
- Keep status messages under 50 words
- Use structured format: [OK] or [ALERT] prefix
## Thresholds
- CPU > 85% for 5 minutes -> ALERT
- Memory > 90% -> ALERT
- Disk > 80% -> ALERT
- Response time > 2s -> ALERT
```
Key insight: The router and devops agents handle 80%+ of all API calls but use Haiku at $0.002/query. Only the writer uses Sonnet, and it only fires when someone actually requests content. This is where most of the savings come from.
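A cheap model will occasionally ignore the "respond with ONLY the agent name" rule, so it can help to normalize the router's reply in whatever code dispatches the message. The sketch below is an assumption about how you might do that yourself: `normalize_route` is a hypothetical helper, not an OpenClaw function, and this guide does not say whether OpenClaw performs a similar check internally.

```python
# Agent names and the default come from the router's SOUL.md rules.
VALID_AGENTS = {"writer", "devops", "support"}
DEFAULT_AGENT = "support"


def normalize_route(raw_reply: str) -> str:
    """Map the router model's raw reply to a known agent name.

    Anything that is not exactly one of the valid names (after trimming
    whitespace, quotes, and a trailing period) falls back to the default.
    """
    cleaned = raw_reply.strip().strip(".\"'").lower()
    return cleaned if cleaned in VALID_AGENTS else DEFAULT_AGENT


print(normalize_route(" Writer\n"))
print(normalize_route("I think devops should handle this"))
```

The second call falls back to `support` because the reply is a sentence rather than a bare agent name, which mirrors the "if unclear, respond with support" rule.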
Heartbeat Agents: Use Free or Near-Free Models
Heartbeat agents are the biggest hidden cost in OpenClaw setups. They run on a schedule (every 1 to 5 minutes), checking server health, monitoring APIs, or scanning logs. Each check is an API call. At 5-minute intervals, that is 288 calls per day, per agent.
Running a heartbeat on Sonnet: 288 calls x $0.02 = $5.76/day = $173/month. Running the same heartbeat on Haiku: 288 calls x $0.002 = $0.58/day = $17/month. Running it on a local model via Ollama: $0.00/day.
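To rerun this arithmetic for a different interval or model, a small calculator is enough. This is a sketch using the approximate per-query prices from this guide and a 30-day month; `heartbeat_cost` is a hypothetical helper name.

```python
from datetime import timedelta


def heartbeat_cost(interval: timedelta, cost_per_call: float, days: int = 30) -> tuple[int, float, float]:
    """Return (calls per day, cost per day, cost over `days`) for one heartbeat agent."""
    calls_per_day = timedelta(days=1) // interval  # whole checks that fit in 24 hours
    daily = calls_per_day * cost_per_call
    return calls_per_day, daily, daily * days


# Approximate per-query prices quoted in this guide.
for model, price in [("Sonnet", 0.02), ("Haiku", 0.002), ("Local (Ollama)", 0.0)]:
    calls, daily, monthly = heartbeat_cost(timedelta(minutes=5), price)
    print(f"{model}: {calls} calls/day, ${daily:.2f}/day, ${monthly:.2f}/month")
```

Dropping the interval to 1 minute multiplies every figure by five (1,440 calls per day), which is why the check interval matters as much as the model choice.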
```markdown
# ~/agents/heartbeat/SOUL.md
# Heartbeat Monitor
## Model
provider: ollama
model: gemma3:4b
## Identity
You are a system health monitor. You check metrics and
report status in a structured format.
## Rules
- Output ONLY: [OK] or [ALERT] followed by a one-line summary
- Never generate explanations unless asked
- If all metrics are healthy, respond with: [OK] All systems normal
- Parse the input data, do not make assumptions
## Schedule
interval: 5m
type: heartbeat
```
Gemma 3 (4B parameters) runs locally via Ollama on any machine with 8 GB of RAM. It handles structured health checks perfectly and costs nothing per query. For Raspberry Pi setups where local inference is too slow, use Claude Haiku as the cheapest cloud option.
```bash
# Install Ollama (Mac, Linux, or WSL)
curl -fsSL https://ollama.com/install.sh | sh
# Pull the Gemma 3 model (2.3 GB download)
ollama pull gemma3:4b
# Configure OpenClaw to use Ollama
openclaw models add ollama --endpoint http://localhost:11434
# Your heartbeat agent now runs for free
openclaw agent --agent heartbeat --message "CPU: 42%, MEM: 67%, DISK: 55%"
# Output: [OK] All systems normal
```
Exit Conditions: Stop Agents from Burning Tokens
The most expensive bug in any agent system is a loop. An agent gets stuck in a cycle, calling the AI model over and over without producing useful work. Without exit conditions, a looping Sonnet agent can burn $10 to $50 in a single hour.
Exit conditions are rules you add to your SOUL.md that tell the agent when to stop. They are the seatbelt of cost optimization.
```markdown
# Add these to any agent's SOUL.md
## Exit Conditions
- Stop after completing the requested task
- Maximum 5 tool calls per message
- If the same action fails 3 times, stop and report the error
- Never retry a failed API call more than twice
- If no new information after 3 checks, pause until next trigger
- Maximum response length: 500 words (prevents token runaway)
## Cost Guards
- If the conversation exceeds 10 turns, summarize and close
- Do not research topics beyond the initial question
- Never browse the web unless explicitly asked
- Decline tasks outside your defined skills
```
Loop Prevention
Max 5 tool calls per message. If the same action fails 3 times, stop and report. Prevents runaway API spending.
Scope Limits
Agents only handle tasks within their defined skills. No rabbit holes. No open-ended research unless requested.
Conversation Caps
After 10 turns, summarize and close. Prevents context windows from growing endlessly and inflating token costs.
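The rules above are instructions to the model. If you also control the code that invokes an agent, the same limits can be enforced as hard counters so a loop stops even when the model ignores its instructions. The sketch below is an illustration under that assumption: `RunGuard` and `BudgetExceeded` are hypothetical names, not part of OpenClaw, and the default limits simply mirror the SOUL.md values above.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when an agent run hits one of its exit conditions."""


@dataclass
class RunGuard:
    max_tool_calls: int = 5   # "Maximum 5 tool calls per message"
    max_failures: int = 3     # "If the same action fails 3 times, stop"
    max_turns: int = 10       # "If the conversation exceeds 10 turns"
    tool_calls: int = 0
    turns: int = 0
    failures: dict[str, int] = field(default_factory=dict)

    def start_message(self) -> None:
        """Begin a new turn; tool-call and failure counters reset per message."""
        self.turns += 1
        if self.turns > self.max_turns:
            raise BudgetExceeded("turn limit exceeded: summarize and close")
        self.tool_calls = 0
        self.failures.clear()

    def before_tool_call(self) -> None:
        """Call before each tool invocation; refuses the call past the limit."""
        if self.tool_calls >= self.max_tool_calls:
            raise BudgetExceeded("tool-call limit reached for this message")
        self.tool_calls += 1

    def record_failure(self, action: str) -> None:
        """Call when an action fails; stops once the same action hits the limit."""
        count = self.failures.get(action, 0) + 1
        self.failures[action] = count
        if count >= self.max_failures:
            raise BudgetExceeded(f"{action!r} failed {count} times: stop and report")


# Demo: the sixth tool call within one message is refused.
guard = RunGuard()
guard.start_message()
try:
    for _ in range(6):
        guard.before_tool_call()
except BudgetExceeded as stop:
    print(f"stopped: {stop}")
```

Counters like these cost nothing to check, so they work as a backstop alongside the prompt-level rules rather than a replacement for them.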
Cost Comparison: Before vs After Optimization
Here is what a typical 5-agent team costs before and after applying these optimizations. The numbers assume 738 calls per day across all agents: 450 message-driven calls plus 288 heartbeat checks.
| Agent | Before (all Sonnet) | After (optimized) |
|---|---|---|
| Router (200 calls/day) | $4.00/day (Sonnet) | $0.40/day (Haiku) |
| Writer (50 calls/day) | $1.00/day (Sonnet) | $1.00/day (Sonnet) |
| DevOps (100 calls/day) | $2.00/day (Sonnet) | $0.20/day (Haiku) |
| Heartbeat (288 calls/day) | $5.76/day (Sonnet) | $0.00/day (Gemma 3 local) |
| Support (100 calls/day) | $2.00/day (Sonnet) | $0.20/day (Haiku) |
| Daily total | $14.76/day | $1.80/day |
| Monthly total | $443/month | $54/month |
| Avg cost per query | $0.020 | $0.0024 |
That is an 88% reduction in monthly costs with zero loss in functionality. The writer still uses Sonnet for quality content. Everything else runs on cheaper models that handle the work just as well. Add exit conditions on top of this and you can push savings past 90%.
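The table can be reproduced, and adapted to your own call volumes, with a short script. This is a sketch that uses the call counts and approximate per-query prices from the table and a 30-day month; the variable names are illustrative.

```python
# Approximate per-query prices used throughout this guide.
SONNET, HAIKU, LOCAL = 0.02, 0.002, 0.0

# (agent, calls per day, per-query cost after optimization)
AGENTS = [
    ("router", 200, HAIKU),
    ("writer", 50, SONNET),
    ("devops", 100, HAIKU),
    ("heartbeat", 288, LOCAL),
    ("support", 100, HAIKU),
]

total_calls = sum(calls for _name, calls, _cost in AGENTS)
before_daily = sum(calls * SONNET for _name, calls, _cost in AGENTS)
after_daily = sum(calls * cost for _name, calls, cost in AGENTS)
savings = 1.0 - after_daily / before_daily

print(f"calls per day:  {total_calls}")
print(f"before: ${before_daily:.2f}/day, ${before_daily * 30:.0f}/month")
print(f"after:  ${after_daily:.2f}/day, ${after_daily * 30:.0f}/month")
print(f"avg cost per query after: ${after_daily / total_calls:.4f}")
print(f"reduction: {savings:.0%}")
```

Changing a single row, for example moving the heartbeat from the local model to Haiku, shows immediately how much each routing decision contributes to the total.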
Quick Reference: Model Pricing Cheat Sheet
Use this table to pick the right model for each agent role. Prices are approximate per-query costs assuming typical agent message lengths (500 to 1000 tokens input, 200 to 500 tokens output).
| Model | Approx. Cost/Query | Best For |
|---|---|---|
| Gemma 3 (Ollama) | $0.000 | Heartbeats, health checks, simple parsing |
| Claude 3.5 Haiku | $0.002 | Routing, decisions, monitoring, support |
| GPT-4o Mini | $0.003 | Routing, classification, structured output |
| Claude Sonnet 4 | $0.020 | Writing, analysis, complex reasoning |
| GPT-4o | $0.030 | Code generation, detailed reports |
| Claude Opus 4 | $0.100 | Only for critical, high-stakes tasks |
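Per-query figures like these come from token counts multiplied by per-token prices, so you can estimate your own from your typical message sizes. The sketch below shows the calculation only: `cost_per_query` is a hypothetical helper, and the rates in the example are placeholders, not any provider's actual pricing. Substitute the current per-million-token prices from your provider's pricing page.

```python
def cost_per_query(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate the cost of one call from token counts and per-million-token prices."""
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


# Placeholder rates for illustration only: $1.00 per million input tokens,
# $5.00 per million output tokens, with 1000 tokens in and 500 tokens out.
estimate = cost_per_query(1000, 500, 1.00, 5.00)
print(f"estimated cost per query: ${estimate:.4f}")
```

Output tokens are usually priced higher than input tokens, which is one reason short, structured replies from routing and monitoring agents stay cheap.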
Frequently Asked Questions
Does using a cheaper model like Haiku reduce agent quality?
Not for routing and decision tasks. Haiku excels at structured decisions like choosing which agent to invoke, parsing user intent, and yes/no checks. These tasks do not require the deep reasoning of Sonnet or Opus. In testing, Haiku handles routing with 98%+ accuracy while costing about 10x less than Sonnet ($0.002 versus $0.02 per query). You only lose quality if you use Haiku for complex generation tasks like long-form writing or multi-step analysis.
How does OpenClaw billing work with multiple models?
OpenClaw itself does not charge per query. You pay the AI provider directly (Anthropic, OpenAI, or Google) based on token usage. Each model has its own pricing. When you configure different models per agent in your SOUL.md, each agent's API calls are billed at that model's rate. Your total cost is the sum of all agent API calls. There is no markup or platform fee from OpenClaw.
Can I mix providers in the same agent team? For example, Haiku for one agent and GPT-4o Mini for another?
Yes. OpenClaw supports multiple providers simultaneously. You can set your router agent to use Claude Haiku, your writer agent to use Claude Sonnet, and your code agent to use GPT-4o. Each agent's SOUL.md specifies its own model and provider independently. You need API keys for each provider you use, configured via openclaw models auth paste-token.
What are exit conditions and why do they matter for cost?
Exit conditions are rules in your SOUL.md that tell the agent when to stop processing. Without them, agents can enter loops where they keep calling the AI model repeatedly, burning through tokens. A common example: an agent that monitors a metric, finds nothing wrong, but keeps checking every few seconds. Adding an exit condition like 'stop after 3 consecutive healthy checks' can prevent hundreds of unnecessary API calls per hour.
How much does a heartbeat agent cost per month?
A heartbeat agent that runs a health check every 5 minutes makes 288 calls per day. At the $0.002 per-query Haiku rate used in this guide, that is roughly $17 per month; shorter prompts or a longer interval bring it down. With a free local model like Gemma 3 via Ollama, the API cost drops to $0.00. The only cost is electricity, which is negligible. Compare this to running Sonnet for the same heartbeat: about $173 per month for a task that does not need advanced reasoning.