How to Cut OpenClaw API Costs to $0.02 Per Query (2026)

The $100/Night Mistake Most People Make

The default OpenClaw setup uses one model for everything. You install it, set your API key to Claude Sonnet, and every agent in your system uses Sonnet for every single call. Routing decisions, health checks, intent parsing, content generation, log analysis. All Sonnet. All expensive.

This is like hiring a senior engineer to answer the phone. The work gets done, but you are burning money on tasks that a junior employee handles just as well. A 5-agent team running Sonnet for everything can easily hit $3 to $5 per hour. That is $24 to $40 over an 8-hour night, and one agent stuck in a loop is enough to turn it into a $100 bill for work that could have cost $2.

88% cost reduction in the worked example

$0.0024 per query after optimization

$54/mo for an optimized 5-agent team

code changes needed

The Model Routing Strategy

The fix is simple: use cheap models for cheap tasks and expensive models only when you need them. In practice, this means splitting your agent work into two categories.

Routing / Decision Tasks (80% of calls)

Choosing which agent handles a message. Parsing user intent. Yes/no health checks. Log scanning. Status updates. These tasks are structured, predictable, and do not need advanced reasoning.

Use: Haiku ($0.002/query) or GPT-4o Mini ($0.003/query)

Generation Tasks (20% of calls)

Writing blog posts. Generating detailed reports. Complex multi-step analysis. Code generation. Creative work. These tasks require deeper reasoning and produce longer outputs.

Use: Sonnet ($0.02/query) or GPT-4o ($0.03/query)

When 80% of your calls cost $0.002 instead of $0.02, your average cost per query drops from $0.02 to about $0.0056 (0.8 x $0.002 + 0.2 x $0.02). That is a 72% reduction before you even touch exit conditions or heartbeat optimization.
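To check the blended-cost arithmetic yourself, here is a minimal sketch. It uses only the per-query estimates quoted above; the function name `blended_cost` is made up for this example and is not part of OpenClaw.

```python
# Per-query price estimates from this guide (USD). Real bills depend on token counts.
CHEAP_PRICE = 0.002      # Haiku, routing and decision calls
EXPENSIVE_PRICE = 0.02   # Sonnet, generation calls


def blended_cost(cheap_share: float) -> float:
    """Average cost per query when `cheap_share` of calls go to the cheap model."""
    return cheap_share * CHEAP_PRICE + (1 - cheap_share) * EXPENSIVE_PRICE


baseline = blended_cost(0.0)   # everything on Sonnet
routed = blended_cost(0.8)     # 80% of calls moved to Haiku
saving = 1 - routed / baseline

print(f"baseline ${baseline:.4f}, routed ${routed:.4f}, saving {saving:.0%}")
```

Change the `0.8` to match your own traffic split. The more of your calls are routing and checks, the bigger the drop.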

Setting Different Models Per Agent in SOUL.md

OpenClaw lets you specify the model for each agent directly in its SOUL.md file. The router agent gets a cheap model. The writer agent gets an expensive one. Here is the config for a cost-optimized 3-agent team.

Router Agent (Haiku, $0.002/query)

# ~/agents/router/SOUL.md

# Router Agent

## Model
provider: anthropic
model: claude-3-5-haiku-20241022

## Identity
You are a message router. Your only job is to read incoming
messages and decide which agent should handle them.

## Rules
- Respond with ONLY the agent name: writer, devops, or support
- If unclear, respond with "support" as the default
- Never generate long responses
- Never engage in conversation

## Routing Table
- Writing requests, blog, content, social media -> writer
- Server issues, deployments, monitoring, errors -> devops
- Questions, help, general inquiries -> support
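If you consume the router's reply in your own glue code, it helps to normalize it before dispatching. The sketch below is a hypothetical helper, not an OpenClaw API: `parse_route` and the constants are names invented for illustration, and it simply mirrors the rules above (one agent name, "support" as the fallback).

```python
# Hypothetical helper: agent names and the default come from the Rules section above.
VALID_AGENTS = {"writer", "devops", "support"}
DEFAULT_AGENT = "support"


def parse_route(reply: str) -> str:
    """Map the router model's raw reply to a known agent name."""
    words = reply.strip().lower().split()
    if not words:
        return DEFAULT_AGENT
    # Tolerate stray punctuation or quotes around the agent name.
    candidate = words[0].strip(".,:;!\"'")
    return candidate if candidate in VALID_AGENTS else DEFAULT_AGENT


print(parse_route(" DevOps.\n"))
print(parse_route("I think this is a billing question"))
```

Anything the router says that is not a known agent name falls back to support, so a chatty or confused reply never triggers an expensive agent by accident.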

Writer Agent (Sonnet, $0.02/query)

# ~/agents/writer/SOUL.md

# Content Writer Agent

## Model
provider: anthropic
model: claude-sonnet-4-20250514

## Identity
You are a skilled content writer who creates blog posts,
social media updates, and marketing copy.

## Rules
- Match the brand tone defined below
- Use short paragraphs (2-3 sentences max)
- Include specific numbers and examples
- No filler phrases or corporate jargon

## Tone
Direct, practical, slightly informal. Write like a founder
talking to another founder.

DevOps Monitor (Haiku, $0.002/query)

# ~/agents/devops/SOUL.md

# DevOps Monitor Agent

## Model
provider: anthropic
model: claude-3-5-haiku-20241022

## Identity
You monitor server health and report issues.

## Rules
- Check: CPU, memory, disk, response time
- Only alert when thresholds are exceeded
- Keep status messages under 50 words
- Use structured format: [OK] or [ALERT] prefix

## Thresholds
- CPU > 85% for 5 minutes -> ALERT
- Memory > 90% -> ALERT
- Disk > 80% -> ALERT
- Response time > 2s -> ALERT
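The thresholds are simple enough to write down as plain code, which gives you a known-correct answer to compare the agent's replies against when you test a cheaper model. This is a rough sketch under assumptions: the `Metrics` fields and `evaluate` function are invented for this example, and `cpu_pct` is assumed to already be the value sustained over 5 minutes.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    cpu_pct: float      # assumed: CPU usage sustained over the last 5 minutes
    mem_pct: float
    disk_pct: float
    response_s: float   # response time in seconds


def evaluate(m: Metrics) -> str:
    """Apply the thresholds from the DevOps SOUL.md and return a status line."""
    breaches = []
    if m.cpu_pct > 85:
        breaches.append(f"CPU {m.cpu_pct:.0f}%")
    if m.mem_pct > 90:
        breaches.append(f"memory {m.mem_pct:.0f}%")
    if m.disk_pct > 80:
        breaches.append(f"disk {m.disk_pct:.0f}%")
    if m.response_s > 2:
        breaches.append(f"response time {m.response_s:.1f}s")
    if breaches:
        return "[ALERT] " + ", ".join(breaches)
    return "[OK] All systems normal"


print(evaluate(Metrics(cpu_pct=42, mem_pct=67, disk_pct=55, response_s=0.3)))
print(evaluate(Metrics(cpu_pct=90, mem_pct=67, disk_pct=85, response_s=0.3)))
```

Feed the same sample metrics to the agent and to this function. If Haiku's [OK] and [ALERT] calls match on your test cases, you have evidence the cheap model is good enough for this role.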

Key insight: The router and devops agents handle the bulk of all API calls but use Haiku at $0.002/query. Only the writer uses Sonnet, and it only fires when someone actually requests content. This is where most of the savings come from.

Heartbeat Agents: Use Free or Near-Free Models

Heartbeat agents are the biggest hidden cost in OpenClaw setups. They run on a schedule (every 1 to 5 minutes), checking server health, monitoring APIs, or scanning logs. Each check is an API call. At 5-minute intervals, that is 288 calls per day, per agent.

Running a heartbeat on Sonnet: 288 calls x $0.02 = $5.76/day = $173/month. Running the same heartbeat on Haiku: 288 calls x $0.002 = $0.58/day = $17/month. Running it on a local model via Ollama: $0.00/day.
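The heartbeat math is worth scripting so you can try your own interval. A small sketch, assuming a 30-day month and the per-query estimates from this guide; `heartbeat_cost` is a name made up for the example.

```python
MINUTES_PER_DAY = 24 * 60
DAYS_PER_MONTH = 30  # assumption used for the monthly figures


def heartbeat_cost(interval_min: int, price_per_call: float):
    """Return (calls per day, cost per day, cost per month) for one heartbeat agent."""
    calls = MINUTES_PER_DAY // interval_min
    daily = calls * price_per_call
    return calls, daily, daily * DAYS_PER_MONTH


for name, price in [("Sonnet", 0.02), ("Haiku", 0.002), ("local Ollama model", 0.0)]:
    calls, daily, monthly = heartbeat_cost(5, price)
    print(f"{name}: {calls} calls/day, ${daily:.2f}/day, ${monthly:.2f}/month")
```

Drop the interval to 1 minute and the call count jumps to 1,440 per day, which is why the interval matters as much as the model.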

# ~/agents/heartbeat/SOUL.md

# Heartbeat Monitor

## Model
provider: ollama
model: gemma3:4b

## Identity
You are a system health monitor. You check metrics and
report status in a structured format.

## Rules
- Output ONLY: [OK] or [ALERT] followed by a one-line summary
- Never generate explanations unless asked
- If all metrics are healthy, respond with: [OK] All systems normal
- Parse the input data, do not make assumptions

## Schedule
interval: 5m
type: heartbeat

Gemma 3 (4B parameters) runs locally via Ollama on a machine with 8 GB of RAM. It handles structured health checks like this one well and has no per-query API cost. For Raspberry Pi setups where local inference is too slow, use Claude Haiku as the cheapest cloud option.

# Install Ollama (Mac, Linux, or WSL)
curl -fsSL https://ollama.com/install.sh | sh

# Pull the Gemma 3 4B model (about 3.3 GB download)
ollama pull gemma3:4b

# Configure OpenClaw to use Ollama
openclaw models add ollama --endpoint http://localhost:11434

# Your heartbeat agent now runs for free
openclaw agent --agent heartbeat --message "CPU: 42%, MEM: 67%, DISK: 55%"
# Output: [OK] All systems normal

Exit Conditions: Stop Agents from Burning Tokens

The most expensive bug in any agent system is a loop. An agent gets stuck in a cycle, calling the AI model over and over without producing useful work. Without exit conditions, a looping Sonnet agent can burn $10 to $50 in a single hour.

Exit conditions are rules you add to your SOUL.md that tell the agent when to stop. They are the seatbelt of cost optimization.

# Add these to any agent's SOUL.md

## Exit Conditions
- Stop after completing the requested task
- Maximum 5 tool calls per message
- If the same action fails 3 times, stop and report the error
- Never retry a failed API call more than twice
- If no new information after 3 checks, pause until next trigger
- Maximum response length: 500 words (prevents token runaway)

## Cost Guards
- If the conversation exceeds 10 turns, summarize and close
- Do not research topics beyond the initial question
- Never browse the web unless explicitly asked
- Decline tasks outside your defined skills

Loop Prevention

Max 5 tool calls per message. If the same action fails 3 times, stop and report. Prevents runaway API spending.

Scope Limits

Agents only handle tasks within their defined skills. No rabbit holes. No open-ended research unless requested.

Conversation Caps

After 10 turns, summarize and close. Prevents context windows from growing endlessly and inflating token costs.
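The rules above are instructions to the model. If you also run agents from your own wrapper script, you can back them with a hard counter so a model that ignores its instructions still gets stopped. The sketch below is one hypothetical way to do that: `ExitGuard` and its methods are invented for illustration and are not an OpenClaw feature. The limits are copied from the Exit Conditions and Cost Guards above.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

MAX_TOOL_CALLS = 5           # per message
MAX_FAILURES_PER_ACTION = 3  # same action failing repeatedly
MAX_TURNS = 10               # conversation cap


@dataclass
class ExitGuard:
    tool_calls: int = 0
    turns: int = 0
    failures: Counter = field(default_factory=Counter)

    def start_message(self) -> None:
        """Call at the start of each incoming message: counts a turn, resets tool calls."""
        self.turns += 1
        self.tool_calls = 0

    def record_tool_call(self, action: str, ok: bool) -> None:
        self.tool_calls += 1
        if not ok:
            self.failures[action] += 1

    def stop_reason(self) -> Optional[str]:
        """Return why the agent should stop, or None if it may continue."""
        for action, count in self.failures.items():
            if count >= MAX_FAILURES_PER_ACTION:
                return f"{action} failed {count} times"
        if self.tool_calls >= MAX_TOOL_CALLS:
            return "tool call limit reached"
        if self.turns > MAX_TURNS:
            return "conversation cap reached"
        return None


guard = ExitGuard()
guard.start_message()
for _ in range(3):
    guard.record_tool_call("deploy", ok=False)
print(guard.stop_reason())
```

Check `stop_reason()` before every model call. When it returns a reason, report it and stop instead of paying for another round trip.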

Cost Comparison: Before vs After Optimization

Here is what a typical 5-agent team costs before and after applying these optimizations. The numbers assume 738 calls per day across all agents: 450 message-driven calls plus 288 scheduled heartbeat checks.

Agent	Before (all Sonnet)	After (optimized)
Router (200 calls/day)	$4.00/day (Sonnet)	$0.40/day (Haiku)
Writer (50 calls/day)	$1.00/day (Sonnet)	$1.00/day (Sonnet)
DevOps (100 calls/day)	$2.00/day (Sonnet)	$0.20/day (Haiku)
Heartbeat (288 calls/day)	$5.76/day (Sonnet)	$0.00/day (Gemma 3 local)
Support (100 calls/day)	$2.00/day (Sonnet)	$0.20/day (Haiku)
Daily total	$14.76/day	$1.80/day
Monthly total	$443/month	$54/month
Avg cost per query	$0.020	$0.0024

That is an 88% reduction in monthly costs with zero loss in functionality. The writer still uses Sonnet for quality content. Everything else runs on cheaper models that handle the work just as well. Add exit conditions on top of this and you can push savings past 90%.
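You can reproduce the table, or plug in your own call volumes, with a few lines. This sketch assumes the per-query estimates from this guide and a 30-day month; the variable names are illustrative.

```python
# Per-query estimates from this guide (USD).
SONNET, HAIKU, LOCAL = 0.02, 0.002, 0.0

# (agent, calls per day, per-query price after optimization)
agents = [
    ("Router", 200, HAIKU),
    ("Writer", 50, SONNET),
    ("DevOps", 100, HAIKU),
    ("Heartbeat", 288, LOCAL),
    ("Support", 100, HAIKU),
]

total_calls = sum(calls for _name, calls, _price in agents)
before = total_calls * SONNET                                 # everything on Sonnet
after = sum(calls * price for _name, calls, price in agents)  # per-agent models
reduction = 1 - after / before

print(f"calls/day: {total_calls}")
print(f"before: ${before:.2f}/day, ${before * 30:.0f}/month")
print(f"after:  ${after:.2f}/day, ${after * 30:.0f}/month")
print(f"avg per query after: ${after / total_calls:.4f}, reduction: {reduction:.0%}")
```

Edit the call counts to match your own logs before trusting the totals. The heartbeat row in particular scales with the interval you choose.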

Quick Reference: Model Pricing Cheat Sheet

Use this table to pick the right model for each agent role. Prices are approximate per-query costs assuming typical agent message lengths (500 to 1000 tokens input, 200 to 500 tokens output).

Model	Approx. Cost/Query	Best For
Gemma 3 (Ollama)	$0.000	Heartbeats, health checks, simple parsing
Claude 3.5 Haiku	$0.002	Routing, decisions, monitoring, support
GPT-4o Mini	$0.003	Routing, classification, structured output
Claude Sonnet 4	$0.020	Writing, analysis, complex reasoning
GPT-4o	$0.030	Code generation, detailed reports
Claude Opus 4	$0.100	Only for critical, high-stakes tasks

Frequently Asked Questions

Does using a cheaper model like Haiku reduce agent quality?

Not for routing and decision tasks. Haiku excels at structured decisions like choosing which agent to invoke, parsing user intent, and yes/no checks. These tasks do not require the deep reasoning of Sonnet or Opus. In testing, Haiku handles routing with 98%+ accuracy while costing about 10x less at the per-query estimates used here ($0.002 vs $0.02). You only lose quality if you use Haiku for complex generation tasks like long-form writing or multi-step analysis.

How does OpenClaw billing work with multiple models?

OpenClaw itself does not charge per query. You pay the AI provider directly (Anthropic, OpenAI, or Google) based on token usage. Each model has its own pricing. When you configure different models per agent in your SOUL.md, each agent's API calls are billed at that model's rate. Your total cost is the sum of all agent API calls. There is no markup or platform fee from OpenClaw.

Can I mix providers in the same agent team? For example, Haiku for one agent and GPT-4o Mini for another?

Yes. OpenClaw supports multiple providers simultaneously. You can set your router agent to use Claude Haiku, your writer agent to use Claude Sonnet, and your code agent to use GPT-4o. Each agent's SOUL.md specifies its own model and provider independently. You need API keys for each provider you use, configured via openclaw models auth paste-token.

What are exit conditions and why do they matter for cost?

Exit conditions are rules in your SOUL.md that tell the agent when to stop processing. Without them, agents can enter loops where they keep calling the AI model repeatedly, burning through tokens. A common example: an agent that monitors a metric, finds nothing wrong, but keeps checking every few seconds. Adding an exit condition like 'stop after 3 consecutive healthy checks' can prevent hundreds of unnecessary API calls per hour.

How much does a heartbeat agent cost per month?

At the per-query estimates used in this guide, a heartbeat agent that runs a health check every 5 minutes (288 calls per day) costs roughly $17 per month on Claude Haiku. With a free local model like Gemma 3 via Ollama, the API cost drops to $0.00. The only cost is electricity, which is negligible. Compare this to running Sonnet for the same heartbeat: roughly $173 per month for a task that does not need advanced reasoning.

Skip the Manual Config. Get Optimized Agents.

CrewClaw generates agent configs with optimized model routing out of the box. Each agent gets the right model for its role, exit conditions are built in, and heartbeat agents default to the cheapest viable option. Stop overpaying. Build your team in 60 seconds.

