Cost Optimization · OpenClaw · March 19, 2026 · 12 min read

How to Track and Reduce OpenClaw Agent API Costs

AI agents are powerful, but they can also be expensive. A single OpenClaw agent running Claude Opus with verbose prompts and retry loops can burn through $40 per day without you noticing. This guide shows you exactly where the money goes, how to track per-session costs, how to set up budget limits and kill switches, and practical strategies that can cut your agent API costs by 70-90% without sacrificing quality.

Why AI Agents Burn Through Tokens

Before you can reduce costs, you need to understand where the money actually goes. AI agents consume tokens differently than a simple chat interface. A chat conversation is linear: you send a message, you get a response. An agent is autonomous. It thinks, plans, executes, evaluates, and sometimes retries. Each of those steps is a separate API call with its own token cost.

There are four main reasons agents burn more tokens than you expect.

Retry loops

When an agent fails a task, it often retries automatically. If your SOUL.md does not define a retry limit, the agent might attempt the same task 5-10 times before giving up. Each retry sends the full context window back to the LLM. A task that should cost $0.08 ends up costing $0.80. Multiply that by 50 tasks per day, and you are looking at an extra $36 per day in wasted retries.

Verbose system prompts

Your SOUL.md file is sent with every single API call. If your system prompt is 3,000 tokens, that is 3,000 input tokens charged on every request. An agent making 200 API calls per day with a 3,000-token system prompt consumes 600,000 input tokens just on the system prompt alone. At Claude Sonnet rates ($3/M input tokens), that is $1.80 per day just for the system prompt.

Context window stuffing

Agents that maintain conversation history send increasingly large payloads with each turn. By turn 20 of a session, the agent might be sending 15,000-20,000 tokens of history with every request. The LLM processes all of it, even if only the last message matters. This is the silent cost multiplier that catches most people off guard.

Using premium models for simple tasks

Not every task needs Claude Opus or GPT-4. An agent that checks server status, parses a log file, or sends a formatted notification does not need the most expensive model available. But if your agent is configured with a single model for all tasks, every simple task gets processed by an expensive model.

Real Cost Examples: What Agents Actually Cost

Let us look at real numbers. These are based on actual agent workloads running common tasks, calculated using published API pricing as of March 2026.

| Agent Task | Model | Tokens/Task | Cost/Task | Daily (50 tasks) |
|---|---|---|---|---|
| Log analysis | Claude Opus | 8,000 | $0.24 | $12.00 |
| Log analysis | GPT-4o Mini | 8,000 | $0.003 | $0.15 |
| Content writing (500 words) | Claude Sonnet | 4,500 | $0.07 | $3.50 |
| Status check + notification | Claude Opus | 2,000 | $0.06 | $3.00 |
| Status check + notification | Llama 3 (Ollama) | 2,000 | $0.00 | $0.00 |
| Customer support reply | GPT-4o | 3,500 | $0.04 | $2.00 |
| Code review | Claude Opus | 12,000 | $0.36 | $18.00 |
The pattern is clear. The same task can cost 80x more depending on which model you use. Log analysis with Claude Opus costs $12/day. The same task with GPT-4o Mini costs $0.15/day. Status checks with Claude Opus cost $3/day. With a local Ollama model, the API cost is $0.

Now add retry loops to these numbers. If failed parses push your log analysis agent to an average of three attempts per task, that $12/day becomes $36/day. Over a month, that is $1,080 on a single agent doing a single type of task. This is why cost tracking is not optional. It is essential.

How to Track Per-Session Costs

Tracking costs per session gives you visibility into exactly which agents, tasks, and time periods are consuming the most budget. Here is how to set it up.

Step 1: Log every API call

Every LLM API response includes token usage in the response headers or body. Claude returns usage.input_tokens and usage.output_tokens. OpenAI returns usage.prompt_tokens and usage.completion_tokens. Capture these values after every request.

Add a middleware layer between your agent and the LLM API that intercepts every response and extracts token counts. Store each record with a timestamp, agent name, session ID, task type, model used, input tokens, output tokens, and calculated cost. A simple JSON Lines file works for small setups. For production, use SQLite or PostgreSQL.
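As a sketch, the logging step can be a small helper called from that middleware after every response. The field names and the JSON Lines format below are illustrative assumptions, not a fixed schema:

```python
import json
import time

def log_usage(path, *, agent, session_id, task_type, model,
              input_tokens, output_tokens, cost):
    """Append one API-call usage record as a JSON line."""
    record = {
        "ts": time.time(),          # Unix timestamp of the request
        "agent": agent,
        "session_id": session_id,
        "task_type": task_type,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost": cost,               # calculated from your pricing table
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

You would call this once per LLM response, passing the token counts extracted from the provider's usage object.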

Step 2: Calculate cost per request

Maintain a pricing table in your configuration that maps model names to per-token costs. When you log a request, multiply the input tokens by the input price and the output tokens by the output price. Sum them for the total request cost. Keep this pricing table updated when providers change their rates.

For example, if Claude Sonnet charges $3 per million input tokens and $15 per million output tokens, a request with 2,000 input tokens and 1,500 output tokens costs: (2,000 / 1,000,000 * $3) + (1,500 / 1,000,000 * $15) = $0.006 + $0.0225 = $0.0285 per request.
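That calculation translates directly into code. The model names and per-million-token prices in this sketch are example values (the Claude Sonnet rates match the figures quoted above); keep your own table in sync with provider pricing:

```python
# Example pricing table, in dollars per million tokens. Update when rates change.
PRICING = {
    "claude-sonnet": {"input": 3.00, "output": 15.00},
    "gpt-4o-mini":   {"input": 0.15, "output": 0.60},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: input and output tokens priced separately."""
    p = PRICING[model]
    return (input_tokens / 1_000_000 * p["input"]
            + output_tokens / 1_000_000 * p["output"])
```

With the example above, `request_cost("claude-sonnet", 2000, 1500)` gives $0.0285, matching the hand calculation.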

Step 3: Aggregate by session and agent

Group your cost logs by session ID to see how much each conversation costs. Group by agent name to see which agents are the most expensive. Group by hour and day to spot usage patterns. Build a simple dashboard or script that generates daily cost reports broken down by agent, model, and task type.

A daily cost report should show you: total spend for the day, spend per agent, average cost per session, number of sessions, most expensive session (to spot outliers), and a comparison to the previous day. If yesterday your DevOps agent cost $4.20 and today it costs $18.50, something changed and you need to investigate.
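A minimal aggregation over the JSON Lines log might look like this. It assumes the record fields from the logging step (`agent`, `session_id`, `cost`) and is a starting point, not a full dashboard:

```python
import json
from collections import defaultdict

def daily_report(log_path):
    """Aggregate a JSONL usage log into per-agent and per-session totals."""
    per_agent = defaultdict(float)
    per_session = defaultdict(float)
    with open(log_path) as f:
        for line in f:
            rec = json.loads(line)
            per_agent[rec["agent"]] += rec["cost"]
            per_session[rec["session_id"]] += rec["cost"]
    worst = max(per_session, key=per_session.get) if per_session else None
    return {
        "total": sum(per_agent.values()),
        "per_agent": dict(per_agent),
        "sessions": len(per_session),
        "most_expensive_session": worst,  # helps spot outliers
    }
```

In practice you would run this once per day (filtering records by date) and diff the result against yesterday's report.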

Setting Up Cost Alerts

Cost tracking without alerts is like having a fire alarm that does not make noise. You need automated notifications when spending crosses thresholds.

Per-session alert

Set a maximum cost per session. For most agents, a single session should not exceed $1-2. If a session hits $5, something is wrong, likely a retry loop or a runaway conversation. Trigger an alert immediately when any session crosses your threshold. Send it to Telegram, Slack, or email so you can intervene.

Daily budget alert

Set a daily budget per agent. A DevOps monitoring agent might have a $5/day budget. A content writing agent might have a $10/day budget. When an agent hits 80% of its daily budget, send a warning. When it hits 100%, send a critical alert. This gives you time to react before costs spiral.

Anomaly detection

Track the 7-day rolling average cost per agent. If today's cost is more than 2x the rolling average, trigger an alert. This catches gradual cost creep that fixed thresholds miss. If your agent normally costs $3/day and suddenly costs $7/day, the anomaly alert fires even though $7 might be under your daily budget.

Monthly projection alert

At the end of each day, calculate the projected monthly cost based on the current daily run rate. If you are on day 10 and have spent $150, the projected monthly cost is $450. If your monthly budget is $300, the projection alert warns you early enough to make adjustments before the end of the month.
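The anomaly and projection checks above each reduce to a couple of lines. This sketch assumes you already have daily cost totals per agent from your tracking layer:

```python
def is_anomalous(today_cost: float, last_7_days: list[float]) -> bool:
    """Flag today's cost if it exceeds 2x the 7-day rolling average."""
    avg = sum(last_7_days) / len(last_7_days)
    return today_cost > 2 * avg

def projected_monthly(spend_to_date: float, day_of_month: int,
                      days_in_month: int = 30) -> float:
    """Project month-end spend from the current daily run rate."""
    return spend_to_date / day_of_month * days_in_month
```

For the example in the text, `projected_monthly(150, 10)` returns 450, and an agent averaging $3/day that suddenly costs $7 trips `is_anomalous`.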

Budget Limits and Kill Switches

Alerts tell you when things go wrong. Kill switches make sure they stop going wrong. Without hard limits, a misbehaving agent can drain your API budget overnight.

Implement three levels of budget protection:

Level 1: Session kill switch

Set a hard maximum cost per session. When a session exceeds this limit, the agent stops processing and returns a pre-defined response like "Session budget exceeded. Please start a new session or contact the administrator." A reasonable session limit for most agents is $2-5. For code review or content generation agents that handle larger tasks, $10 might be appropriate.

Level 2: Daily kill switch

Set a hard daily budget per agent. When the cumulative daily spend reaches this limit, the agent goes offline until midnight UTC (or whatever reset time you configure). The agent should respond to any incoming messages with "Daily budget reached. Agent will resume at [time]." A DevOps agent might have a $10/day limit. A customer support agent might have a $20/day limit. Adjust based on your actual usage patterns after a few weeks of tracking.

Level 3: Monthly kill switch

This is your absolute ceiling. Set a monthly budget for all agents combined. When the total monthly spend hits this number, all agents go offline. This protects you from scenarios where multiple agents have issues simultaneously. If your comfortable monthly LLM budget is $200, set the kill switch at $200. No exceptions, no overrides unless you manually reset it.

These kill switches should live in your agent gateway configuration, not inside the agents themselves. The gateway checks the budget before forwarding any request to the LLM. If the budget is exceeded, the request never reaches the API, so you are never charged.
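One way to sketch the three levels is a single gate object the gateway consults before forwarding each request. Everything here is illustrative: the class name, the in-memory counters, and the omission of the daily/monthly reset logic you would add in production:

```python
class BudgetGate:
    """Checks session, daily, and monthly budgets before a request is forwarded.

    Counters live in memory for simplicity; a real gateway would persist them
    and reset daily/monthly totals on a schedule.
    """

    def __init__(self, session_limit, daily_limit, monthly_limit):
        self.limits = {"session": session_limit, "daily": daily_limit,
                       "monthly": monthly_limit}
        self.session_spend = {}   # session_id -> cumulative cost
        self.daily_spend = {}     # agent -> cumulative cost today
        self.monthly_spend = 0.0  # all agents combined

    def allow(self, session_id, agent):
        """Return (allowed, reason). Called before every LLM request."""
        if self.monthly_spend >= self.limits["monthly"]:
            return False, "monthly budget reached"
        if self.daily_spend.get(agent, 0.0) >= self.limits["daily"]:
            return False, "daily budget reached"
        if self.session_spend.get(session_id, 0.0) >= self.limits["session"]:
            return False, "session budget exceeded"
        return True, "ok"

    def record(self, session_id, agent, cost):
        """Called after every LLM response with the calculated request cost."""
        self.session_spend[session_id] = self.session_spend.get(session_id, 0.0) + cost
        self.daily_spend[agent] = self.daily_spend.get(agent, 0.0) + cost
        self.monthly_spend += cost
```

Because `allow` runs before the request leaves the gateway, a blocked request never reaches the LLM API and incurs no charge.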

Strategy 1: Model Routing

Model routing is the single most effective cost reduction strategy. The idea is simple: use expensive models only when you need them, and cheap models for everything else.

Define task categories and assign a model to each one. Here is a practical routing table:

| Task Category | Recommended Model | Why |
|---|---|---|
| Status checks, health pings | Ollama (local) | Zero cost, no reasoning needed |
| Log parsing, data extraction | GPT-4o Mini | Pattern matching, not creative thinking |
| Notifications, formatting | GPT-4o Mini / Ollama | Template-based, deterministic |
| Customer support replies | GPT-4o / Claude Sonnet | Needs nuance but not maximum intelligence |
| Content writing | Claude Sonnet | Good writing quality at mid-tier pricing |
| Complex analysis, strategy | Claude Opus / GPT-4 | Only for tasks that genuinely need top-tier reasoning |

In practice, model routing can reduce costs by 70-85%. If you run 200 agent tasks per day and 60% of them are simple tasks (status checks, log parsing, notifications), routing those to GPT-4o Mini or Ollama instead of Claude Opus saves you roughly $25-30 per day. That is $750-900 per month.

Implement routing in your agent gateway. Classify incoming tasks based on keywords, task type labels, or a simple classifier. Then forward the request to the appropriate model endpoint. The classification itself can use a cheap model or even rule-based logic.
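A rule-based classifier can be as simple as keyword matching over the task description. The keywords, model identifiers, and rule ordering below are assumptions to adapt; note that the first matching rule wins, so order the rules carefully:

```python
# Hypothetical routing rules: (keywords, model endpoint). First match wins,
# so more specific categories should come before broader ones.
ROUTES = [
    (("status", "health", "ping"), "ollama/llama3"),   # free, local
    (("write", "draft", "blog"),   "claude-sonnet"),   # mid-tier writing
    (("parse", "extract", "log"),  "gpt-4o-mini"),     # cheap pattern work
]
DEFAULT_MODEL = "claude-opus"  # premium fallback for unclassified tasks

def route(task_description: str) -> str:
    """Pick a model endpoint based on keywords in the task description."""
    text = task_description.lower()
    for keywords, model in ROUTES:
        if any(k in text for k in keywords):
            return model
    return DEFAULT_MODEL
```

The writing rule sits before the parsing rule on purpose: "blog" contains the substring "log", so reversing the order would misroute content tasks to the cheap model.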

Strategy 2: Response Caching

Many agent tasks produce identical or near-identical outputs for the same inputs. If your DevOps agent checks the same 5 services every hour and 4 of them return the same status, you are paying for 4 redundant API calls per check cycle.

Implement a response cache at the gateway level. Before sending a request to the LLM, hash the input (system prompt + user message). Check the cache for a matching hash. If found, return the cached response. If not, forward to the LLM and cache the response with a TTL (time to live).

Different task types need different TTLs. Status check responses can be cached for 5-10 minutes. FAQ-style customer support answers can be cached for 24 hours. Content generation should not be cached at all since you want unique outputs.
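The hash-and-TTL flow can be sketched with an in-memory store. A production gateway would likely back this with Redis or similar; the class name and key scheme here are illustrative:

```python
import hashlib
import time

class ResponseCache:
    """Gateway-level cache keyed on a hash of system prompt + user message."""

    def __init__(self):
        self._store = {}  # key -> (response, expiry timestamp)

    @staticmethod
    def _key(system_prompt: str, user_message: str) -> str:
        raw = system_prompt + "\x00" + user_message
        return hashlib.sha256(raw.encode()).hexdigest()

    def get(self, system_prompt, user_message):
        """Return the cached response if present and still fresh, else None."""
        entry = self._store.get(self._key(system_prompt, user_message))
        if entry and entry[1] > time.time():
            return entry[0]
        return None  # miss or expired

    def put(self, system_prompt, user_message, response, ttl_seconds):
        """Cache a response with a task-appropriate TTL."""
        self._store[self._key(system_prompt, user_message)] = (
            response, time.time() + ttl_seconds)
```

The gateway calls `get` before forwarding a request; on a hit, the LLM is never invoked and the call costs nothing.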

A well-tuned cache can eliminate 30-50% of API calls for agents that handle repetitive tasks. If your agent makes 200 API calls per day and caching eliminates 80 of them, you save 40% on that agent's daily cost. For a $10/day agent, that is roughly $1,460 saved per year.

Strategy 3: Prompt Optimization

Your SOUL.md file is sent with every API call. Every word in it costs money. A bloated system prompt is a recurring tax on every interaction.

Start by measuring your current system prompt size. Count the tokens using a tokenizer. Then audit every line. Does the agent actually use this instruction? Is this rule triggered in practice? Are there examples that could be shortened?

Here are specific optimization techniques:

Remove redundant instructions

If your SOUL.md says 'Always respond in English' and also says 'Never use Turkish', the second instruction is redundant. Cut it. Review your rules for duplicates and overlaps. Most SOUL.md files have 15-20% redundancy that can be removed without changing behavior.

Use concise phrasing

Replace 'When the user asks you to perform a task, you should first analyze the task requirements and then proceed to execute the task step by step' with 'Analyze tasks before executing.' Same behavior, 80% fewer tokens. Every word in a system prompt is multiplied by hundreds of API calls per day.

Move examples to retrieval

If your system prompt includes 10 example outputs for reference, those examples are sent with every request even when they are not relevant. Instead, store examples separately and only inject relevant ones based on the current task. This can cut system prompt size by 40-60%.

Trim conversation history

Limit the conversation history sent with each request. Keep the last 5-10 messages instead of the entire session history. For most tasks, the agent only needs recent context. Implement a sliding window that drops older messages. This prevents context window costs from growing linearly with session length.
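A sliding window is a one-function change. This sketch assumes the common chat-API message shape (dicts with a `role` key) and always preserves the system message, since dropping it would change agent behavior:

```python
def trim_history(messages, max_messages=10):
    """Keep the system prompt plus only the most recent conversation turns."""
    system = [m for m in messages if m["role"] == "system"]
    recent = [m for m in messages if m["role"] != "system"][-max_messages:]
    return system + recent
```

Applied before every request, this caps history cost at a constant instead of letting it grow with session length.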

A typical SOUL.md optimization pass reduces token count by 30-50%. If your system prompt was 3,000 tokens and you cut it to 1,500, you save 1,500 input tokens per request. At 200 requests per day on Claude Sonnet, that saves 300,000 tokens per day, which is roughly $0.90/day or $27/month. It adds up.

Strategy 4: Local Models with Ollama

The nuclear option for cost reduction is running models locally. Ollama makes this straightforward. Install Ollama, pull a model, and point your agent at the local endpoint. API cost goes to $0.

Local models are not as capable as Claude Opus or GPT-4 for complex reasoning tasks. But they are more than adequate for a large category of agent work. Here is where local models shine:

Server health checks, log parsing with known patterns, formatting data into reports, sending templated notifications, extracting structured data from consistent formats, routing and classifying incoming requests, and simple Q&A with a predefined knowledge base. These tasks are pattern-based, not reasoning-heavy. A 7B parameter model running on a Mac Mini handles them comfortably.

The hardware cost is a one-time investment. A Mac Mini with 16GB RAM costs around $600 and can run a 7B model with good performance. A used workstation with a decent GPU can be found for $300-500. Compare that to $30/day in API costs for the same tasks, and the hardware pays for itself in 10-20 days.

The best approach is hybrid: use local models for routine tasks and cloud APIs for complex tasks that need premium model quality. CrewClaw supports this natively since each agent can be configured to use any LLM endpoint, whether it is a cloud API or a local Ollama instance.

Strategy 5: Retry Limits and Fallback Chains

Retry loops are the most common cause of unexpected cost spikes. An agent that retries a failed task 5 times with Claude Opus is burning money. Fix this with two mechanisms.

Retry limits

Set a maximum number of retries per task. Two retries is a reasonable default. After the second retry, the agent should log the failure, notify you, and move on. Do not let agents retry indefinitely. Add retry limits to your SOUL.md or gateway configuration.

Fallback chains

Instead of retrying with the same expensive model, implement a fallback chain. First attempt: Claude Sonnet. If it fails, retry with GPT-4o. If that fails, retry with GPT-4o Mini. Each step down the chain is cheaper. If the task genuinely cannot be completed, you spent less money discovering that.

A practical example: your content writing agent tries to generate a blog outline with Claude Sonnet at $0.07 per attempt. It fails because the topic is ambiguous. Without fallback chains, it retries 3 times with Sonnet: $0.21 total, still fails. With a fallback chain, it tries Sonnet ($0.07), fails, tries GPT-4o Mini ($0.003), which actually succeeds because the task just needed a simpler approach. Total cost: $0.073 instead of $0.21, and the task actually got completed.
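The two mechanisms combine naturally: cap attempts per model and walk down a chain of cheaper models. In this sketch, `call_model` is a hypothetical function you supply that returns a result on success and raises on failure:

```python
def run_with_fallback(task, chain, call_model, max_attempts_per_model=2):
    """Try each model in the chain in order, with a retry cap per model.

    chain: model names ordered from preferred to cheapest fallback.
    call_model(model, task): caller-supplied; returns a result or raises.
    Returns (model_that_succeeded, result), or raises after the chain is exhausted.
    """
    errors = []
    for model in chain:
        for _ in range(max_attempts_per_model):
            try:
                return model, call_model(model, task)
            except Exception as exc:
                errors.append((model, str(exc)))
    # Log and surface the failure instead of retrying forever.
    raise RuntimeError(f"all models failed: {errors}")
```

Each step down the chain costs less, so even a task that ultimately fails burns far less budget than unlimited same-model retries.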

Putting It All Together: A Cost-Optimized Agent Setup

Here is what a fully optimized agent deployment looks like with all strategies applied:

| Component | Before Optimization | After Optimization |
|---|---|---|
| Model selection | Claude Opus for everything | Routed: Ollama / GPT-4o Mini / Sonnet / Opus |
| System prompt | 3,000 tokens | 1,500 tokens |
| Retry policy | Unlimited retries, same model | 2 retries max, fallback chain |
| Caching | None | Response cache with TTL |
| Context history | Full session history | Last 5 messages (sliding window) |
| Cost tracking | None | Per-session logging + daily reports |
| Kill switches | None | Session / daily / monthly limits |
| Daily cost (5 agents, 50 tasks each) | $45-65 | $5-12 |

The combined effect of all optimizations typically reduces costs by 75-85%. A setup that was costing $1,500-2,000/month drops to $150-360/month. The agents do the same work. The quality of complex tasks stays the same because those still use premium models. You just stop paying premium prices for tasks that do not need premium intelligence.

Why Self-Hosting Gives You Cost Control

Managed agent platforms charge a platform fee on top of the LLM cost. You pay for their infrastructure, their margins, and their convenience. When costs spike, you have limited ability to investigate or fix the root cause because you do not control the infrastructure.

Self-hosted agents give you complete visibility and control. You see every API call, every token count, every retry. You can add caching, implement model routing, set kill switches, and optimize prompts without waiting for a platform to add those features. You can switch LLM providers overnight if pricing changes. You can run local models for free.

CrewClaw's deploy packages are built for self-hosting. You get the SOUL.md configuration, Dockerfile, docker-compose, and bot scripts. Deploy on your own infrastructure, add your own cost tracking middleware, and implement the optimization strategies from this guide. Your agent costs become transparent and controllable instead of a black box.

Frequently Asked Questions

How much does it cost to run an OpenClaw agent per day?

It depends entirely on the model, task complexity, and number of sessions. A lightweight agent using GPT-4o Mini for 50 tasks per day costs roughly $0.50 to $2.00. The same workload on Claude Opus could cost $15 to $40 per day. Using a local model through Ollama costs $0 in API fees. The key is matching the model to the task difficulty.

Can I run OpenClaw agents with zero API cost?

Yes. If you use Ollama with a local model like Llama 3, Mistral, or Qwen, there are no API fees at all. You only pay for electricity and hardware. A Mac Mini with 16GB RAM can run a 7B parameter model comfortably. For many routine tasks like log parsing, status checks, and template responses, local models perform well enough.

What is the biggest cause of high OpenClaw agent costs?

Retry loops are the number one cost driver. When an agent fails a task and retries 3-5 times with the same expensive model, a $0.08 task becomes a $0.40 task. The second biggest cause is using a premium model like Claude Opus or GPT-4 for simple tasks that a cheaper model could handle. Model routing fixes both problems by assigning the right model to each task type.

How do I set up cost alerts for my OpenClaw agents?

Add a cost tracking middleware to your agent gateway that logs token counts and calculates cost per request. Store the data in a simple JSON file or database. Set threshold alerts using a cron job or a monitoring agent that checks cumulative daily spend. When the threshold is crossed, send a notification via Telegram, Slack, or email. You can also set hard kill switches that stop the agent entirely when a budget limit is reached.

Does CrewClaw include cost tracking in its deploy packages?

CrewClaw deploy packages include the agent configuration, Dockerfile, docker-compose, and bot scripts. Cost tracking is something you add at the infrastructure level since it depends on your LLM provider and monitoring preferences. However, the self-hosted nature of CrewClaw means you have full access to every API call, making it straightforward to add logging and cost calculation middleware.
