The Real Cost of an AI Agent: Beyond OpenAI Tokens
The token bill is usually one of the smallest line items. Here is the full cost model for a production AI agent: what to budget, what surprises teams, and where the real money goes.
Quick answer: A production AI agent's total cost of ownership in year one breaks down roughly as: 5-15% LLM inference, 5-10% retrieval and storage infrastructure, 10-20% observability and evals tooling, 50-70% engineering time (build + ongoing operations), and 10-20% organizational change management. Token costs are usually among the smallest lines. Teams that budget only for tokens end up cutting the project mid-flight.
The five cost categories, with realistic numbers
1. LLM inference (5-15% of total)
Token cost is what everyone fixates on. For a typical workflow agent handling 10,000 tasks/month with each task running 5,000 input tokens + 1,500 output tokens through Claude Sonnet 4.6 ($3/$15 per million), monthly inference cost is roughly:
- Input: 10,000 × 5,000 = 50M tokens × $3/M = $150
- Output: 10,000 × 1,500 = 15M tokens × $15/M = $225
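- Uncached total: $150 + $225 = $375
- With prompt caching applied (typical 70% hit rate, assuming cached reads are billed at about 10% of the input price): input drops to roughly $55, so the total is ~$280/month. Caching does not discount output tokens.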
For a customer-facing chat agent at higher volume (100K conversations/month) this can scale to $1,500-$5,000/month. For a heavyweight reasoning agent doing legal contract analysis at low volume (1,000/month with 50K tokens each through Claude Opus): $5,000-$10,000/month. None of these break the bank by themselves.
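The 10,000-task arithmetic above can be sketched as a small function. This is a minimal sketch, not a billing calculator: the function name, its parameters, and the `cached_read_discount` default (cached reads billed at 10% of the input price) are assumptions for illustration, and cache-write surcharges are ignored.

```python
def monthly_inference_cost(
    tasks: int,
    input_tokens: int,
    output_tokens: int,
    input_price: float,   # USD per million input tokens
    output_price: float,  # USD per million output tokens
    cache_hit_rate: float = 0.0,
    cached_read_discount: float = 0.9,  # assumption: cached reads cost 10% of input price
) -> float:
    """Estimate monthly LLM spend in USD. Caching discounts input tokens only."""
    input_millions = tasks * input_tokens / 1_000_000
    output_millions = tasks * output_tokens / 1_000_000
    # Blend full-price and cached-read pricing across the input tokens.
    effective_input_price = input_price * (1 - cache_hit_rate * cached_read_discount)
    return input_millions * effective_input_price + output_millions * output_price


# The workflow-agent example: 10,000 tasks, 5,000 in / 1,500 out, $3/$15 per million.
uncached = monthly_inference_cost(10_000, 5_000, 1_500, 3.0, 15.0)
cached = monthly_inference_cost(10_000, 5_000, 1_500, 3.0, 15.0, cache_hit_rate=0.7)
print(f"uncached: ${uncached:,.2f}  cached: ${cached:,.2f}")
```

Under these assumptions the uncached bill is $375 and the cached bill is about $280. Output tokens set a floor of $225 that caching cannot lower, so a different cached-read price moves only the input portion.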
2. Retrieval and storage (5-10%)
If your agent uses RAG you need a vector database (Pinecone, Weaviate, Turbopuffer, pgvector), an embedding pipeline, document storage, and likely a re-ranker. Real numbers:
- Pinecone Standard: $70-300/month for typical agent corpus sizes
- Embedding compute (one-time per doc + delta): $50-200/month for moderate update rates
- Document storage: negligible (S3)
- Re-ranker (Cohere or self-hosted): $100-500/month
Total retrieval infra typically lands in the $200-1,500/month range.
3. Observability and evals (10-20%)
This is where teams under-budget the most. You need:
- LLM observability platform (LangSmith, Helicone, Braintrust, PostHog LLM Observability): $100-2,000/month depending on volume
- Eval infrastructure: open-source (free) plus engineering time to maintain the eval set
- Standard application observability: Sentry, Datadog, or equivalent ($50-500/month)
- Custom dashboards for the operating-function owner ($0 if you reuse existing tools)
Total: $300-3,000/month, growing with volume.
4. Engineering time (50-70%)
This is the bulk of the cost and the most often hand-waved. Realistic year-one engineering load for a single production agent:
- Initial build: 4-12 weeks of one engineer's time. At a fully-loaded $200K/year engineer that is $15K-$45K.
- Productionization (guardrails, observability, integration, deployment): another 2-4 weeks. $7K-$15K.
- Ongoing operations (prompt iteration, eval maintenance, model updates, debugging production issues): 10-25% of one engineer's time, ongoing. $20K-$50K/year.
- On-call coverage: variable but real. If your agent is customer-facing, someone needs to wake up when it breaks.
Year-one engineering total: $40K-$100K+ for a single agent. Multi-agent or higher-stakes deployments scale up.
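The engineering ranges above follow from simple week-rate arithmetic. Here is a rough sketch, assuming a 52-week year and, as a simplification, that the ongoing-operations fraction is charged against the full year. The function name and parameters are hypothetical, and on-call coverage is left out because the article gives no figure for it.

```python
WEEKS_PER_YEAR = 52


def engineering_cost_range(
    annual_loaded_cost: float,
    build_weeks: tuple[int, int],
    production_weeks: tuple[int, int],
    ops_fraction: tuple[float, float],
) -> tuple[float, float]:
    """Return (low, high) year-one engineering cost in USD."""
    weekly = annual_loaded_cost / WEEKS_PER_YEAR
    low = weekly * (build_weeks[0] + production_weeks[0]) + annual_loaded_cost * ops_fraction[0]
    high = weekly * (build_weeks[1] + production_weeks[1]) + annual_loaded_cost * ops_fraction[1]
    return low, high


# $200K fully-loaded engineer, 4-12 build weeks, 2-4 hardening weeks, 10-25% ongoing ops.
low, high = engineering_cost_range(200_000, (4, 12), (2, 4), (0.10, 0.25))
print(f"year-one engineering: ${low:,.0f} - ${high:,.0f}")
```

With these inputs the range comes out to roughly $43K-$112K, in line with the $40K-$100K+ figure quoted above.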
5. Organizational change management (10-20%)
Most teams skip this entirely and pay for it later. The agent has to be adopted. The operating function has to trust it. Existing process documentation has to be updated. Compliance and legal have to sign off. Customer support has to know how to escalate when the agent gets it wrong.
Realistic budget items:
- Adoption workshops and training for the operating team
- Documentation rewrite
- Compliance/legal/security review
- Customer comms (if customer-facing)
- Internal change-management lead time (someone owns this)
Budget $5K-$25K for a typical mid-market deployment, more for regulated industries.
Year-one TCO ranges we see in practice
| Agent type | Year-1 TCO range | Inference share |
|---|---|---|
| Internal-facing single workflow agent | $50K - $120K | 5-10% |
| Customer-facing chat or support agent | $100K - $300K | 10-20% |
| High-volume document processing agent | $80K - $250K | 10-25% |
| Multi-agent orchestrated system | $200K - $600K+ | 5-15% |
These are realistic for mid-market builds with a competent engineering team. Enterprise deployments with compliance overhead can be 2-3× higher. Bare-bones internal tools can be 50% lower.
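To see how the five categories roll up, the sketch below sums the low end of each range quoted in this article (monthly figures annualized) and reports each line's share. The `CostLine` type and `year_one_tco` function are hypothetical names introduced for illustration; swap in your own estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostLine:
    name: str
    annual_usd: float


def year_one_tco(lines: list[CostLine]) -> tuple[float, dict[str, float]]:
    """Return the total annual cost and each line's share of that total."""
    total = sum(line.annual_usd for line in lines)
    shares = {line.name: line.annual_usd / total for line in lines}
    return total, shares


# Low end of each range in this article; monthly figures are multiplied by 12.
low_end = [
    CostLine("inference", 150 * 12),
    CostLine("retrieval", 200 * 12),
    CostLine("observability", 300 * 12),
    CostLine("engineering", 40_000),
    CostLine("change management", 5_000),
]

total, shares = year_one_tco(low_end)
largest = max(shares, key=shares.get)
for name, share in shares.items():
    print(f"{name:<18} {share:6.1%}")
print(f"total: ${total:,.0f}  largest line: {largest}")
```

With the low-end inputs the total is $52,800, close to the bottom of the internal-facing range in the table, and engineering is the largest line by a wide margin.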
Where the savings actually come from
The point of building these agents isn't to minimize the cost of building them. It's to remove labor from a workflow that currently costs more. The math that matters is:
(Cost per human-hour the agent removes × hours/month it removes × 12 months) − Year-one TCO.
An agent that removes 30 hours/month of $80/hour human work pays back $2,400/month, or $28,800/year. That does not cover a $50K build in year one. An agent that removes 200 hours/month of $120/hour work pays back $24K/month, so a build that costs $200K returns its investment inside year one, in a little over eight months.
Run this math before scoping. Cheap agents on small workflows have terrible ROI. Big agents on big workflows have great ROI. The agent scoped with "we'll figure out the volume later" has unknowable ROI and dies in budget review.
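The two scenarios above can be checked with a few lines of code. This is a minimal sketch that assumes savings start at full run rate from month one and treats year-one TCO as a single lump sum; the function names are hypothetical.

```python
def annual_savings(hours_per_month: float, cost_per_hour: float) -> float:
    """Labor cost removed per year, in USD."""
    return hours_per_month * cost_per_hour * 12


def payback_months(year_one_tco: float, hours_per_month: float, cost_per_hour: float) -> float:
    """Months of savings needed to cover year-one TCO."""
    return year_one_tco / (hours_per_month * cost_per_hour)


# Small workflow: 30 hours/month at $80/hour against a $50K build.
small_net = annual_savings(30, 80) - 50_000
# Large workflow: 200 hours/month at $120/hour against a $200K build.
large_net = annual_savings(200, 120) - 200_000
large_payback = payback_months(200_000, 200, 120)

print(f"small workflow, year-one net: ${small_net:,.0f}")
print(f"large workflow, year-one net: ${large_net:,.0f}, payback in {large_payback:.1f} months")
```

The small workflow ends year one about $21K in the red, while the large one clears $88K and pays back in a little over eight months.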
What we recommend
Before scoping any agent build, fill in three numbers on a spreadsheet: hours/month the agent removes, fully-loaded cost per hour, and target year-one ROI multiple. If those don't pencil, change the workflow you're targeting. We force this exercise on day one of every AI Kickstart. If you want a second pair of eyes on your TCO model before you commit budget, an hourly advisory call is the cheapest insurance you'll buy.
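The three-number spreadsheet check can also be written as a budget ceiling. The sketch below assumes "ROI multiple" means annual labor savings divided by year-one TCO, which is one plausible reading and not a definition given in this article. The function names and the example target of 1.5 are hypothetical.

```python
def max_tco_budget(hours_per_month: float, cost_per_hour: float, target_roi_multiple: float) -> float:
    """Largest year-one TCO that still meets the target multiple."""
    if target_roi_multiple <= 0:
        raise ValueError("target_roi_multiple must be positive")
    return hours_per_month * cost_per_hour * 12 / target_roi_multiple


def pencils(estimated_tco: float, hours_per_month: float, cost_per_hour: float,
            target_roi_multiple: float) -> bool:
    """True when the estimated TCO fits under the budget ceiling."""
    return estimated_tco <= max_tco_budget(hours_per_month, cost_per_hour, target_roi_multiple)


# 200 hours/month at $120/hour with a hypothetical 1.5x year-one target.
ceiling = max_tco_budget(200, 120, 1.5)
fits_at_200k = pencils(200_000, 200, 120, 1.5)
fits_at_150k = pencils(150_000, 200, 120, 1.5)
print(f"ceiling: ${ceiling:,.0f}  $200K fits: {fits_at_200k}  $150K fits: {fits_at_150k}")
```

If the estimate does not fit under the ceiling, the options are the ones described above: target a bigger workflow or lower the build cost.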
Frequently asked questions
How much does an AI agent cost to build?
A single internal-facing workflow agent typically lands in the $50K-$120K total cost of ownership range for year one. Customer-facing or high-volume agents run $100K-$300K. The build phase itself is usually $15K-$45K; the rest is operations, observability, and change management.
What does it cost to run an AI agent per month?
LLM inference for a moderate-volume workflow agent (10K tasks/month) runs $150-$500/month with prompt caching. Retrieval infrastructure adds $200-$1,500. Observability tooling is $300-$3,000. Ongoing engineering operations are the largest line: 10-25% of one engineer's time, or roughly $20K-$50K/year.
Why do teams underestimate AI agent costs?
Teams price the build like a feature ship and the run cost like a tokens line item. Both miss the real picture. The largest line items are engineering operations and change management — usually 60-80% of year-one TCO combined. Token cost is typically under 15%.
What ROI should I expect from an AI agent?
An agent that removes 200 hours/month of $120/hour work generates ~$24K/month in labor savings, or $288K/year. Against a $200K year-one TCO, that's a payback inside year one with strong ongoing returns. Smaller workflows (30 hours/month removed) struggle to pencil unless the cost-per-hour saved is high.
Can prompt caching meaningfully reduce inference cost?
Yes, on the input side: by 60-80% of input-token spend in most cases. Anthropic, OpenAI, and Google all support prompt caching. For an agent with a stable system prompt and a large knowledge base in context, the cache hit rate is typically 70-90% after the warm-up period. Output tokens are not discounted, so the saving on the total bill is smaller when output dominates. We always design system prompts to maximize the cacheable prefix.