Agentic AI FinOps
Cap, log, and charge back every autonomous agent loop before it spends six figures in an afternoon.
I. Why this matters
An agent loop is a budget tail risk. A single misbehaving agent can call tools and language models in a loop and rack up tens of thousands of dollars in hours. Unlike a human user, an agent does not get tired, does not check the bill, and does not stop. Agentic AI FinOps treats every agent run as a financial transaction with a hard ceiling, an audit trail, and an owner.
II. Principles
- Every agent run gets a budget envelope. No envelope, no run.
- Hard cap the loop. Step count, token count, dollar count, and wall clock. Whichever trips first wins (a minimal guard is sketched after this list).
- Bind every agent action to an identity, an envelope, and a cost center. No floating runs.
- Treat tool calls as paid actions. A web search, a database query, a RAG retrieval all carry cost; charge them back like model calls.
- Sample, do not log everything raw. Full logs for 1 in N runs; metadata only for the rest. Agents produce too much exhaust to log naively.
- Pre commit the eval. The agent is allowed to run only if a regression eval passes against last week.
- Fail closed on budget breach. An agent that cannot pay does not get to call the model.
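A minimal sketch of the envelope and its trip order, assuming nothing beyond the caps named above. The `Envelope` class and its field names are illustrative, not any product's API; the defaults mirror the starting envelope suggested in the FAQ below.

```python
from dataclasses import dataclass, field
import time

@dataclass
class Envelope:
    """Hard caps for one agent run. All names here are illustrative."""
    max_steps: int = 25             # FAQ starting point: 25 steps
    max_tokens: int = 200_000       # 200,000 tokens
    max_dollars: float = 5.00       # 5 dollars per run
    max_wall_clock_s: float = 600.0
    steps: int = 0
    tokens: int = 0
    dollars: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def breach(self) -> str | None:
        """Return the first cap that tripped, or None. Whichever trips first wins."""
        if self.steps >= self.max_steps:
            return "steps"
        if self.tokens >= self.max_tokens:
            return "tokens"
        if self.dollars >= self.max_dollars:
            return "dollars"
        if time.monotonic() - self.started_at >= self.max_wall_clock_s:
            return "wall_clock"
        return None
```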
III. KPIs
IV. The playbook spine
- Adopt a run id contract. Every agent run, every model call inside it, and every tool invocation share a single run id propagated through the gateway (see the gateway sketch after this list).
- Define the envelope schema. Per run: max steps, max tokens, max dollars, max wall clock. Reject runs that do not declare an envelope.
- Wrap every tool call in the same priced gateway. Search, retrieval, code exec, browser action; price them and log them like model calls.
- Land per run rows in BigQuery. Cluster by agent product, partition by date. Build a per run cost view (one table shape follows the list).
- Set fail closed budget enforcement at the gateway. When the envelope is exhausted, the gateway returns an error; the agent cannot continue.
- Build an eval harness. Run weekly against a fixed task set; store the score next to spend (a minimal gate follows the list).
- Run a monthly red team. Try to break out of the envelope. Log every escape route and patch.
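A sketch of the run id contract and fail closed enforcement at the gateway, continuing the hypothetical `Envelope` above. `gateway_call`, `BudgetBreach`, and the estimate parameters are assumptions, not a real gateway's interface; the point is that the budget check happens before any spend, and tool calls go through the same function as model calls.

```python
import uuid

class BudgetBreach(Exception):
    """Raised when the envelope is exhausted; the agent loop must stop."""

def new_run_id() -> str:
    # One run id, propagated through every model and tool call in the run.
    return f"run-{uuid.uuid4()}"

def gateway_call(envelope: Envelope, run_id: str, action: str,
                 est_tokens: int, est_dollars: float) -> dict:
    """Fail closed: refuse the call before spending, meter it after."""
    reason = envelope.breach()
    if reason is not None:
        raise BudgetBreach(f"{run_id}: {reason} cap exhausted")
    # ... perform the model or tool call here, tagged with run_id ...
    envelope.steps += 1
    envelope.tokens += est_tokens
    envelope.dollars += est_dollars
    return {"run_id": run_id, "action": action, "cost_usd": est_dollars}
```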
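One possible shape for the per run rows, using the google-cloud-bigquery client. The dataset, table, and column names are assumptions; the partition and cluster clauses implement the layout named above.

```python
from google.cloud import bigquery

DDL = """
CREATE TABLE IF NOT EXISTS agent_finops.agent_run_costs (
  run_id         STRING NOT NULL,
  agent_product  STRING NOT NULL,
  cost_center    STRING NOT NULL,  -- the human owner, never "the agent"
  run_date       DATE   NOT NULL,
  steps          INT64,
  tokens         INT64,
  tool_cost_usd  FLOAT64,
  model_cost_usd FLOAT64,
  total_cost_usd FLOAT64
)
PARTITION BY run_date
CLUSTER BY agent_product
"""

client = bigquery.Client()
client.query(DDL).result()  # the per run cost view then aggregates by run_id
```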
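A sketch of the weekly regression gate; `agent.run` and the per task `check` callables are hypothetical stand-ins for whatever harness you use. The score lands next to spend so quality and dollars read off the same report.

```python
def weekly_eval_gate(agent, tasks: list[dict], last_week_score: float) -> bool:
    """Pre commit gate: the agent runs this week only if it has not
    regressed against last week on the fixed task set."""
    passed = sum(1 for t in tasks if t["check"](agent.run(t["input"])))
    score = passed / len(tasks)
    # Store (week, score, spend) in the same table as the cost rows.
    return score >= last_week_score
```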
V. Common failures
- Treating the agent like a chat session. Chat budgets are per turn; agent budgets must be per run.
- Soft caps. A warning at 80 percent that does not actually stop the loop is worse than no warning.
- Logging every step verbatim. Storage and BigQuery costs explode within a week.
- Allowing recursive agent calls without total budget. A planner that spawns five workers that each spawn five workers is a billing accident.
- Pricing model calls but not tool calls. The web search bill catches up to the model bill within a month.
- Treating eval as optional. Without eval the agent silently degrades; cost goes up while quality goes down.
- Letting "the agent" be the owner. Every run needs a human cost center.
VI. Recommended tooling
Vendor neutral. For graded vendor comparisons see the Matrix.
- Agent gateway and run id propagation
- Tool call pricing and metering
- Envelope enforcement and budget breach handler
- Run trace storage with sampling
- Eval harness
- Red team and adversarial test framework
- Per agent chargeback report
- Policy as code for agent permissions
VII. Related IFO4 playbooks
- ai-cost-per-inference (coming soon)
VIII. FAQ
My agent only calls one model. Why do I need an envelope?
Because a loop bug can call that one model ten thousand times in an hour. The model is irrelevant; the loop is the risk.
What is a reasonable starting envelope?
For most internal automation, start at 25 steps, 200,000 tokens, and 5 dollars per run. Tighten or loosen by product after two weeks of data.
Should I let agents call other agents?
Only with a parent envelope that includes the children. Otherwise the recursion compounds and you lose the cap.
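One way to make the parent envelope include the children, continuing the hypothetical `Envelope` sketch above: carve each child's caps out of the parent's remaining budget and charge the parent up front, so however deep the recursion goes, the total can never exceed the original cap. The `carve` helper is illustrative.

```python
def carve(parent: Envelope, fraction: float = 0.2) -> Envelope:
    """Give a child agent a slice of whatever the parent has left."""
    child = Envelope(
        max_steps=int((parent.max_steps - parent.steps) * fraction),
        max_tokens=int((parent.max_tokens - parent.tokens) * fraction),
        max_dollars=(parent.max_dollars - parent.dollars) * fraction,
    )
    # Reserve the carve-out against the parent immediately, so five
    # workers spawning five workers still sum to at most one envelope.
    parent.steps += child.max_steps
    parent.tokens += child.max_tokens
    parent.dollars += child.max_dollars
    return child
```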
How do I price a self hosted retrieval call?
Compute fully loaded cost per call (compute plus storage plus query) and assign it to the call. The number does not need to be perfect; it needs to be present so engineers see the trade off.
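The back of the envelope arithmetic, with made up numbers; the point is that a present, imperfect price beats an absent one.

```python
def retrieval_cost_per_call(monthly_compute_usd: float,
                            monthly_storage_usd: float,
                            monthly_query_usd: float,
                            monthly_calls: int) -> float:
    """Fully loaded cost per call: all attributable monthly cost / volume."""
    return (monthly_compute_usd + monthly_storage_usd + monthly_query_usd) / monthly_calls

# Illustrative only: $1,200 compute + $300 storage + $500 query over
# 2,000,000 calls comes to $0.001 per retrieval.
print(retrieval_cost_per_call(1200, 300, 500, 2_000_000))  # 0.001
```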
Do I need a separate dashboard from AI FinOps?
Yes. The AI FinOps dashboard is per feature. The Agentic dashboard is per agent run id with step level and tool level breakdowns.
What about long lived background agents?
Treat them as a budget per day with a sliding window. If the daily envelope is breached, the agent enters a quiescent mode and pages the owner.
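A sketch of the sliding daily window, assuming an in memory ledger; a production version would persist spend events and wire the breach to the pager.

```python
from collections import deque
import time

class DailyWindow:
    """Spend over the trailing 24 hours; a breach means quiescent mode."""
    def __init__(self, daily_cap_usd: float):
        self.cap = daily_cap_usd
        self.events: deque[tuple[float, float]] = deque()  # (timestamp, usd)

    def charge(self, usd: float) -> bool:
        now = time.time()
        while self.events and now - self.events[0][0] > 86_400:
            self.events.popleft()  # drop spend older than 24 hours
        if sum(e[1] for e in self.events) + usd > self.cap:
            return False  # breached: go quiescent and page the owner
        self.events.append((now, usd))
        return True
```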
How do I show ROI on agent investment?
Cost per resolved task on the agent versus cost per resolved task before the agent (human or scripted). Both numbers must include all attributable cost.
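The arithmetic, with illustrative numbers only:

```python
def cost_per_resolved_task(total_attributable_cost_usd: float,
                           resolved_tasks: int) -> float:
    # Both sides of the comparison must include all attributable cost:
    # model calls, tool calls, storage, and human review time.
    return total_attributable_cost_usd / resolved_tasks

# Illustrative: an agent at $4,000 for 10,000 resolved tasks is $0.40
# per task, against a pre agent baseline of $9,000 for 6,000 tasks
# at $1.50 per task.
```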