Agentic AI FinOps
Cap, log, and charge back every autonomous agent loop before it spends six figures in an afternoon.
I. Why this matters
An agent loop is a budget tail risk. A single misbehaving agent can call tools and language models in a loop and rack up tens of thousands of dollars in hours. Unlike a human user, an agent does not get tired, does not check the bill, and does not stop. Agentic AI FinOps treats every agent run as a financial transaction with a hard ceiling, an audit trail, and an owner.
II. Principles
- Every agent run gets a budget envelope. No envelope, no run.
- Hard cap the loop. Step count, token count, dollar count, and wall clock. Whichever trips first wins (a minimal guard is sketched after this list).
- Bind every agent action to an identity, an envelope, and a cost center. No floating runs.
- Treat tool calls as paid actions. A web search, a database query, a RAG retrieval all carry cost; charge them back like model calls.
- Sample, do not log everything raw. Full logs for 1 in N runs; metadata only for the rest. Agents produce too much exhaust to log naively.
- Pre commit the eval. The agent is allowed to run only if a regression eval passes against last week.
- Fail closed on budget breach. An agent that cannot pay does not get to call the model.
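A minimal sketch of the envelope and its trip order, assuming nothing beyond the caps named above. The `Envelope` class and its field names are illustrative, not any product's API; the defaults mirror the starting envelope suggested in the FAQ below.

```python
from dataclasses import dataclass, field
import time

@dataclass
class Envelope:
    """Hard caps for one agent run. All names here are illustrative."""
    max_steps: int = 25             # FAQ starting point: 25 steps
    max_tokens: int = 200_000       # 200,000 tokens
    max_dollars: float = 5.00       # 5 dollars per run
    max_wall_clock_s: float = 600.0
    steps: int = 0
    tokens: int = 0
    dollars: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def breach(self) -> str | None:
        """Return the first cap that tripped, or None. Whichever trips first wins."""
        if self.steps >= self.max_steps:
            return "steps"
        if self.tokens >= self.max_tokens:
            return "tokens"
        if self.dollars >= self.max_dollars:
            return "dollars"
        if time.monotonic() - self.started_at >= self.max_wall_clock_s:
            return "wall_clock"
        return None
```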
III. KPIs
IV. The playbook spine
- Adopt a run id contract. Every agent run, every model call inside it, and every tool invocation share a single run id propagated through the gateway (see the gateway sketch after this list).
- Define the envelope schema. Per run: max steps, max tokens, max dollars, max wall clock. Reject runs that do not declare an envelope.
- Wrap every tool call in the same priced gateway. Search, retrieval, code exec, browser action; price them and log them like model calls.
- Land per run rows in BigQuery. Cluster by agent product, partition by date. Build a per run cost view (one table shape follows the list).
- Set fail closed budget enforcement at the gateway. When the envelope is exhausted, the gateway returns an error; the agent cannot continue.
- Build an eval harness. Run weekly against a fixed task set; store the score next to spend (a minimal gate follows the list).
- Run a monthly red team. Try to break out of the envelope. Log every escape route and patch.
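A sketch of the run id contract and fail closed enforcement at the gateway, continuing the hypothetical `Envelope` above. `gateway_call`, `BudgetBreach`, and the estimate parameters are assumptions, not a real gateway's interface; the point is that the budget check happens before any spend, and tool calls go through the same function as model calls.

```python
import uuid

class BudgetBreach(Exception):
    """Raised when the envelope is exhausted; the agent loop must stop."""

def new_run_id() -> str:
    # One run id, propagated through every model and tool call in the run.
    return f"run-{uuid.uuid4()}"

def gateway_call(envelope: Envelope, run_id: str, action: str,
                 est_tokens: int, est_dollars: float) -> dict:
    """Fail closed: refuse the call before spending, meter it after."""
    reason = envelope.breach()
    if reason is not None:
        raise BudgetBreach(f"{run_id}: {reason} cap exhausted")
    # ... perform the model or tool call here, tagged with run_id ...
    envelope.steps += 1
    envelope.tokens += est_tokens
    envelope.dollars += est_dollars
    return {"run_id": run_id, "action": action, "cost_usd": est_dollars}
```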
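One possible shape for the per run rows, using the google-cloud-bigquery client. The dataset, table, and column names are assumptions; the partition and cluster clauses implement the layout named above.

```python
from google.cloud import bigquery

DDL = """
CREATE TABLE IF NOT EXISTS agent_finops.agent_run_costs (
  run_id         STRING NOT NULL,
  agent_product  STRING NOT NULL,
  cost_center    STRING NOT NULL,  -- the human owner, never "the agent"
  run_date       DATE   NOT NULL,
  steps          INT64,
  tokens         INT64,
  tool_cost_usd  FLOAT64,
  model_cost_usd FLOAT64,
  total_cost_usd FLOAT64
)
PARTITION BY run_date
CLUSTER BY agent_product
"""

client = bigquery.Client()
client.query(DDL).result()  # the per run cost view then aggregates by run_id
```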
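A sketch of the weekly regression gate; `agent.run` and the per task `check` callables are hypothetical stand-ins for whatever harness you use. The score lands next to spend so quality and dollars read off the same report.

```python
def weekly_eval_gate(agent, tasks: list[dict], last_week_score: float) -> bool:
    """Pre commit gate: the agent runs this week only if it has not
    regressed against last week on the fixed task set."""
    passed = sum(1 for t in tasks if t["check"](agent.run(t["input"])))
    score = passed / len(tasks)
    # Store (week, score, spend) in the same table as the cost rows.
    return score >= last_week_score
```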
V. Common failures
- Treating the agent like a chat session. Chat budgets are per turn; agent budgets must be per run.
- Soft caps. A warning at 80 percent that does not actually stop the loop is worse than no warning.
- Logging every step verbatim. Storage and BigQuery costs explode within a week.
- Allowing recursive agent calls without total budget. A planner that spawns five workers that each spawn five workers is a billing accident.
- Pricing model calls but not tool calls. The web search bill catches up to the model bill within a month.
- Treating eval as optional. Without eval the agent silently degrades; cost goes up while quality goes down.
- Letting "the agent" be the owner. Every run needs a human cost center.
VI. Recommended tooling
Vendor neutral. For graded vendor comparisons see the Matrix.
- Agent gateway and run id propagation
- Tool call pricing and metering
- Envelope enforcement and budget breach handler
- Run trace storage with sampling
- Eval harness
- Red team and adversarial test framework
- Per agent chargeback report
- Policy as code for agent permissions
VII. Related IFO4 playbooks
- ai-cost-per-inference (coming soon)
VIII. FAQ
My agent only calls one model. Why do I need an envelope?
Because a loop bug can call that one model ten thousand times in an hour. The model is irrelevant; the loop is the risk.
What is a reasonable starting envelope?
For most internal automation, start at 25 steps, 200,000 tokens, and 5 dollars per run. Tighten or loosen by product after two weeks of data.
Should I let agents call other agents?
Only with a parent envelope that includes the children. Otherwise the recursion compounds and you lose the cap.
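One way to make the parent envelope include the children, continuing the hypothetical `Envelope` sketch above: carve each child's caps out of the parent's remaining budget and charge the parent up front, so however deep the recursion goes, the total can never exceed the original cap. The `carve` helper is illustrative.

```python
def carve(parent: Envelope, fraction: float = 0.2) -> Envelope:
    """Give a child agent a slice of whatever the parent has left."""
    child = Envelope(
        max_steps=int((parent.max_steps - parent.steps) * fraction),
        max_tokens=int((parent.max_tokens - parent.tokens) * fraction),
        max_dollars=(parent.max_dollars - parent.dollars) * fraction,
    )
    # Reserve the carve-out against the parent immediately, so five
    # workers spawning five workers still sum to at most one envelope.
    parent.steps += child.max_steps
    parent.tokens += child.max_tokens
    parent.dollars += child.max_dollars
    return child
```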
How do I price a self hosted retrieval call?
Compute fully loaded cost per call (compute plus storage plus query) and assign it to the call. The number does not need to be perfect; it needs to be present so engineers see the trade off.
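The back of the envelope arithmetic, with made up numbers; the point is that a present, imperfect price beats an absent one.

```python
def retrieval_cost_per_call(monthly_compute_usd: float,
                            monthly_storage_usd: float,
                            monthly_query_usd: float,
                            monthly_calls: int) -> float:
    """Fully loaded cost per call: all attributable monthly cost / volume."""
    return (monthly_compute_usd + monthly_storage_usd + monthly_query_usd) / monthly_calls

# Illustrative only: $1,200 compute + $300 storage + $500 query over
# 2,000,000 calls comes to $0.001 per retrieval.
print(retrieval_cost_per_call(1200, 300, 500, 2_000_000))  # 0.001
```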
Do I need a separate dashboard from AI FinOps?
Yes. The AI FinOps dashboard is per feature. The Agentic dashboard is per agent run id with step level and tool level breakdowns.
What about long lived background agents?
Treat them as a budget per day with a sliding window. If the daily envelope is breached, the agent enters a quiescent mode and pages the owner.
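A sketch of the sliding daily window, assuming an in memory ledger; a production version would persist spend events and wire the breach to the pager.

```python
from collections import deque
import time

class DailyWindow:
    """Spend over the trailing 24 hours; a breach means quiescent mode."""
    def __init__(self, daily_cap_usd: float):
        self.cap = daily_cap_usd
        self.events: deque[tuple[float, float]] = deque()  # (timestamp, usd)

    def charge(self, usd: float) -> bool:
        now = time.time()
        while self.events and now - self.events[0][0] > 86_400:
            self.events.popleft()  # drop spend older than 24 hours
        if sum(e[1] for e in self.events) + usd > self.cap:
            return False  # breached: go quiescent and page the owner
        self.events.append((now, usd))
        return True
```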
How do I show ROI on agent investment?
Cost per resolved task on the agent versus cost per resolved task before the agent (human or scripted). Both numbers must include all attributable cost.
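The arithmetic, with illustrative numbers only:

```python
def cost_per_resolved_task(total_attributable_cost_usd: float,
                           resolved_tasks: int) -> float:
    # Both sides of the comparison must include all attributable cost:
    # model calls, tool calls, storage, and human review time.
    return total_attributable_cost_usd / resolved_tasks

# Illustrative: an agent at $4,000 for 10,000 resolved tasks is $0.40
# per task, against a pre agent baseline of $9,000 for 6,000 tasks
# at $1.50 per task.
```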