Serverless Cost Engineering
Right size memory, kill cold starts, and chargeback Lambda, Cloud Run, and Functions to the request.
I.Why this matters
Serverless looked like the answer to capacity planning. The bill said otherwise. Memory oversizing, cold start padding, and missing concurrency caps quietly turn the "pay only for what you use" promise into "pay for what you guessed and never tuned". Serverless Cost Engineering installs the per request cost model and the right size loop that the platform forgot to build.
II.Principles
- Memory is the lever. CPU and network scale with memory; right size memory and the rest follows.
- Cold start is a cost decision, not a performance decision. Pre warming buys latency but is rarely worth its monthly cost.
- Concurrency caps are a budget control. Without them a viral request can run up the bill in minutes.
- Tag every function with cost-center, owner, and product. Untagged functions are unknown spend.
- Cap function runtime. Sixty seconds is enough for most production work. Hours suggest the wrong tool.
- Per request cost is the unit. Cost per invocation, then cost per business outcome.
- Right size memory weekly. Workloads drift; one tuning pass at launch is not enough.
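The per request cost model behind these principles is small enough to fit in a few lines. A minimal sketch, assuming Lambda-style GB-second billing; the prices are illustrative us-east-1 x86 list prices and the function names are this sketch's own, so verify current pricing before using the numbers:

```python
# Per invocation cost model for a GB-second billed platform (Lambda style).
# Prices are illustrative list prices (assumptions); verify current pricing.

PRICE_PER_GB_SECOND = 0.0000166667    # assumed on-demand duration price
PRICE_PER_REQUEST = 0.20 / 1_000_000  # assumed $0.20 per million requests

def cost_per_invocation(memory_mb: float, billed_ms: float) -> float:
    """Duration cost in GB-seconds plus the flat per request charge."""
    gb_seconds = (memory_mb / 1024) * (billed_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST

def cost_per_outcome(invocation_cost: float, invocations_per_outcome: float) -> float:
    """Roll per invocation cost up to the business outcome (e.g. one order)."""
    return invocation_cost * invocations_per_outcome

# A 128 MB function billed 100 ms costs a fraction of a microdollar per call;
# the number only becomes meaningful multiplied across monthly invocations.
c = cost_per_invocation(128, 100)
```

Cost per invocation first, then cost per outcome: an outcome that takes three invocations costs three times the per invocation figure, which is the number the chargeback report should carry.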
III.KPIs
IV.The playbook spine
- Tag everything. Use SCP, IAM policy, or Terraform module to enforce cost-center, owner, and product on every function at deploy time.
- Land invocation level metrics in BigQuery. From CloudWatch or Cloud Logging, ship per invocation: function id, duration, billed duration, memory used, cold start flag, status.
- Right size memory with observed p95. Use the Lambda Power Tuning tool or a Cloud Run revision experiment. Set memory to 1.25x p95 used.
- Cap concurrency per function. Default to a sensible per function reserved concurrency; escalate by exception.
- Decide on provisioned concurrency per function. Compute monthly cost of the provisioned tier and compare to the cold start tax. Most internal functions do not need it.
- Cap runtime. Set the function timeout to 1.5x p99 duration. Long timeouts hide hung calls that bill anyway.
- Build a per function chargeback report. Monthly: cost, invocations, p95 memory, cold start ratio, timeout rate, owner.
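The right size, timeout, and provisioned concurrency rules in the spine can be sketched as plain arithmetic. A hedged sketch: the 1.25x and 1.5x multipliers come from the steps above, the prices are illustrative assumptions, and the function names are invented for this example:

```python
import math

# Spine rules as code: memory at 1.25x p95 used, timeout at 1.5x p99
# duration, provisioned concurrency only when it beats the cold start tax.
# Prices are assumptions modeled on published list prices; verify yours.

PRICE_GB_S_ON_DEMAND = 0.0000166667    # assumed on-demand duration price
PRICE_GB_S_PROVISIONED = 0.0000041667  # assumed provisioned-tier price
SECONDS_PER_MONTH = 730 * 3600

def recommend_memory_mb(p95_used_mb: float, floor_mb: int = 128) -> int:
    """1.25x observed p95, rounded up, never below the platform floor."""
    return max(floor_mb, math.ceil(p95_used_mb * 1.25))

def recommend_timeout_s(p99_duration_s: float) -> int:
    """1.5x p99 duration; long timeouts hide hung calls that bill anyway."""
    return math.ceil(p99_duration_s * 1.5)

def provisioned_concurrency_pays(memory_mb: float, instances: int,
                                 cold_starts_per_month: int,
                                 cold_start_extra_s: float) -> bool:
    """Compare the always-on provisioned tier against the cold start tax:
    the extra billed seconds paid for init on each cold invocation."""
    mem_gb = memory_mb / 1024
    provisioned = instances * mem_gb * PRICE_GB_S_PROVISIONED * SECONDS_PER_MONTH
    cold_tax = cold_starts_per_month * cold_start_extra_s * mem_gb * PRICE_GB_S_ON_DEMAND
    return cold_tax > provisioned
```

Running the comparison for a typical internal function shows why the spine says most functions do not need provisioned concurrency: in pure dollars the cold start tax is tiny, so the decision has to be won on latency, not cost.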
V.Common failures
- Setting memory to "the maximum, just to be safe". Doubles the bill, sometimes triples.
- Adding provisioned concurrency to every function "for performance". The bill spikes; the latency improvement was not measurable for most.
- Allowing 15 minute Lambda runtimes for batch work. Use Step Functions or a container; do not hide a long job inside a Lambda.
- Leaving the default concurrency uncapped. One viral endpoint and the bill jumps.
- Forgetting Lambda layers exist. Stale layers carry storage and bandwidth cost; clean them.
- Not deduping logs. Every function logging at INFO with verbose payloads adds invisible cost in CloudWatch and Cloud Logging.
- Treating Cloud Run as Lambda. Different concurrency model, different billing surface; right sizing recipes differ.
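The log dedup failure above is easy to size before it hits the bill. A rough sketch; the $0.50 per GB ingestion price is an assumption modeled on the classic CloudWatch Logs list price, so substitute your platform's rate:

```python
# Rough monthly log ingestion cost for one chatty function.
# INGESTION_PRICE_PER_GB is an assumption; check your platform's rate.

INGESTION_PRICE_PER_GB = 0.50  # assumed CloudWatch-style ingestion price

def monthly_log_cost(invocations_per_month: float,
                     avg_bytes_per_invocation: float) -> float:
    """Log volume in GB times the ingestion price; storage is extra."""
    gb = invocations_per_month * avg_bytes_per_invocation / (1024 ** 3)
    return gb * INGESTION_PRICE_PER_GB
```

One function logging a 2 KB payload at INFO across 100 million monthly invocations lands near a hundred dollars a month in ingestion alone, before storage, which is why verbose logging is listed as an invisible cost.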
VI.Recommended tooling
Vendor neutral. For graded vendor comparisons see the Matrix.
- Memory right sizing tool (Power Tuning class)
- Per invocation cost log shipper
- Concurrency cap policy enforcer
- Cold start observability
- Per function chargeback report
- Function tag enforcer
- Idle and orphaned function scanner
- Multi cloud serverless cost normalizer
VII.Related IFO4 playbooks
- tag-enforcement-at-provisioning (coming soon)
VIII.FAQ
Lambda or Cloud Run for cost?
Cloud Run is generally cheaper for sustained loads above roughly one request per second per instance because it bills instance time shared across concurrent requests rather than per invocation. Lambda is generally cheaper for very bursty, very small functions. The break even varies; test both for the workload.
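That break even can be tested on paper before testing in production. A sketch under stated assumptions: Lambda billed per GB-second per invocation, Cloud Run billed per instance vCPU-second and GiB-second with requests sharing an instance up to its concurrency setting; all prices are illustrative assumptions and the steady-traffic approximation ignores idle instance time:

```python
# Break even sketch: Lambda bills GB-seconds per invocation; Cloud Run
# bills instance time shared across concurrent requests.
# All prices are assumptions; verify current list prices.

LAMBDA_GB_S = 0.0000166667     # assumed Lambda duration price
LAMBDA_REQUEST = 0.20 / 1e6    # assumed Lambda request price
RUN_VCPU_S = 0.000024          # assumed Cloud Run per vCPU-second
RUN_GIB_S = 0.0000025          # assumed Cloud Run per GiB-second

def lambda_cost(requests: float, avg_s: float, memory_mb: float) -> float:
    return requests * ((memory_mb / 1024) * avg_s * LAMBDA_GB_S + LAMBDA_REQUEST)

def cloud_run_cost(requests: float, avg_s: float, vcpu: float,
                   mem_gib: float, concurrency: int) -> float:
    # Steady traffic: instance seconds ~= total request seconds / concurrency.
    instance_s = requests * avg_s / concurrency
    return instance_s * (vcpu * RUN_VCPU_S + mem_gib * RUN_GIB_S)
```

With 10 million 200 ms requests on a 512 MB function, Cloud Run at concurrency 80 undercuts Lambda by an order of magnitude, while the same Cloud Run service serialized to concurrency 1 costs more than Lambda: concurrency, not platform, decides the winner.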
Should I use Graviton (Arm) on Lambda?
Yes for most workloads. The roughly 20 percent lower per GB second price is real and the migration is small for stateless code.
Are step functions worth it?
Yes for orchestrated workflows. They add a per state transition charge on top of the Lambda cost, but they cap runaway retries and they make the cost auditable.
How do I handle cold start cost?
For user facing endpoints with visible latency impact, use provisioned concurrency only for the top one or two functions. For everything else accept the cold start. Pre warming with a scheduled invocation rarely beats provisioned concurrency on cost.
Should I use Spot or preemptible for serverless?
Not directly, since the platform abstracts the underlying compute. The equivalent is moving cron and batch work to lower priced regions or to a container on Spot if the SLO allows.
How do I detect orphaned functions?
Scan for functions with zero invocations in the last 30 days, then for functions with no Terraform reference, then for functions whose owner left the company. All three should be removed.
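The three checks above collapse into one pass over an inventory export. A minimal sketch; the record shape and function names are this example's own invention, and the data would come from wherever you already export invocation metrics and Terraform state:

```python
from dataclasses import dataclass

# The three orphan checks as one filter over inventory data.
# FunctionRecord is a hypothetical shape for this sketch; populate it from
# your metrics export, Terraform state, and HR directory.

@dataclass
class FunctionRecord:
    name: str
    invocations_30d: int
    in_terraform: bool
    owner_active: bool

def orphaned(functions: list[FunctionRecord]) -> list[str]:
    """Zero traffic, unmanaged, or ownerless: all are removal candidates."""
    return [
        f.name for f in functions
        if f.invocations_30d == 0 or not f.in_terraform or not f.owner_active
    ]
```

Running the filter weekly alongside the right size pass keeps the orphan list short enough to action by hand.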
How do I model fargate or Cloud Run jobs?
Treat them as a sibling category. The cost model is closer to Kubernetes (vCPU and memory hours) than to Lambda (GB seconds). The chargeback method is the same: per request and per outcome.
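The sibling cost model is a one-liner once the unit changes from GB-seconds to vCPU and memory hours. A sketch with assumed Fargate-style list prices; the function names are this example's own:

```python
# Sibling category cost model: vCPU-hours and GB-hours, not GB-seconds.
# Prices are assumptions modeled on Fargate list prices; verify your region.

VCPU_HOUR = 0.04048  # assumed per vCPU-hour price
GB_HOUR = 0.004445   # assumed per GB-hour price

def job_cost(vcpu: float, memory_gb: float, hours: float) -> float:
    """Billed resource-hours, the Kubernetes-style cost surface."""
    return hours * (vcpu * VCPU_HOUR + memory_gb * GB_HOUR)

def job_cost_per_outcome(vcpu: float, memory_gb: float,
                         hours: float, outcomes: int) -> float:
    """Same chargeback unit as Lambda: divide the run down to the outcome."""
    return job_cost(vcpu, memory_gb, hours) / outcomes
```

Dividing the run cost by the outcomes it produced puts jobs on the same chargeback line as functions, which is the point of treating them as a sibling category rather than a separate budget.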