Serverless Cost Engineering
Right size memory, kill cold starts, and chargeback Lambda, Cloud Run, and Functions to the request.
I.Why this matters
Serverless looked like the answer to capacity planning. The bill said otherwise. Memory oversizing, cold start padding, and missing concurrency caps quietly turn the "pay only for what you use" promise into "pay for what you guessed and never tuned". Serverless Cost Engineering installs the per request cost model and the right size loop that the platform forgot to build.
II.Principles
- Memory is the lever. CPU and network scale with memory; right size memory and the rest follows.
- Cold start is a cost decision, not a performance decision. Pre warming buys latency but is rarely worth its monthly cost.
- Concurrency caps are a budget control. Without them a viral request can run up the bill in minutes.
- Tag every function with cost-center, owner, and product. Untagged functions are unknown spend.
- Cap function runtime. Sixty seconds is enough for most production work. Hours suggest the wrong tool.
- Per request cost is the unit. Cost per invocation, then cost per business outcome.
- Right size memory weekly. Workloads drift; one tuning pass at launch is not enough.
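The per request cost model behind these principles is small enough to fit in a few lines. A minimal sketch, assuming Lambda-style GB-second billing; the prices are illustrative us-east-1 x86 list prices and the function names are this sketch's own, so verify current pricing before using the numbers:

```python
# Per invocation cost model for a GB-second billed platform (Lambda style).
# Prices are illustrative list prices (assumptions); verify current pricing.

PRICE_PER_GB_SECOND = 0.0000166667    # assumed on-demand duration price
PRICE_PER_REQUEST = 0.20 / 1_000_000  # assumed $0.20 per million requests

def cost_per_invocation(memory_mb: float, billed_ms: float) -> float:
    """Duration cost in GB-seconds plus the flat per request charge."""
    gb_seconds = (memory_mb / 1024) * (billed_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST

def cost_per_outcome(invocation_cost: float, invocations_per_outcome: float) -> float:
    """Roll per invocation cost up to the business outcome (e.g. one order)."""
    return invocation_cost * invocations_per_outcome

# A 128 MB function billed 100 ms costs a fraction of a microdollar per call;
# the number only becomes meaningful multiplied across monthly invocations.
c = cost_per_invocation(128, 100)
```

Cost per invocation first, then cost per outcome: an outcome that takes three invocations costs three times the per invocation figure, which is the number the chargeback report should carry.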
III.KPIs
IV.The playbook spine
- Tag everything. Use SCP, IAM policy, or Terraform module to enforce cost-center, owner, and product on every function at deploy time.
- Land invocation level metrics in BigQuery. From CloudWatch or Cloud Logging, ship per invocation: function id, duration, billed duration, memory used, cold start flag, status.
- Right size memory with observed p95. Use the Lambda Power Tuning tool or a Cloud Run revision experiment. Set memory to 1.25x p95 used.
- Cap concurrency per function. Default to a sensible per function reserved concurrency; escalate by exception.
- Decide on provisioned concurrency per function. Compute monthly cost of the provisioned tier and compare to the cold start tax. Most internal functions do not need it.
- Cap runtime. Set the function timeout to 1.5x p99 duration. Long timeouts hide hung calls that bill anyway.
- Build a per function chargeback report. Monthly: cost, invocations, p95 memory, cold start ratio, timeout rate, owner.
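The right size, timeout, and provisioned concurrency rules in the spine can be sketched as plain arithmetic. A hedged sketch: the 1.25x and 1.5x multipliers come from the steps above, the prices are illustrative assumptions, and the function names are invented for this example:

```python
import math

# Spine rules as code: memory at 1.25x p95 used, timeout at 1.5x p99
# duration, provisioned concurrency only when it beats the cold start tax.
# Prices are assumptions modeled on published list prices; verify yours.

PRICE_GB_S_ON_DEMAND = 0.0000166667    # assumed on-demand duration price
PRICE_GB_S_PROVISIONED = 0.0000041667  # assumed provisioned-tier price
SECONDS_PER_MONTH = 730 * 3600

def recommend_memory_mb(p95_used_mb: float, floor_mb: int = 128) -> int:
    """1.25x observed p95, rounded up, never below the platform floor."""
    return max(floor_mb, math.ceil(p95_used_mb * 1.25))

def recommend_timeout_s(p99_duration_s: float) -> int:
    """1.5x p99 duration; long timeouts hide hung calls that bill anyway."""
    return math.ceil(p99_duration_s * 1.5)

def provisioned_concurrency_pays(memory_mb: float, instances: int,
                                 cold_starts_per_month: int,
                                 cold_start_extra_s: float) -> bool:
    """Compare the always-on provisioned tier against the cold start tax:
    the extra billed seconds paid for init on each cold invocation."""
    mem_gb = memory_mb / 1024
    provisioned = instances * mem_gb * PRICE_GB_S_PROVISIONED * SECONDS_PER_MONTH
    cold_tax = cold_starts_per_month * cold_start_extra_s * mem_gb * PRICE_GB_S_ON_DEMAND
    return cold_tax > provisioned
```

Running the comparison for a typical internal function shows why the spine says most functions do not need provisioned concurrency: in pure dollars the cold start tax is tiny, so the decision has to be won on latency, not cost.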
V.Common failures
- Setting memory to "the maximum, just to be safe". Doubles the bill, sometimes triples.
- Adding provisioned concurrency to every function "for performance". The bill spikes; the latency improvement was not measurable for most.
- Allowing 15 minute Lambda runtimes for batch work. Use Step Functions or a container; do not hide a long job inside a Lambda.
- Leaving the default concurrency uncapped. One viral endpoint and the bill jumps.
- Forgetting Lambda layers exist. Stale layers carry storage and bandwidth cost; clean them.
- Not deduping logs. Every function logging at INFO with verbose payloads adds invisible cost in CloudWatch and Cloud Logging.
- Treating Cloud Run as Lambda. Different concurrency model, different billing surface; right sizing recipes differ.
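The log dedup failure above is easy to size before it hits the bill. A rough sketch; the $0.50 per GB ingestion price is an assumption modeled on the classic CloudWatch Logs list price, so substitute your platform's rate:

```python
# Rough monthly log ingestion cost for one chatty function.
# INGESTION_PRICE_PER_GB is an assumption; check your platform's rate.

INGESTION_PRICE_PER_GB = 0.50  # assumed CloudWatch-style ingestion price

def monthly_log_cost(invocations_per_month: float,
                     avg_bytes_per_invocation: float) -> float:
    """Log volume in GB times the ingestion price; storage is extra."""
    gb = invocations_per_month * avg_bytes_per_invocation / (1024 ** 3)
    return gb * INGESTION_PRICE_PER_GB
```

One function logging a 2 KB payload at INFO across 100 million monthly invocations lands near a hundred dollars a month in ingestion alone, before storage, which is why verbose logging is listed as an invisible cost.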
VI.Recommended tooling
Vendor neutral. For graded vendor comparisons see the Matrix.
- Memory right sizing tool (Power Tuning class)
- Per invocation cost log shipper
- Concurrency cap policy enforcer
- Cold start observability
- Per function chargeback report
- Function tag enforcer
- Idle and orphaned function scanner
- Multi cloud serverless cost normalizer
VII.Related IFO4 playbooks
- tag-enforcement-at-provisioning (coming soon)
VIII.FAQ
Lambda or Cloud Run for cost?
Cloud Run is generally cheaper for sustained loads above roughly one request per second per instance because it bills instance time shared across concurrent requests rather than per invocation. Lambda is generally cheaper for very bursty, very small functions. The break even varies; test both for the workload.
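That break even can be tested on paper before testing in production. A sketch under stated assumptions: Lambda billed per GB-second per invocation, Cloud Run billed per instance vCPU-second and GiB-second with requests sharing an instance up to its concurrency setting; all prices are illustrative assumptions and the steady-traffic approximation ignores idle instance time:

```python
# Break even sketch: Lambda bills GB-seconds per invocation; Cloud Run
# bills instance time shared across concurrent requests.
# All prices are assumptions; verify current list prices.

LAMBDA_GB_S = 0.0000166667     # assumed Lambda duration price
LAMBDA_REQUEST = 0.20 / 1e6    # assumed Lambda request price
RUN_VCPU_S = 0.000024          # assumed Cloud Run per vCPU-second
RUN_GIB_S = 0.0000025          # assumed Cloud Run per GiB-second

def lambda_cost(requests: float, avg_s: float, memory_mb: float) -> float:
    return requests * ((memory_mb / 1024) * avg_s * LAMBDA_GB_S + LAMBDA_REQUEST)

def cloud_run_cost(requests: float, avg_s: float, vcpu: float,
                   mem_gib: float, concurrency: int) -> float:
    # Steady traffic: instance seconds ~= total request seconds / concurrency.
    instance_s = requests * avg_s / concurrency
    return instance_s * (vcpu * RUN_VCPU_S + mem_gib * RUN_GIB_S)
```

With 10 million 200 ms requests on a 512 MB function, Cloud Run at concurrency 80 undercuts Lambda by an order of magnitude, while the same Cloud Run service serialized to concurrency 1 costs more than Lambda: concurrency, not platform, decides the winner.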
Should I use Graviton (Arm) on Lambda?
Yes for most workloads. The roughly 20 percent lower per GB second price is real and the migration is small for stateless code.
Are step functions worth it?
Yes for orchestrated workflows. They add a per state transition charge on top of the Lambda cost, but they cap runaway retries and they make the cost auditable.
How do I handle cold start cost?
For user facing endpoints with visible latency impact, use provisioned concurrency only for the top one or two functions. For everything else accept the cold start. Pre warming with a scheduled invocation rarely beats provisioned concurrency on cost.
Should I use Spot or preemptible for serverless?
Not directly, since the platform abstracts the underlying compute. The equivalent is moving cron and batch work to lower priced regions or to a container on Spot if the SLO allows.
How do I detect orphaned functions?
Scan for functions with zero invocations in the last 30 days, then for functions with no Terraform reference, then for functions whose owner left the company. All three should be removed.
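The three checks above collapse into one pass over an inventory export. A minimal sketch; the record shape and function names are this example's own invention, and the data would come from wherever you already export invocation metrics and Terraform state:

```python
from dataclasses import dataclass

# The three orphan checks as one filter over inventory data.
# FunctionRecord is a hypothetical shape for this sketch; populate it from
# your metrics export, Terraform state, and HR directory.

@dataclass
class FunctionRecord:
    name: str
    invocations_30d: int
    in_terraform: bool
    owner_active: bool

def orphaned(functions: list[FunctionRecord]) -> list[str]:
    """Zero traffic, unmanaged, or ownerless: all are removal candidates."""
    return [
        f.name for f in functions
        if f.invocations_30d == 0 or not f.in_terraform or not f.owner_active
    ]
```

Running the filter weekly alongside the right size pass keeps the orphan list short enough to action by hand.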
How do I model fargate or Cloud Run jobs?
Treat them as a sibling category. The cost model is closer to Kubernetes (vCPU and memory hours) than to Lambda (GB seconds). The chargeback method is the same: per request and per outcome.
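The sibling cost model is a one-liner once the unit changes from GB-seconds to vCPU and memory hours. A sketch with assumed Fargate-style list prices; the function names are this example's own:

```python
# Sibling category cost model: vCPU-hours and GB-hours, not GB-seconds.
# Prices are assumptions modeled on Fargate list prices; verify your region.

VCPU_HOUR = 0.04048  # assumed per vCPU-hour price
GB_HOUR = 0.004445   # assumed per GB-hour price

def job_cost(vcpu: float, memory_gb: float, hours: float) -> float:
    """Billed resource-hours, the Kubernetes-style cost surface."""
    return hours * (vcpu * VCPU_HOUR + memory_gb * GB_HOUR)

def job_cost_per_outcome(vcpu: float, memory_gb: float,
                         hours: float, outcomes: int) -> float:
    """Same chargeback unit as Lambda: divide the run down to the outcome."""
    return job_cost(vcpu, memory_gb, hours) / outcomes
```

Dividing the run cost by the outcomes it produced puts jobs on the same chargeback line as functions, which is the point of treating them as a sibling category rather than a separate budget.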