
1.5 — AI vs. Traditional Cloud: What's Different
Traditional cloud FinOps is built around infrastructure — virtual machines, storage, databases, network. AI FinOps introduces a new layer: the cost of computation encoded in content.
| Dimension | Traditional Cloud | AI Workloads |
|---|---|---|
| Primary cost unit | CPU-hours, GB-hours, requests | Tokens (input + output) |
| What drives cost | Infrastructure decisions (instance size, storage class) | Content decisions (prompt length, response verbosity, conversation depth) |
| Who controls costs | Infrastructure and platform teams | Engineers, product managers, prompt engineers — anyone who touches prompts |
| Pricing model | Relatively stable, predictable tiers | Volatile: prices dropping ~10× per year (LLMflation); new model SKUs constantly |
| Idle cost | Significant (running but unused instances) | Minimal for API models; high for self-hosted (same as cloud) |
| Tagging and attribution | Mature tooling (AWS Cost Explorer, native tags) | Immature — shared API keys, non-standard units, limited vendor tooling |
| Forecasting | Trend analysis works well | Unreliable without understanding usage patterns AND price trajectory |
| Optimization levers | Right-sizing, Reserved Instances, Savings Plans | Prompt compression, model selection, caching, context windowing, quantization |
| Anomaly profile | Gradual drift, infrastructure scaling events | Sharp spikes from runaway loops, prompt changes, traffic events |
| Governance maturity | Well-established (FOCUS spec, native dashboards) | Emerging (FOCUS 1.2–1.3 adding AI support, tooling fragmented) |
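The cost-unit difference in the table's first row can be made concrete with a quick calculation. A minimal sketch, where all rates, function names, and volumes are illustrative assumptions rather than real vendor pricing:

```python
# Contrast of the two billing models from the table above.
# All rates and volumes are hypothetical placeholders, not real vendor pricing.

def infra_cost(instance_hours: float, rate_per_hour: float) -> float:
    """Traditional cloud: cost scales with how long the instance runs."""
    return instance_hours * rate_per_hour

def token_cost(input_tokens: int, output_tokens: int,
               in_price_per_1k: float, out_price_per_1k: float) -> float:
    """AI API workload: cost scales with content processed, not uptime."""
    return (input_tokens / 1000) * in_price_per_1k \
         + (output_tokens / 1000) * out_price_per_1k

# A mostly idle server still bills every hour of the month (~730 h)...
print(infra_cost(instance_hours=730, rate_per_hour=0.10))
# ...while an idle API integration bills nothing; each call prices its tokens.
print(token_cost(input_tokens=1200, output_tokens=400,
                 in_price_per_1k=0.003, out_price_per_1k=0.015))
```

Note how the second function takes no time parameter at all: only content volume and model price determine the bill, which is why cost ownership shifts to whoever shapes the prompts.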
What Transfers from Cloud FinOps
- Unit economics thinking (cost per unit of value delivered)
- Tagging and attribution discipline
- The Crawl-Walk-Run maturity model
- Showback and chargeback governance
- Budget alerts and anomaly detection concepts
- Cross-functional collaboration model (FinOps practitioner as bridge)
What Doesn't Transfer Directly
- Right-sizing has no equivalent — you don't pick an "instance size" for API calls; you pick a model and prompt strategy
- Reserved Instance savings logic doesn't apply to per-token billing (though Provisioned Throughput Units serve a similar role)
- Standard cost-per-request metrics ignore token volume, making cross-feature cost comparisons misleading
- Tagging infrastructure at the API key level doesn't give you per-team or per-feature attribution without additional proxy or gateway tooling
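The cost-per-request point above is easy to demonstrate. A sketch with two hypothetical features sharing the same request count (prices and token counts are illustrative assumptions):

```python
# Two features with the same request count but very different token volumes.
# Prices and token counts are hypothetical placeholders.
IN_PRICE, OUT_PRICE = 0.003, 0.015  # per 1K input / output tokens

def request_cost(in_tok: int, out_tok: int) -> float:
    return in_tok / 1000 * IN_PRICE + out_tok / 1000 * OUT_PRICE

short_chat = request_cost(in_tok=300, out_tok=100)       # lightweight Q&A
doc_summary = request_cost(in_tok=12_000, out_tok=800)   # long-context summary

# Both count as "1 request" on a dashboard, yet one call costs ~20x the other.
print(round(doc_summary / short_chat, 1))  # 20.0
```

A request-count dashboard would rank these features as equal consumers; a token-volume view reveals the 20× gap.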
The Practitioner's Mental Model Shift
In cloud FinOps, you ask: "What infrastructure are we running, and is it the right size?"
In AI FinOps, you ask: "What content are we processing, at what volume, with what model, through what architecture — and is every component justified by the value it delivers?"
Key Concepts
Content-Driven Costs
In AI FinOps, costs scale with content decisions like prompt length and response verbosity, not infrastructure decisions like instance sizes.
LLMflation
The rapid decline in AI model pricing, roughly 10× per year, which makes traditional trend-based forecasting unreliable unless the price trajectory is modeled separately from usage growth.
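One way to see why trend-only forecasting breaks: model the usage trend and the price trajectory as separate factors. A sketch where the growth and decline rates are illustrative assumptions, not measured figures:

```python
# Trend-only vs. price-adjusted forecasting under LLMflation.
# Growth and decline factors below are illustrative assumptions.

def naive_forecast(monthly_spend: float, usage_growth: float) -> float:
    """Extrapolates next year's spend from usage growth alone."""
    return monthly_spend * usage_growth

def adjusted_forecast(monthly_spend: float, usage_growth: float,
                      price_decline: float) -> float:
    """Models volume and unit price as separate trajectories."""
    return monthly_spend * usage_growth / price_decline

spend = 10_000.0  # current monthly AI spend
print(naive_forecast(spend, usage_growth=2.0))                         # 20000.0
print(adjusted_forecast(spend, usage_growth=2.0, price_decline=10.0))  # 2000.0
```

With usage doubling and unit prices falling 10×, the naive trend overstates next year's spend by an order of magnitude; the same mismatch also cuts the other way when a team migrates to a newer, pricier model SKU.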
Provisioned Throughput Units (PTUs)
Reserved AI inference capacity with predictable billing, serving a similar economic role to Reserved Instances in traditional cloud.
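The Reserved Instance parallel can be sketched as a simple break-even check. The flat fee and blended per-token price below are hypothetical placeholders, not any vendor's actual PTU pricing:

```python
# Break-even sketch: when does a flat PTU commitment beat on-demand tokens?
# The monthly fee and blended token price are hypothetical placeholders.

def on_demand_monthly(tokens_per_month: int, price_per_1k: float) -> float:
    return tokens_per_month / 1000 * price_per_1k

def break_even_tokens(ptu_monthly_fee: float, price_per_1k: float) -> float:
    """Monthly token volume above which the flat fee is the cheaper option."""
    return ptu_monthly_fee / price_per_1k * 1000

ptu_fee = 5_000.0   # flat monthly commitment
price = 0.01        # blended on-demand price per 1K tokens
print(break_even_tokens(ptu_fee, price))                 # ~500M tokens/month
print(on_demand_monthly(600_000_000, price) > ptu_fee)   # True: PTU wins here
```

As with Reserved Instances, the commitment only pays off above a sustained volume threshold, so the same utilization discipline transfers even though the billing unit does not.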
GenAI FinOps vs. Cloud FinOps, FinOps Foundation Working Group — finops.org/wg/genai-finops-vs-cloud-finops/
The FinOps for AI exam tests this comparison directly. Know: (1) token vs. CPU-hour as cost units, (2) content decisions vs. infrastructure decisions as cost drivers, (3) why traditional right-sizing doesn't map to AI APIs, (4) what Provisioned Throughput Units replace in the AI context.