FinOps for AI

1.2 Tokens: The New Unit of Compute

4 min read
NovaSpark
"Why did the chatbot cost nearly nine times as much on Tuesday?" You pull up Team Alpha's usage logs. Tuesday was a normal traffic day — same number of users, same number of conversations. But the bill was $2,705. Monday was $310.

The difference: on Tuesday, the product team ran a test. They changed the system prompt — the set of instructions sent to the model at the start of every conversation. The new system prompt was 2,400 words long instead of the usual 400 words. And because it went out with every single API call that day, those extra 2,000 words multiplied across 180,000 API calls. 180,000 calls × 2,000 extra words × ~1.33 tokens per word = 479 million extra input tokens. At $5 per million tokens: $2,395 in extra charges. From a text edit.

This is what makes AI cost management different from everything you've done before. The unit of cost isn't a server-hour or a request. It's a token. And until you understand exactly what a token is and why it's priced the way it is, the bills will keep surprising you.



What Is a Token?

A token is the basic unit of text that a language model processes. Not a word — a piece of a word, a word, or sometimes multiple short words together.

A rough rule of thumb: 1 token ≈ 0.75 words in English. More precisely:

  • "NovaSpark" → 2 tokens (Nova + Spark)
  • "the" → 1 token
  • "AI" → 1 token
  • "cost" → 1 token
  • "optimization" → 2 tokens (optim + ization — varies by model)
  • A typical business email (300 words) ≈ 400 tokens
  • A detailed system prompt (800 words) ≈ 1,066 tokens
  • A full legal contract (10,000 words) ≈ 13,333 tokens

Different models tokenize text slightly differently. OpenAI's models use the tiktoken tokenizer. Anthropic's Claude models use a different tokenizer. The 0.75 words-per-token ratio is a useful approximation, not an exact conversion.
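Since the 0.75 words-per-token ratio is only an approximation, a quick estimator is often enough for budget planning. This is a minimal sketch of that heuristic — for exact counts you would use the provider's real tokenizer (e.g. OpenAI's tiktoken library), not this:

```python
# Rough token estimator based on this lesson's rule of thumb:
# 1 token ≈ 0.75 English words (≈ 1.33 tokens per word).
# A planning approximation only — real tokenizers give exact counts.

def estimate_tokens(text: str) -> int:
    """Approximate token count from the word count."""
    words = len(text.split())
    return round(words / 0.75)

# A 300-word business email lands near the lesson's ~400-token figure.
email = " ".join(["word"] * 300)  # stand-in for a 300-word email
print(estimate_tokens(email))  # → 400
```

The same function reproduces the other estimates in the list above: 10,000 words comes out to 13,333 tokens.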


Input Tokens vs. Output Tokens

Every API call has two parts:

Input tokens — everything you send to the model:

  • The system prompt (instructions for how the model should behave)
  • The conversation history (all previous messages in a multi-turn chat)
  • The current user message
  • Any documents or context you've injected (RAG retrieval results, file contents)

Output tokens — everything the model sends back:

  • The model's response

This distinction matters because output tokens cost more than input tokens — typically 3× to 8× more, depending on the model.

Why the premium? Generating each output token requires a full forward pass through the model. Reading input tokens is comparatively cheap (the model processes them in parallel). Writing output tokens is sequential — the model generates one token at a time, each dependent on the previous. That sequential computation is why providers charge a premium.


The Token Cost Formula

Cost = (Input Tokens / 1,000,000 × Input Price per 1M)
     + (Output Tokens / 1,000,000 × Output Price per 1M)

Current benchmark pricing (February 2026):

Model                      Input (per 1M tokens)   Output (per 1M tokens)   Output/Input ratio
GPT-4o                     $2.50                   $10.00                   4×
Claude 3.5 Sonnet          $3.00                   $15.00                   5×
Gemini 2.0 Flash           $0.10                   $0.40                    4×
GPT-4o mini                $0.15                   $0.60                    4×
Claude 3 Haiku             $0.25                   $1.25                    5×
Llama 3.1 70B (Bedrock)    $0.72                   $0.72                    1×
Llama 3.1 8B (Groq)        $0.05                   $0.08                    1.6×

Prices change frequently — always verify against provider documentation before forecasting.
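The formula translates directly into a few lines of code. A minimal sketch, using the February 2026 benchmark prices from the table above (the model-name keys are my own labels, and prices should be re-verified before use):

```python
# The token cost formula: per-call cost from token counts and per-1M prices.

PRICES_PER_1M = {  # (input, output) in USD per 1M tokens, Feb 2026 benchmarks
    "gpt-4o": (2.50, 10.00),
    "claude-3.5-sonnet": (3.00, 15.00),
    "gemini-2.0-flash": (0.10, 0.40),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost = input/1M x input price + output/1M x output price."""
    in_price, out_price = PRICES_PER_1M[model]
    return (input_tokens / 1_000_000 * in_price
            + output_tokens / 1_000_000 * out_price)

# 1,445 input + 220 output tokens on GPT-4o:
print(round(call_cost("gpt-4o", 1_445, 220), 4))  # → 0.0058
```

The printed figure matches the per-call cost in the worked example that follows.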


A Worked Example

NovaSpark's support chatbot receives a customer message:

  • System prompt: 600 tokens
  • Conversation history (3 prior turns): 800 tokens
  • Current user message: 45 tokens
  • Total input: 1,445 tokens

The model responds:

  • Response: 220 tokens
  • Total output: 220 tokens

At GPT-4o pricing ($2.50 input / $10.00 output per 1M tokens):

Input cost:  1,445 / 1,000,000 × $2.50 = $0.0036
Output cost:   220 / 1,000,000 × $10.00 = $0.0022
Total per call: $0.0058

That feels tiny. But NovaSpark's chatbot handles 180,000 conversations per month:

$0.0058 × 180,000 = $1,044/month

Now the product team adds a richer, more detailed system prompt — 2,400 tokens instead of 600:

New input: 2,400 + 800 + 45 = 3,245 tokens
Input cost: 3,245 / 1,000,000 × $2.50 = $0.0081
Output cost unchanged: $0.0022
New total per call: $0.0103

$0.0103 × 180,000 = $1,854/month

A 1,800-token system prompt change (600 → 2,400 tokens) → $810/month in extra costs. Multiply that across a larger chatbot or a higher-traffic product, and a single engineering commit can add tens of thousands to the monthly bill.
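The before/after comparison can be reproduced in a few lines. A sketch using the figures above (the lesson's $1,044 and $1,854 round the per-call cost before multiplying by volume; the monthly delta itself is exact):

```python
# Monthly cost impact of growing the system prompt from 600 to 2,400 tokens,
# holding traffic and response length constant.

CALLS_PER_MONTH = 180_000
IN_PRICE, OUT_PRICE = 2.50, 10.00  # GPT-4o, USD per 1M tokens

def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    per_call = (input_tokens * IN_PRICE + output_tokens * OUT_PRICE) / 1_000_000
    return per_call * CALLS_PER_MONTH

before = monthly_cost(600 + 800 + 45, 220)    # 600-token system prompt
after = monthly_cost(2_400 + 800 + 45, 220)   # 2,400-token system prompt
print(round(after - before, 2))  # → 810.0
```

This is the kind of one-off calculation worth running before any system prompt change ships: the extra input tokens are a fixed tax on every call.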


Why This Changes How You Think About Costs

In traditional cloud FinOps, cost scales with infrastructure decisions — instance sizes, storage tiers, network throughput. These are relatively stable and predictable.

In AI FinOps, cost scales with content decisions — what you put in your prompts, how long your conversations are, how verbose your model's responses are. A developer making what looks like a UX change (longer, more helpful responses) is simultaneously making a cost decision. Most developers don't know this yet. Your job is to help bridge that gap.

[Figure: How token-based billing works, from input to invoice: user input → tokenizer → cost calculation]

Key Concepts

Token

The basic unit of text a language model processes, roughly equivalent to 0.75 words in English, and the fundamental billing unit for API-based AI.

Input vs. Output Tokens

Input tokens are what you send to the model; output tokens are what it generates back. Output tokens typically cost 3× to 8× more because of sequential generation.

Token Cost Formula

Cost = (Input Tokens / 1M × Input Price) + (Output Tokens / 1M × Output Price), applied per API call, then multiplied by volume.

System Prompt

Instructions sent to the model at the start of every API call; changes to its length multiply across all calls, making it a major hidden cost lever.

FinOps Foundation Source

GenAI FinOps: How Token Pricing Really Works, FinOps Foundation Working Group finops.org/wg/genai-finops-how-token-pricing-really-works/

Exam Tip

The token cost formula is almost certainly on the exam — both as a direct calculation question and embedded in scenario questions. Memorize: Cost = (Input / 1M × Input price) + (Output / 1M × Output price). Also know why output costs more than input (sequential generation vs. parallel processing).