FinOps for AI

1.2 Tokens: The New Unit of Compute

4 min read
NovaSpark
"Why did the chatbot cost nearly nine times as much on Tuesday?" You pull up Team Alpha's usage logs. Tuesday was a normal traffic day — same number of users, same number of conversations. But the bill was $2,705. Monday was $310.

The difference: on Tuesday, the product team ran a test. They changed the system prompt — the set of instructions sent to the model at the start of every conversation. The new system prompt was 2,400 words long instead of the usual 400 words. And because it went out with every single API call that day, those extra 2,000 words multiplied across 180,000 API calls. 180,000 calls × 2,000 extra words × ~1.33 tokens per word = 479 million extra input tokens. At $5 per million tokens: $2,395 in extra charges. From a text edit.

This is what makes AI cost management different from everything you've done before. The unit of cost isn't a server-hour or a request. It's a token. And until you understand exactly what a token is and why it's priced the way it is, the bills will keep surprising you.



What Is a Token?

A token is the basic unit of text that a language model processes. Not a word — a piece of a word, a word, or sometimes multiple short words together.

A rough rule of thumb: 1 token ≈ 0.75 words in English. More precisely:

  • "NovaSpark" → 2 tokens (Nova + Spark)
  • "the" → 1 token
  • "AI" → 1 token
  • "cost" → 1 token
  • "optimization" → 2 tokens (optim + ization — varies by model)
  • A typical business email (300 words) ≈ 400 tokens
  • A detailed system prompt (800 words) ≈ 1,066 tokens
  • A full legal contract (10,000 words) ≈ 13,333 tokens

Different models tokenize text slightly differently. OpenAI's models use the tiktoken tokenizer. Anthropic's Claude models use a different tokenizer. The 0.75 words-per-token ratio is a useful approximation, not an exact conversion.
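Since the 0.75 words-per-token ratio is only an approximation, a quick estimator is often enough for budget planning. This is a minimal sketch of that heuristic — for exact counts you would use the provider's real tokenizer (e.g. OpenAI's tiktoken library), not this:

```python
# Rough token estimator based on this lesson's rule of thumb:
# 1 token ≈ 0.75 English words (≈ 1.33 tokens per word).
# A planning approximation only — real tokenizers give exact counts.

def estimate_tokens(text: str) -> int:
    """Approximate token count from the word count."""
    words = len(text.split())
    return round(words / 0.75)

# A 300-word business email lands near the lesson's ~400-token figure.
email = " ".join(["word"] * 300)  # stand-in for a 300-word email
print(estimate_tokens(email))  # → 400
```

The same function reproduces the other estimates in the list above: 10,000 words comes out to 13,333 tokens.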


Input Tokens vs. Output Tokens

Every API call has two parts:

Input tokens — everything you send to the model:

  • The system prompt (instructions for how the model should behave)
  • The conversation history (all previous messages in a multi-turn chat)
  • The current user message
  • Any documents or context you've injected (RAG retrieval results, file contents)

Output tokens — everything the model sends back:

  • The model's response

This distinction matters because output tokens cost more than input tokens — typically 3× to 8× more, depending on the model.

Why the premium? Generating each output token requires a full forward pass through the model. Reading input tokens is comparatively cheap (the model processes them in parallel). Writing output tokens is sequential — the model generates one token at a time, each dependent on the previous. That sequential computation is why providers charge a premium.


The Token Cost Formula

Cost = (Input Tokens / 1,000,000 × Input Price per 1M)
     + (Output Tokens / 1,000,000 × Output Price per 1M)

Current benchmark pricing (February 2026):

Model                      Input (per 1M tokens)   Output (per 1M tokens)   Output/Input ratio
GPT-4o                     $2.50                   $10.00                   4×
Claude 3.5 Sonnet          $3.00                   $15.00                   5×
Gemini 2.0 Flash           $0.10                   $0.40                    4×
GPT-4o mini                $0.15                   $0.60                    4×
Claude 3 Haiku             $0.25                   $1.25                    5×
Llama 3.1 70B (Bedrock)    $0.72                   $0.72                    1×
Llama 3.1 8B (Groq)        $0.05                   $0.08                    1.6×

Prices change frequently — always verify against provider documentation before forecasting.
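The formula translates directly into a few lines of code. A minimal sketch, using the February 2026 benchmark prices from the table above (the model-name keys are my own labels, and prices should be re-verified before use):

```python
# The token cost formula: per-call cost from token counts and per-1M prices.

PRICES_PER_1M = {  # (input, output) in USD per 1M tokens, Feb 2026 benchmarks
    "gpt-4o": (2.50, 10.00),
    "claude-3.5-sonnet": (3.00, 15.00),
    "gemini-2.0-flash": (0.10, 0.40),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost = input/1M x input price + output/1M x output price."""
    in_price, out_price = PRICES_PER_1M[model]
    return (input_tokens / 1_000_000 * in_price
            + output_tokens / 1_000_000 * out_price)

# 1,445 input + 220 output tokens on GPT-4o:
print(round(call_cost("gpt-4o", 1_445, 220), 4))  # → 0.0058
```

The printed figure matches the per-call cost in the worked example that follows.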


A Worked Example

NovaSpark's support chatbot receives a customer message:

  • System prompt: 600 tokens
  • Conversation history (3 prior turns): 800 tokens
  • Current user message: 45 tokens
  • Total input: 1,445 tokens

The model responds:

  • Response: 220 tokens
  • Total output: 220 tokens

At GPT-4o pricing ($2.50 input / $10.00 output per 1M tokens):

Input cost:  1,445 / 1,000,000 × $2.50 = $0.0036
Output cost:   220 / 1,000,000 × $10.00 = $0.0022
Total per call: $0.0058

That feels tiny. But NovaSpark's chatbot handles 180,000 conversations per month:

$0.0058 × 180,000 = $1,044/month

Now the product team adds a richer, more detailed system prompt — 2,400 tokens instead of 600:

New input: 2,400 + 800 + 45 = 3,245 tokens
Input cost: 3,245 / 1,000,000 × $2.50 = $0.0081
Output cost unchanged: $0.0022
New total per call: $0.0103

$0.0103 × 180,000 = $1,854/month

A 1,800-token system prompt change (600 → 2,400 tokens) → $810/month in extra costs. Multiply that across a larger chatbot or a higher-traffic product, and a single engineering commit can add tens of thousands to the monthly bill.
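The before/after comparison can be reproduced in a few lines. A sketch using the figures above (the lesson's $1,044 and $1,854 round the per-call cost before multiplying by volume; the monthly delta itself is exact):

```python
# Monthly cost impact of growing the system prompt from 600 to 2,400 tokens,
# holding traffic and response length constant.

CALLS_PER_MONTH = 180_000
IN_PRICE, OUT_PRICE = 2.50, 10.00  # GPT-4o, USD per 1M tokens

def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    per_call = (input_tokens * IN_PRICE + output_tokens * OUT_PRICE) / 1_000_000
    return per_call * CALLS_PER_MONTH

before = monthly_cost(600 + 800 + 45, 220)    # 600-token system prompt
after = monthly_cost(2_400 + 800 + 45, 220)   # 2,400-token system prompt
print(round(after - before, 2))  # → 810.0
```

This is the kind of one-off calculation worth running before any system prompt change ships: the extra input tokens are a fixed tax on every call.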


Why This Changes How You Think About Costs

In traditional cloud FinOps, cost scales with infrastructure decisions — instance sizes, storage tiers, network throughput. These are relatively stable and predictable.

In AI FinOps, cost scales with content decisions — what you put in your prompts, how long your conversations are, how verbose your model's responses are. A developer making what looks like a UX change (longer, more helpful responses) is simultaneously making a cost decision. Most developers don't know this yet. Your job is to help bridge that gap.

[Figure: How token-based billing works, from input to invoice: user input → tokenizer → cost calculation]

Key Concepts

Token

The basic unit of text a language model processes, roughly equivalent to 0.75 words in English, and the fundamental billing unit for API-based AI.

Input vs. Output Tokens

Input tokens are what you send to the model; output tokens are what it generates back. Output tokens typically cost 3× to 8× more because of sequential generation.

Token Cost Formula

Cost = (Input Tokens / 1M × Input Price) + (Output Tokens / 1M × Output Price), applied per API call, then multiplied by volume.

System Prompt

Instructions sent to the model at the start of every API call; changes to its length multiply across all calls, making it a major hidden cost lever.

FinOps Foundation Source

GenAI FinOps: How Token Pricing Really Works, FinOps Foundation Working Group finops.org/wg/genai-finops-how-token-pricing-really-works/

Exam Tip

The token cost formula is almost certainly on the exam — both as a direct calculation question and embedded in scenario questions. Memorize: Cost = (Input / 1M × Input price) + (Output / 1M × Output price). Also know why output costs more than input (sequential generation vs. parallel processing).