Foundational
What Are We Even Paying For?
The Fundamentals of AI Cost Models
1.1 — The Three AI Deployment Models
Every AI workload sits in one of three infrastructure categories. Understanding which category you're looking at determines how you measure it, forecast it, optimize it, and govern it.
Model 1 — Closed-Source API (Third-Party)
You call an API. You pay for what you use. You have no access to the model weights, no control over the infrastructure, and no ability to run it anywhere other than the provider's servers.
Examples: OpenAI (GPT-4o, o1, o3), Anthropic (Claude Sonnet, Haiku, Opus), Google (Gemini 2.0 Flash, Gemini 2.5 Pro), Mistral AI.
How billing works: Per token — you pay separately for input tokens (the text you send) and output tokens (the text you receive back). Price is quoted per million tokens. More on this in Section 1.2.
When this model makes economic sense:
- Workloads under roughly 1 million tokens per day
- Teams that need to move fast without managing infrastructure
- Proof-of-concept and early-stage products
- When model quality matters more than cost per token
When it doesn't:
- High-volume, predictable workloads where per-token costs compound quickly
- When data privacy requirements prohibit sending data to third-party endpoints
- When you need to fine-tune or modify the model itself
Model 2 — Third-Party Hosted Open Source
An open-source model (one whose weights are publicly available) running on someone else's managed infrastructure. You still call an API. You still pay per token. But the model is open-source, so the per-token rate is typically lower than closed-source alternatives.
Examples: AWS Bedrock (Llama 3, Mistral, Titan), Azure AI Foundry (Llama, Phi), Google Vertex AI (Llama, Gemma), Together.ai, Groq, Replicate.
How billing works: Same token-based pricing as closed-source APIs, but usually 5–20× cheaper per token for comparable model sizes. Some providers also offer PTUs (Provisioned Throughput Units) — reserved capacity with predictable billing.
When this model makes economic sense:
- Mid-scale workloads: roughly 1 billion to 10 billion tokens per month
- When you want open-source flexibility without managing GPU infrastructure
- When you need predictable pricing via provisioned throughput
- When data residency requirements can be met by choosing specific cloud regions
When it doesn't:
- When volume is high enough that managing your own GPU fleet becomes cheaper
- When you need maximum customization (deeper fine-tuning, model modifications)
Model 3 — Self-Hosted / DIY
You own (or rent long-term) the GPU hardware. You run the model yourself. You manage the infrastructure, the scaling, the uptime, and the MLOps pipeline.
Examples: EC2 p4d/p5 or g5 instances (AWS), A100/H100 VMs (Azure), TPU v4/v5 (Google Cloud), on-premises GPU servers.
How billing works: Fixed capacity cost — instance per hour or reserved instance annual commitment — regardless of how many tokens you process. If your utilization is low, your effective per-token cost skyrockets. If your utilization is high, your effective per-token cost drops dramatically.
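The utilization math can be sketched directly. A minimal sketch, assuming hypothetical numbers: the hourly rate and throughput below are illustrative placeholders, not quoted vendor prices.

```python
# Sketch: effective per-token cost for self-hosted capacity.
# All numbers are hypothetical placeholders, not quoted vendor prices.

def effective_cost_per_million_tokens(
    hourly_rate: float,              # fixed cost per GPU-hour
    tokens_per_hour_at_full: float,  # throughput at 100% utilization
    utilization: float,              # fraction of capacity actually used
) -> float:
    """Fixed cost divided by tokens actually processed, scaled to per-1M."""
    tokens_processed = tokens_per_hour_at_full * utilization
    return hourly_rate / tokens_processed * 1_000_000

# Hypothetical GPU node: $40/hour, 50M tokens/hour at full load.
print(effective_cost_per_million_tokens(40.0, 50_000_000, 0.90))  # high utilization
print(effective_cost_per_million_tokens(40.0, 50_000_000, 0.20))  # low utilization
```

Under these made-up numbers, dropping from 90% to 20% utilization raises the effective per-1M-token cost about 4.5×, with no change in hardware spend.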
When this model makes economic sense:
- Very high volume: above roughly 100 million tokens per day sustained
- When you need to fine-tune models extensively or modify architecture
- When GPU utilization can be maintained above 60–70% continuously
- When data cannot leave your infrastructure under any circumstances
When it doesn't:
- At low to moderate token volumes — the fixed cost makes it far more expensive than pay-as-you-go APIs
- When you don't have an MLOps team to manage the infrastructure
The Economic Crossover
Think of it like car ownership. A taxi (closed-source API) is expensive per mile but has zero fixed cost — perfect if you travel occasionally. A rental car (hosted open-source) is cheaper per mile with moderate commitment. Buying a car (self-hosted) is cheapest per mile but only if you drive enough to justify the fixed cost.
NovaSpark's Team Gamma paid $156K last month for GPUs running at 40% utilization. At that utilization rate, they are in the most expensive quadrant possible — high fixed cost, low volume output. Their effective per-token cost is higher than if they'd just used OpenAI's API.
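The crossover can be estimated with a break-even calculation. In this sketch, only the $156K fixed spend comes from the scenario above; the $4-per-1M-token blended API rate is a hypothetical assumption.

```python
# Sketch: monthly token volume at which a fixed-cost GPU fleet matches
# pay-per-token API spend. The API rate is an illustrative assumption.

def break_even_tokens(fixed_monthly_cost: float, api_price_per_million: float) -> float:
    """Tokens per month at which fixed capacity and API spend are equal."""
    return fixed_monthly_cost / api_price_per_million * 1_000_000

# Team Gamma's fixed spend: $156,000/month.
# Hypothetical blended API rate: $4.00 per 1M tokens.
tokens = break_even_tokens(156_000, 4.0)
print(f"{tokens / 1e9:.0f}B tokens/month to break even")
```

Below that volume, the fixed capacity costs more per token than the API would; above it, self-hosting wins, provided utilization stays high.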
Choosing an AI Approach and Infrastructure Strategy, FinOps Foundation Working Group — finops.org/wg/choosing-an-ai-approach-and-infrastructure-strategy/
The FinOps for AI exam tests your ability to identify which deployment model is appropriate for a given scenario — not just name them. Know the economic crossover points: roughly 1M tokens/day (API vs. hosted open-source), and roughly 100M tokens/day (hosted vs. self-hosted). Scenarios often involve a team at the wrong model for their volume.
1.2 — Tokens: The New Unit of Compute
What Is a Token?
A token is the basic unit of text that a language model processes. Not a word — a piece of a word, a word, or sometimes multiple short words together.
A rough rule of thumb: 1 token ≈ 0.75 words in English. More precisely:
- "NovaSpark" → 2 tokens (Nova + Spark)
- "the" → 1 token
- "AI" → 1 token
- "cost" → 1 token
- "optimization" → 2–4 tokens depending on the tokenizer (e.g., optim + ization)
- A typical business email (300 words) ≈ 400 tokens
- A detailed system prompt (800 words) ≈ 1,066 tokens
- A full legal contract (10,000 words) ≈ 13,333 tokens
Different models tokenize text slightly differently. OpenAI's models use the tiktoken tokenizer. Anthropic's Claude models use a different tokenizer. The 0.75 words-per-token ratio is a useful approximation, not an exact conversion.
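For back-of-envelope forecasting, the 0.75 words-per-token rule of thumb can be encoded as a tiny helper. Exact, billing-accurate counts require the provider's own tokenizer (such as tiktoken for OpenAI models); this heuristic is for estimates only.

```python
# Rough token estimator using the 0.75 words-per-token rule of thumb.
# Not billing-accurate: use the provider's tokenizer for real counts.

def estimate_tokens(word_count: int) -> int:
    return round(word_count / 0.75)

print(estimate_tokens(300))     # typical business email -> 400
print(estimate_tokens(10_000))  # full legal contract -> 13333
```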
Input Tokens vs. Output Tokens
Every API call has two parts:
Input tokens — everything you send to the model:
- The system prompt (instructions for how the model should behave)
- The conversation history (all previous messages in a multi-turn chat)
- The current user message
- Any documents or context you've injected (RAG retrieval results, file contents)
Output tokens — everything the model sends back:
- The model's response
This distinction matters because output tokens cost more than input tokens — typically 3× to 8× more, depending on the model.
Why the premium? Generating each output token requires a full forward pass through the model. Reading input tokens is comparatively cheap (the model processes them in parallel). Writing output tokens is sequential — the model generates one token at a time, each dependent on the previous. That sequential computation is why providers charge a premium.
The Token Cost Formula
Cost = (Input Tokens / 1,000,000 × Input Price per 1M)
     + (Output Tokens / 1,000,000 × Output Price per 1M)

Current benchmark pricing (February 2026):
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Output/Input ratio |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | 4× |
| Claude 3.5 Sonnet | $3.00 | $15.00 | 5× |
| Gemini 2.0 Flash | $0.10 | $0.40 | 4× |
| GPT-4o mini | $0.15 | $0.60 | 4× |
| Claude 3 Haiku | $0.25 | $1.25 | 5× |
| Llama 3.1 70B (Bedrock) | $0.72 | $0.72 | 1× |
| Llama 3.1 8B (Groq) | $0.05 | $0.08 | 1.6× |
Prices change frequently — always verify against provider documentation before forecasting.
A Worked Example
NovaSpark's support chatbot receives a customer message:
- System prompt: 600 tokens
- Conversation history (3 prior turns): 800 tokens
- Current user message: 45 tokens
- Total input: 1,445 tokens
The model responds:
- Response: 220 tokens
- Total output: 220 tokens
At GPT-4o pricing ($2.50 input / $10.00 output per 1M tokens):
Input cost: 1,445 / 1,000,000 × $2.50 = $0.0036
Output cost: 220 / 1,000,000 × $10.00 = $0.0022
Total per call: $0.0058

That feels tiny. But NovaSpark's chatbot handles 180,000 conversations per month:

$0.0058 × 180,000 = $1,044/month

Now the product team adds a richer, more detailed system prompt — 2,400 tokens instead of 600:
New input: 2,400 + 800 + 45 = 3,245 tokens
Input cost: 3,245 / 1,000,000 × $2.50 = $0.0081
Output cost unchanged: $0.0022
New total per call: $0.0103
$0.0103 × 180,000 = $1,854/month

A 1,800-token increase to the system prompt → $810/month in extra costs. Multiply that across a larger chatbot or a higher-traffic product, and a single engineering commit can add tens of thousands to the monthly bill.
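The arithmetic above can be reproduced with a few lines of Python implementing the token cost formula from this section:

```python
# The token cost formula applied to the chatbot example above.

def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

# GPT-4o pricing from the table: $2.50 input / $10.00 output per 1M tokens.
before = call_cost(1_445, 220, 2.50, 10.00)  # original 600-token system prompt
after = call_cost(3_245, 220, 2.50, 10.00)   # richer 2,400-token system prompt

print(round(before, 4), round(after, 4))      # per-call costs
print(round((after - before) * 180_000))      # monthly delta at 180K conversations
```

The monthly delta reproduces the $810 figure: the extra prompt tokens cost $0.0045 per call, multiplied by 180,000 calls.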
Why This Changes How You Think About Costs
In traditional cloud FinOps, cost scales with infrastructure decisions — instance sizes, storage tiers, network throughput. These are relatively stable and predictable.
In AI FinOps, cost scales with content decisions — what you put in your prompts, how long your conversations are, how verbose your model's responses are. A developer making what looks like a UX change (longer, more helpful responses) is simultaneously making a cost decision. Most developers don't know this yet. Your job is to help bridge that gap.
GenAI FinOps: How Token Pricing Really Works, FinOps Foundation Working Group — finops.org/wg/genai-finops-how-token-pricing-really-works/
The token cost formula is almost certainly on the exam — both as a direct calculation question and embedded in scenario questions. Memorize: Cost = (Input / 1M × Input price) + (Output / 1M × Output price). Also know why output costs more than input (sequential generation vs. parallel processing).
1.3 — The Context Window Tax
Why APIs Are Stateless
Language models don't have persistent memory between API calls. Each call is completely independent. The model doesn't "remember" that it talked to this user five minutes ago. To create the experience of a continuous conversation, your application code must resend the entire conversation history with every new message.
This is a fundamental architectural reality of how current LLM APIs work. It is not a bug or a limitation that will be patched — it is the design.
The Cost Growth Pattern
Consider a 10-turn customer support conversation. Each turn, the token count grows:
| Turn | New tokens added | Cumulative input tokens sent | Input cost at GPT-4o ($2.50/1M) |
|---|---|---|---|
| 1 | 100 (user msg) | 700 (system + msg) | $0.0018 |
| 2 | 250 (user + response) | 950 | $0.0024 |
| 3 | 250 | 1,200 | $0.0030 |
| 5 | 250 | 1,700 | $0.0043 |
| 10 | 250 | 2,950 | $0.0074 |
| 20 | 250 | 5,450 | $0.0136 |
Turn 20 costs 8× more than Turn 1 — not because the user's message is longer, but because the history is. A customer who has a long back-and-forth with your chatbot costs significantly more to serve than one who resolves their issue in two messages.
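The growth pattern can be sketched as a function of turn number, assuming a 600-token system prompt, a 100-token opening message, and roughly 250 new tokens (user message plus prior response) per subsequent turn:

```python
# Sketch of the compounding context cost, under the assumptions above.

def input_tokens_at_turn(n: int, system: int = 600, first_msg: int = 100,
                         per_turn: int = 250) -> int:
    """Cumulative input tokens resent on turn n of a conversation."""
    return system + first_msg + per_turn * (n - 1)

def input_cost(tokens: int, price_per_m: float = 2.50) -> float:
    """Input-side cost at an assumed $2.50 per 1M tokens."""
    return tokens / 1_000_000 * price_per_m

for turn in (1, 10, 20):
    t = input_tokens_at_turn(turn)
    print(turn, t, round(input_cost(t), 4))
```

Linear growth in turns produces linear growth in per-call cost, so total conversation cost grows quadratically with conversation length.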
What the Context Window Tax Means in Practice
For NovaSpark's chatbot — 180,000 conversations per month, average 8 turns:
- Without context management: average cost ~$0.012/conversation = $2,160/month
- With context trimming (keep last 4 turns): average cost ~$0.007/conversation = $1,260/month
- Savings from one architectural change: $900/month, $10,800/year
For a higher-volume product — 2 million conversations/month:
- Same optimization: $120,000/year in savings
Three Mitigation Approaches
1. Context windowing — Keep only the last N turns of conversation history. Discard older turns. Simple to implement, slight UX risk if conversations reference early context.
2. Summarization compression — Periodically summarize earlier turns into a compact summary, replacing the full transcript. Higher quality retention, moderate implementation complexity.
3. RAG-based memory — Store conversation history externally, retrieve only the semantically relevant parts for each new message. Most sophisticated, best UX, highest implementation cost.
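A minimal sketch of approach 1, context windowing: the function name and the flat message list are illustrative, not any particular SDK's API.

```python
# Minimal sketch of context windowing: resend the system prompt plus
# only the last N turns instead of the whole transcript.

def windowed_context(system_prompt: str, history: list[str],
                     keep_last_turns: int = 4) -> list[str]:
    """history alternates user/assistant messages; one turn = 2 messages."""
    return [system_prompt] + history[-2 * keep_last_turns:]

history = [f"msg {i}" for i in range(20)]  # a 10-turn conversation
ctx = windowed_context("You are a support agent.", history)
print(len(ctx))  # 1 system prompt + 8 messages = 9
```

The trade-off from the text applies: any reference to a discarded early turn is lost, which is why summarization or RAG-based memory may be worth the extra complexity.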
The FinOps Angle
The Context Window Tax is a cost pattern that comes from a product decision (stateful conversation UX), not from infrastructure choices. Engineers building chatbots are making cost decisions every time they choose how much history to include. FinOps practitioners need to work with engineers to surface these cost patterns — not as critique, but as shared visibility. Most engineers building chatbots have never run the compounding math on conversation length.
GenAI FinOps vs. Cloud FinOps, FinOps Foundation Working Group — finops.org/wg/genai-finops-vs-cloud-finops/
The Context Window Tax is tested as a "hidden cost" question and as an optimization scenario. Key facts: APIs are stateless by design; conversation history is resent with every call; cost grows with conversation length, not just volume.
1.4 — The Full Bill of Materials: What's Actually on Your AI Invoice
The API cost — the token charges you calculated in the previous section — is often less than half of the total cost of running an AI workload. Here are the five categories that make up the rest.
TCO questions appear in two forms: "What are the hidden cost components beyond API charges?" (knowledge) and "Why is this company's bill higher than expected?" (scenario). Know all five categories. Data egress is the most commonly missed and often the largest surprise.
1.5 — AI vs. Traditional Cloud: What's Different About Cost Governance
Traditional cloud FinOps is built around infrastructure — virtual machines, storage, databases, network. AI FinOps introduces a new layer: the cost of computation encoded in content.
| Dimension | Traditional Cloud | AI Workloads |
|---|---|---|
| Primary cost unit | CPU-hours, GB-hours, requests | Tokens (input + output) |
| What drives cost | Infrastructure decisions (instance size, storage class) | Content decisions (prompt length, response verbosity, conversation depth) |
| Who controls costs | Infrastructure and platform teams | Engineers, product managers, prompt engineers — anyone who touches prompts |
| Pricing model | Relatively stable, predictable tiers | Volatile: prices dropping ~10× per year (LLMflation); new model SKUs constantly |
| Idle cost | Significant (running but unused instances) | Minimal for API model; high for self-hosted (same as cloud) |
| Tagging and attribution | Mature tooling (AWS Cost Explorer, native tags) | Immature — shared API keys, non-standard units, limited vendor tooling |
| Forecasting | Trend analysis works well | Unreliable without understanding usage patterns AND price trajectory |
| Optimization levers | Right-sizing, Reserved Instances, Savings Plans | Prompt compression, model selection, caching, context windowing, quantization |
| Anomaly profile | Gradual drift, infrastructure scaling events | Sharp spikes from runaway loops, prompt changes, traffic events |
| Governance maturity | Well-established (FOCUS spec, native dashboards) | Emerging (FOCUS 1.2–1.3 adding AI support, tooling fragmented) |
What Transfers from Cloud FinOps
- Unit economics thinking (cost per unit of value delivered)
- Tagging and attribution discipline
- The Crawl-Walk-Run maturity model
- Showback and chargeback governance
- Budget alerts and anomaly detection concepts
- Cross-functional collaboration model (FinOps practitioner as bridge)
What Doesn't Transfer Directly
- Right-sizing has no equivalent — you don't pick an "instance size" for API calls; you pick a model and prompt strategy
- Reserved Instance savings logic doesn't apply to per-token billing (though Provisioned Throughput Units serve a similar role)
- Standard cost per request metrics ignore token volume, making comparisons misleading
- Tagging infrastructure at the API key level doesn't give you per-team or per-feature attribution without additional proxy or gateway tooling
The Practitioner's Mental Model Shift
In cloud FinOps, you ask: "What infrastructure are we running, and is it the right size?"
In AI FinOps, you ask: "What content are we processing, at what volume, with what model, through what architecture — and is every component justified by the value it delivers?"
GenAI FinOps vs. Cloud FinOps, FinOps Foundation Working Group — finops.org/wg/genai-finops-vs-cloud-finops/
The FinOps for AI exam tests this comparison directly. Know: (1) token vs. CPU-hour as cost units, (2) content decisions vs. infrastructure decisions as cost drivers, (3) why traditional right-sizing doesn't map to AI APIs, (4) what Provisioned Throughput Units replace in the AI context.