What Is a Token in AI? How Token Pricing Works (Plain English Guide)
Learn what AI tokens are, how they're counted, and why they matter for pricing. A simple guide for anyone using AI APIs like ChatGPT, Claude, or Gemini.
By AI Pricing Guru Editorial Team
AI Pricing Guru articles are maintained by the editorial workflow behind the site: daily pricing snapshots, provider source checks, and review passes for model launches, subscription limits, and billing changes.
Token pricing looks abstract until you map it to a real workflow. I use this page to keep the math visible: input, output, cached input, and the places where a small model can do the boring part first.
If you’re using AI APIs, or even just reading about AI pricing, you’ll see everything measured in “tokens.” But what actually is a token, and why should you care?
What Is a Token?
A token is the smallest unit of text that an AI model processes. Think of it as a piece of a word.
Examples:
- “Hello” = 1 token
- “artificial intelligence” = 2 tokens
- “I love programming” = 3 tokens
- “Pneumonoultramicroscopicsilicovolcanoconiosis” = 11 tokens (long words get split)
The rough rule: 1 token ≈ 4 characters of English text, or about ¾ of a word.
So 1,000 tokens ≈ 750 words, roughly one page of text.
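The rules of thumb above are easy to turn into quick estimators. This is a minimal sketch of the heuristic conversions only (~4 characters or ~0.75 words per token); it is not a real tokenizer, and actual counts vary by model.

```python
# Rough token estimates using the 4-chars-per-token and
# 0.75-words-per-token rules of thumb for English prose.

def tokens_from_chars(char_count: int) -> int:
    """Estimate tokens at ~4 characters per token."""
    return round(char_count / 4)

def tokens_from_words(word_count: int) -> int:
    """Estimate tokens at ~0.75 words per token."""
    return round(word_count / 0.75)

print(tokens_from_words(750))   # ~1,000 tokens, about one page
print(tokens_from_chars(4000))  # ~1,000 tokens
```

Use either rule depending on whether you know the word count or the character count; for billing-grade numbers, always check with the provider's tokenizer.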
Why Tokens Matter for Pricing
AI providers charge per token, not per word or per request. Every API call has:
- Input tokens: what you send (your prompt, system instructions, conversation history)
- Output tokens: what the model generates (the response)
These are priced separately, and output tokens almost always cost more than input tokens.
Example: Sending a Message to GPT-5.4
Let’s say you send a 500-word prompt and get a 200-word response:
- Input: ~667 tokens × $2.50/1M = $0.0017
- Output: ~267 tokens × $15.00/1M = $0.0040
- Total: $0.0057 (less than a cent)
Sounds cheap, but at scale it adds up fast. At those rates, a chatbot handling 100,000 conversations like this per day would run roughly $17,000 per month.
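The per-call math above can be reproduced in a few lines. This sketch uses the article's GPT-5.4 figures ($2.50 / $15.00 per 1M tokens), not live rates, and the 0.75-words-per-token rule:

```python
# Cost of one call: 500-word prompt, 200-word reply,
# priced at the article's GPT-5.4 rates (not live pricing).

INPUT_PER_M = 2.50    # $ per 1M input tokens
OUTPUT_PER_M = 15.00  # $ per 1M output tokens

def words_to_tokens(words: int) -> int:
    return round(words / 0.75)

def request_cost(input_words: int, output_words: int) -> float:
    return (words_to_tokens(input_words) / 1_000_000 * INPUT_PER_M
            + words_to_tokens(output_words) / 1_000_000 * OUTPUT_PER_M)

per_call = request_cost(500, 200)
print(f"${per_call:.4f} per call")             # ≈ $0.0057
print(f"${per_call * 100_000 * 30:,.0f}/month")  # at 100,000 calls/day
```

Scaling the fraction-of-a-cent call to 100,000 conversations a day is what turns it into a five-figure monthly bill.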
Input vs. Output: Why Output Costs More
Output tokens are several times more expensive than input tokens at nearly every provider; in the table below, the premium ranges from 1.5x to 8x. Why?
When a model generates output, it’s doing much more computational work. Each new token requires processing all previous tokens. Input tokens are processed in parallel; output tokens are generated one at a time.
| Provider | Input / 1M | Output / 1M | Output Premium |
|---|---|---|---|
| OpenAI GPT-5.4 | $2.50 | $15.00 | 6x |
| Anthropic Claude Opus 4.6 | $5.00 | $25.00 | 5x |
| Google Gemini 2.5 Pro | $1.25 | $10.00 | 8x |
| DeepSeek V3.2 | $0.28 | $0.42 | 1.5x |
Notice how DeepSeek bucks the trend with only a 1.5x premium. This is one reason it’s so popular for high-output workloads.
What Are Cached Input Tokens?
When you send the same system prompt or context repeatedly (like in a chatbot), providers can cache that input and charge you less:
| Provider | Regular Input | Cached Input | Savings |
|---|---|---|---|
| OpenAI | $2.50 | $0.25 | 90% |
| Anthropic | $5.00 | $0.50 | 90% |
| Google | $1.25 | $0.13 | 90% |
| DeepSeek | $0.28 | $0.028 | 90% |
Pro tip: Design your system prompts to be stable (don’t change them per request), and you’ll benefit from cached pricing automatically.
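To see what caching is worth in practice, you can blend the two rates by your cache hit rate. This sketch uses the OpenAI figures from the table ($2.50 regular, $0.25 cached); the 90% hit rate is a hypothetical, not a guarantee:

```python
# Effective $/1M input tokens when a share of tokens is served from cache.
# Rates are the article's OpenAI figures; the hit rate is hypothetical.

REGULAR = 2.50  # $ / 1M regular input tokens
CACHED = 0.25   # $ / 1M cached input tokens

def blended_input_rate(cache_hit_rate: float) -> float:
    """Average $/1M input tokens, weighted by cache hit rate."""
    return cache_hit_rate * CACHED + (1 - cache_hit_rate) * REGULAR

# A chatbot whose long, stable system prompt is cached on 90% of requests:
print(blended_input_rate(0.9))  # 0.475 $/1M, an ~81% saving vs. $2.50
```

This is why a stable system prompt matters: the more of your input that repeats verbatim, the closer your effective rate gets to the cached price.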
How to Count Tokens
You can estimate tokens using these rules:
- English text: 1 token ≈ 4 characters ≈ 0.75 words
- Code: tends to use more tokens per line (special characters, syntax)
- Non-English languages: CJK (Chinese, Japanese, Korean) use more tokens per character
- Numbers: each digit is often its own token
For exact counts, use:
- The OpenAI Tokenizer, free to try
- Our token calculator: enter your token count and see costs across all models instantly
What Is a Context Window?
The context window is the maximum number of tokens a model can handle in a single request (input + output combined).
| Model | Context Window |
|---|---|
| Llama 4 Scout | 10M tokens |
| xAI Grok 4.20 | 2M tokens |
| GPT-4.1 | 1M tokens |
| Claude Opus 4.6 | 200K tokens |
| GPT-5.4 | 270K tokens |
A larger context window means you can send more information in a single prompt: entire codebases, long documents, extended conversations. But more tokens mean higher cost.
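Because the window covers input and output combined, it's worth checking a request fits before sending it. A minimal sketch, using window sizes from the table above and estimated token counts:

```python
# Does input + reserved output fit in a model's context window?
# Window sizes are the article's figures; token counts are estimates.

CONTEXT_WINDOWS = {
    "GPT-5.4": 270_000,
    "Claude Opus 4.6": 200_000,
    "GPT-4.1": 1_000_000,
}

def fits(model: str, input_tokens: int, max_output_tokens: int) -> bool:
    """True if the request leaves room for the reserved output."""
    return input_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]

print(fits("Claude Opus 4.6", 180_000, 30_000))  # False: 210K > 200K
print(fits("GPT-5.4", 180_000, 30_000))          # True: 210K <= 270K
```

The common mistake is budgeting the whole window for input and leaving no room for the response.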
Token Pricing Tiers: What to Expect
Here’s what you’ll pay across the market in 2026:
| Tier | Input / 1M | Output / 1M | Best For |
|---|---|---|---|
| Budget | $0.10-0.30 | $0.15-0.60 | High volume, simple tasks |
| Mid-range | $0.30-3.00 | $1.00-15.00 | Production workloads |
| Premium | $5.00-15.00 | $25.00-75.00 | Complex reasoning, coding |
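To make the tiers concrete, here's a sketch pricing one fixed monthly workload at each tier. The tier ranges are the article's; taking the midpoint of each range is my simplification, and the workload (10M input + 2M output tokens) is hypothetical:

```python
# Monthly cost of a fixed workload (10M input, 2M output tokens) at the
# midpoint of each pricing tier. Tier ranges from the article; midpoints
# and workload are illustrative assumptions.

TIERS = {  # (input $/1M, output $/1M) midpoints
    "Budget":    (0.20, 0.375),
    "Mid-range": (1.65, 8.00),
    "Premium":   (10.00, 50.00),
}

def monthly_cost(input_m_tokens: float, output_m_tokens: float,
                 tier: str) -> float:
    in_rate, out_rate = TIERS[tier]
    return input_m_tokens * in_rate + output_m_tokens * out_rate

for tier in TIERS:
    print(f"{tier}: ${monthly_cost(10, 2, tier):,.2f}")
```

The spread is the point: the same workload runs a few dollars on a budget model and a few hundred on a premium one, which is why matching the model to the task matters so much.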
How to Reduce Your Token Costs
- Choose the right model: don't use Opus for tasks Haiku can handle
- Minimize system prompts: shorter prompts mean fewer input tokens
- Use caching: reuse system prompts to get cached pricing
- Batch processing: OpenAI's Batch API gives 50% off
- Set max_tokens: limit output length to avoid paying for text you don't need
- Summarize context: instead of sending full conversation history, summarize it
Try It Yourself
Use our free token calculator to estimate your costs across all 33 models from 7 providers. Enter your expected input and output tokens, and see exactly what you’ll pay.
Or browse the full pricing comparison to find the model that fits your budget. Ready to start building? Try OpenAI →, Try Claude →, or Try Gemini → (free tier). For a deeper breakdown of which API is right for you, see our best AI API for developers guide and cheapest AI API ranking.