What Is a Token in AI? How Token Pricing Works (Plain English Guide)
Learn what AI tokens are, how they're counted, and why they matter for pricing. A simple guide for anyone using AI APIs like ChatGPT, Claude, or Gemini.
By AI Pricing Guru Editorial Team
AI Pricing Guru articles are maintained by the editorial workflow behind the site: daily pricing snapshots, provider source checks, and review passes for model launches, subscription limits, and billing changes.
Token pricing looks abstract until you map it to a real workflow. I use this page to keep the math visible: input, output, cached input, and the places where a small model can do the boring part first.
If you’re using AI APIs, or even just reading about AI pricing, you’ll see everything measured in “tokens.” But what actually is a token, and why should you care?
What Is a Token?
A token is the smallest unit of text that an AI model processes. Think of it as a piece of a word.
Examples:
- “Hello” = 1 token
- “artificial intelligence” = 2 tokens
- “I love programming” = 3 tokens
- “Pneumonoultramicroscopicsilicovolcanoconiosis” = 11 tokens (long words get split)
The rough rule: 1 token ≈ 4 characters of English text, or about ¾ of a word.
So 1,000 tokens ≈ 750 words, roughly one page of text.
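The rules of thumb above are easy to turn into quick estimators. This is a minimal sketch of the heuristic conversions only (~4 characters or ~0.75 words per token); it is not a real tokenizer, and actual counts vary by model.

```python
# Rough token estimates using the 4-chars-per-token and
# 0.75-words-per-token rules of thumb for English prose.

def tokens_from_chars(char_count: int) -> int:
    """Estimate tokens at ~4 characters per token."""
    return round(char_count / 4)

def tokens_from_words(word_count: int) -> int:
    """Estimate tokens at ~0.75 words per token."""
    return round(word_count / 0.75)

print(tokens_from_words(750))   # ~1,000 tokens, about one page
print(tokens_from_chars(4000))  # ~1,000 tokens
```

Use either rule depending on whether you know the word count or the character count; for billing-grade numbers, always check with the provider's tokenizer.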
Why Tokens Matter for Pricing
AI providers charge per token, not per word or per request. Every API call has:
- Input tokens: what you send (your prompt, system instructions, conversation history)
- Output tokens: what the model generates (the response)
These are priced separately, and output tokens almost always cost more than input tokens.
Example: Sending a Message to GPT-5.4
Let’s say you send a 500-word prompt and get a 200-word response:
- Input: ~667 tokens × $2.50/1M = $0.0017
- Output: ~267 tokens × $15.00/1M = $0.0040
- Total: $0.0057 (less than a cent)
Sounds cheap, but at scale it adds up fast. At those rates, a chatbot handling 100,000 conversations like this per day would run roughly $17,000 per month.
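The per-call math above can be reproduced in a few lines. This sketch uses the article's GPT-5.4 figures ($2.50 / $15.00 per 1M tokens), not live rates, and the 0.75-words-per-token rule:

```python
# Cost of one call: 500-word prompt, 200-word reply,
# priced at the article's GPT-5.4 rates (not live pricing).

INPUT_PER_M = 2.50    # $ per 1M input tokens
OUTPUT_PER_M = 15.00  # $ per 1M output tokens

def words_to_tokens(words: int) -> int:
    return round(words / 0.75)

def request_cost(input_words: int, output_words: int) -> float:
    return (words_to_tokens(input_words) / 1_000_000 * INPUT_PER_M
            + words_to_tokens(output_words) / 1_000_000 * OUTPUT_PER_M)

per_call = request_cost(500, 200)
print(f"${per_call:.4f} per call")             # ≈ $0.0057
print(f"${per_call * 100_000 * 30:,.0f}/month")  # at 100,000 calls/day
```

Scaling the fraction-of-a-cent call to 100,000 conversations a day is what turns it into a five-figure monthly bill.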
Input vs. Output: Why Output Costs More
Output tokens are several times more expensive than input tokens at nearly every provider; in the table below, the premium ranges from 1.5x to 8x. Why?
When a model generates output, it’s doing much more computational work. Each new token requires processing all previous tokens. Input tokens are processed in parallel; output tokens are generated one at a time.
| Provider | Input / 1M | Output / 1M | Output Premium |
|---|---|---|---|
| OpenAI GPT-5.4 | $2.50 | $15.00 | 6x |
| Anthropic Claude Opus 4.6 | $5.00 | $25.00 | 5x |
| Google Gemini 2.5 Pro | $1.25 | $10.00 | 8x |
| DeepSeek V3.2 | $0.28 | $0.42 | 1.5x |
Notice how DeepSeek bucks the trend with only a 1.5x premium. This is one reason it’s so popular for high-output workloads.
What Are Cached Input Tokens?
When you send the same system prompt or context repeatedly (like in a chatbot), providers can cache that input and charge you less:
| Provider | Regular Input | Cached Input | Savings |
|---|---|---|---|
| OpenAI | $2.50 | $0.25 | 90% |
| Anthropic | $5.00 | $0.50 | 90% |
| Google | $1.25 | $0.13 | 90% |
| DeepSeek | $0.28 | $0.028 | 90% |
Pro tip: Design your system prompts to be stable (don’t change them per request), and you’ll benefit from cached pricing automatically.
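To see what caching is worth in practice, you can blend the two rates by your cache hit rate. This sketch uses the OpenAI figures from the table ($2.50 regular, $0.25 cached); the 90% hit rate is a hypothetical, not a guarantee:

```python
# Effective $/1M input tokens when a share of tokens is served from cache.
# Rates are the article's OpenAI figures; the hit rate is hypothetical.

REGULAR = 2.50  # $ / 1M regular input tokens
CACHED = 0.25   # $ / 1M cached input tokens

def blended_input_rate(cache_hit_rate: float) -> float:
    """Average $/1M input tokens, weighted by cache hit rate."""
    return cache_hit_rate * CACHED + (1 - cache_hit_rate) * REGULAR

# A chatbot whose long, stable system prompt is cached on 90% of requests:
print(blended_input_rate(0.9))  # 0.475 $/1M, an ~81% saving vs. $2.50
```

This is why a stable system prompt matters: the more of your input that repeats verbatim, the closer your effective rate gets to the cached price.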
How to Count Tokens
You can estimate tokens using these rules:
- English text: 1 token ≈ 4 characters ≈ 0.75 words
- Code: tends to use more tokens per line (special characters, syntax)
- Non-English languages: CJK (Chinese, Japanese, Korean) use more tokens per character
- Numbers: each digit is often its own token
For exact counts, use:
- The OpenAI Tokenizer, free to try
- Our token calculator: enter your token count and see costs across all models instantly
What Is a Context Window?
The context window is the maximum number of tokens a model can handle in a single request (input + output combined).
| Model | Context Window |
|---|---|
| Llama 4 Scout | 10M tokens |
| xAI Grok 4.20 | 2M tokens |
| GPT-4.1 | 1M tokens |
| Claude Opus 4.6 | 200K tokens |
| GPT-5.4 | 270K tokens |
A larger context window means you can send more information in a single prompt: entire codebases, long documents, extended conversations. But more tokens mean higher cost.
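Because the window covers input and output combined, it's worth checking a request fits before sending it. A minimal sketch, using window sizes from the table above and estimated token counts:

```python
# Does input + reserved output fit in a model's context window?
# Window sizes are the article's figures; token counts are estimates.

CONTEXT_WINDOWS = {
    "GPT-5.4": 270_000,
    "Claude Opus 4.6": 200_000,
    "GPT-4.1": 1_000_000,
}

def fits(model: str, input_tokens: int, max_output_tokens: int) -> bool:
    """True if the request leaves room for the reserved output."""
    return input_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]

print(fits("Claude Opus 4.6", 180_000, 30_000))  # False: 210K > 200K
print(fits("GPT-5.4", 180_000, 30_000))          # True: 210K <= 270K
```

The common mistake is budgeting the whole window for input and leaving no room for the response.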
Token Pricing Tiers: What to Expect
Here’s what you’ll pay across the market in 2026:
| Tier | Input / 1M | Output / 1M | Best For |
|---|---|---|---|
| Budget | $0.10-0.30 | $0.15-0.60 | High volume, simple tasks |
| Mid-range | $0.30-3.00 | $1.00-15.00 | Production workloads |
| Premium | $5.00-15.00 | $25.00-75.00 | Complex reasoning, coding |
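To make the tiers concrete, here's a sketch pricing one fixed monthly workload at each tier. The tier ranges are the article's; taking the midpoint of each range is my simplification, and the workload (10M input + 2M output tokens) is hypothetical:

```python
# Monthly cost of a fixed workload (10M input, 2M output tokens) at the
# midpoint of each pricing tier. Tier ranges from the article; midpoints
# and workload are illustrative assumptions.

TIERS = {  # (input $/1M, output $/1M) midpoints
    "Budget":    (0.20, 0.375),
    "Mid-range": (1.65, 8.00),
    "Premium":   (10.00, 50.00),
}

def monthly_cost(input_m_tokens: float, output_m_tokens: float,
                 tier: str) -> float:
    in_rate, out_rate = TIERS[tier]
    return input_m_tokens * in_rate + output_m_tokens * out_rate

for tier in TIERS:
    print(f"{tier}: ${monthly_cost(10, 2, tier):,.2f}")
```

The spread is the point: the same workload runs a few dollars on a budget model and a few hundred on a premium one, which is why matching the model to the task matters so much.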
How to Reduce Your Token Costs
- Choose the right model: don't use Opus for tasks Haiku can handle
- Minimize system prompts: shorter prompts mean fewer input tokens
- Use caching: reuse system prompts to get cached pricing
- Batch processing: OpenAI's Batch API gives 50% off
- Set max_tokens: limit output length to avoid paying for text you don't need
- Summarize context: instead of sending full conversation history, summarize it
Try It Yourself
Use our free token calculator to estimate your costs across all 33 models from 7 providers. Enter your expected input and output tokens, and see exactly what you’ll pay.
Or browse the full pricing comparison to find the model that fits your budget. Ready to start building? Try OpenAI →, Try Claude →, or Try Gemini → (free tier). For a deeper breakdown of which API is right for you, see our best AI API for developers guide and cheapest AI API ranking.