What Is a Token in AI? How Token Pricing Works (Plain English Guide)
Learn what AI tokens are, how they're counted, and why they matter for pricing. A simple guide for anyone using AI APIs like ChatGPT, Claude, or Gemini.
If you’re using AI APIs — or even just reading about AI pricing — you’ll see everything measured in “tokens.” But what actually is a token, and why should you care?
What Is a Token?
A token is the smallest unit of text that an AI model processes. Think of it as a piece of a word.
Examples:
- “Hello” = 1 token
- “artificial intelligence” = 2 tokens
- “I love programming” = 3 tokens
- “Pneumonoultramicroscopicsilicovolcanoconiosis” = 11 tokens (long words get split)
The rough rule: 1 token ≈ 4 characters of English text, or about ¾ of a word.
So 1,000 tokens ≈ 750 words — roughly one page of text.
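The rule of thumb above can be turned into a quick estimator. This is a rough heuristic for English text only, not a real tokenizer (function names are illustrative):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text: ~4 characters per token."""
    return max(1, round(len(text) / 4))

def estimate_words(tokens: int) -> int:
    """Rough word estimate: ~0.75 words per token."""
    return round(tokens * 0.75)

estimate_words(1000)  # ≈ 750 words, about one page
```

For billing-accurate counts, use the provider's own tokenizer rather than a heuristic like this.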
Why Tokens Matter for Pricing
AI providers charge per token, not per word or per request. Every API call has:
- Input tokens — what you send (your prompt, system instructions, conversation history)
- Output tokens — what the model generates (the response)
These are priced separately, and output tokens almost always cost more than input tokens.
Example: Sending a Message to GPT-5.4
Let’s say you send a 500-word prompt and get a 200-word response:
- Input: ~667 tokens × $2.50/1M = $0.0017
- Output: ~267 tokens × $15.00/1M = $0.0040
- Total: $0.0057 (less than a cent)
Sounds cheap — but at scale it adds up fast. A chatbot handling 100,000 conversations per day could cost thousands of dollars monthly.
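The arithmetic above is easy to script. A minimal sketch, using the example rates from this section (these are illustrative figures, not live pricing):

```python
# Prices in USD per million tokens -- mirrors the example above, not live rates.
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 15.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call at per-million-token rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

per_call = request_cost(667, 267)   # the 500-word in / 200-word out example
monthly = per_call * 100_000 * 30   # 100,000 conversations/day for a month
```

Running this gives roughly $0.0057 per call, but over $17,000 per month at chatbot scale, which is why per-token costs deserve attention.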
Input vs. Output: Why Output Costs More
Output tokens are typically 3-8x more expensive than input tokens, though the exact premium varies by provider. Why?
When a model generates output, it’s doing much more computational work. Each new token requires processing all previous tokens. Input tokens are processed in parallel; output tokens are generated one at a time.
| Provider | Input / 1M | Output / 1M | Output Premium |
|---|---|---|---|
| OpenAI GPT-5.4 | $2.50 | $15.00 | 6x |
| Anthropic Claude Opus 4.6 | $5.00 | $25.00 | 5x |
| Google Gemini 2.5 Pro | $1.25 | $10.00 | 8x |
| DeepSeek V3.2 | $0.28 | $0.42 | 1.5x |
Notice how DeepSeek bucks the trend with only a 1.5x premium. This is one reason it’s so popular for high-output workloads.
What Are Cached Input Tokens?
When you send the same system prompt or context repeatedly (like in a chatbot), providers can cache that input and charge you less:
| Provider | Regular Input | Cached Input | Savings |
|---|---|---|---|
| OpenAI | $2.50 | $0.25 | 90% |
| Anthropic | $5.00 | $0.50 | 90% |
| Google | $1.25 | $0.13 | 90% |
| DeepSeek | $0.28 | $0.028 | 90% |
Pro tip: Design your system prompts to be stable (don’t change them per request), and you’ll benefit from cached pricing automatically.
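To see how much a stable system prompt saves, compare the same request with and without cached input. A sketch using the OpenAI rates from the table above (rates and function names are assumptions for illustration):

```python
def chat_cost(system_tokens: int, user_tokens: int, output_tokens: int,
              *, cached: bool = False) -> float:
    """Cost of one chat request; cached system prompts get the 90%-off rate."""
    input_rate = 2.50 / 1_000_000    # regular input, USD per token
    cached_rate = 0.25 / 1_000_000   # cached input (90% discount)
    output_rate = 15.00 / 1_000_000
    system_cost = system_tokens * (cached_rate if cached else input_rate)
    return system_cost + user_tokens * input_rate + output_tokens * output_rate

# A 2,000-token system prompt, 100-token user message, 200-token reply:
cold = chat_cost(2_000, 100, 200)
warm = chat_cost(2_000, 100, 200, cached=True)
```

The larger your fixed system prompt relative to the per-request message, the bigger the fraction of your input bill that caching wipes out.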
How to Count Tokens
You can estimate tokens using these rules:
- English text: 1 token ≈ 4 characters ≈ 0.75 words
- Code: tends to use more tokens per line (special characters, syntax)
- Non-English languages: CJK (Chinese, Japanese, Korean) use more tokens per character
- Numbers: each digit is often its own token
For exact counts, use:
- OpenAI Tokenizer — try it free
- Our token calculator — enter your token count and see costs across all models instantly
What Is a Context Window?
The context window is the maximum number of tokens a model can handle in a single request (input + output combined).
| Model | Context Window |
|---|---|
| Llama 4 Scout | 10M tokens |
| xAI Grok 4.20 | 2M tokens |
| GPT-4.1 | 1M tokens |
| Claude Opus 4.6 | 1M tokens |
| GPT-5.4 | 270K tokens |
A larger context window means you can send more information in a single prompt — entire codebases, long documents, extended conversations. But more tokens = higher cost.
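Because input and output share the window, a practical check before sending a request is whether your prompt plus the output you want to reserve fits. A minimal sketch (window sizes taken from the table above; the function is illustrative):

```python
def fits_context(input_tokens: int, max_output_tokens: int,
                 context_window: int) -> bool:
    """Input and reserved output must fit together in the window."""
    return input_tokens + max_output_tokens <= context_window

# Against a 270K-token window (the GPT-5.4 row above):
fits_context(250_000, 8_000, 270_000)  # fits
fits_context(265_000, 8_000, 270_000)  # too large -- trim or summarize input
```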
Token Pricing Tiers: What to Expect
Here’s what you’ll pay across the market in 2026:
| Tier | Input / 1M | Output / 1M | Best For |
|---|---|---|---|
| Budget | $0.10-0.30 | $0.15-0.60 | High volume, simple tasks |
| Mid-range | $0.30-3.00 | $1.00-15.00 | Production workloads |
| Premium | $5.00-15.00 | $25.00-75.00 | Complex reasoning, coding |
How to Reduce Your Token Costs
- Choose the right model — don’t use Opus for tasks Haiku can handle
- Minimize system prompts — shorter prompts = fewer input tokens
- Use caching — reuse system prompts to get cached pricing
- Batch processing — OpenAI’s Batch API gives 50% off
- Set max_tokens — limit output length to avoid paying for text you don’t need
- Summarize context — instead of sending full conversation history, summarize it
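The last tip is worth quantifying: resending full conversation history means paying for those input tokens on every turn. A sketch of the savings, using the mid-range input rate from earlier as an assumed figure:

```python
def history_cost(history_tokens: int, input_price_per_m: float = 2.50) -> float:
    """Input cost of resending conversation history with one request."""
    return history_tokens * input_price_per_m / 1_000_000

# ~40K tokens of full history vs a ~2K-token running summary, per request:
full = history_cost(40_000)
summary = history_cost(2_000)
saving_per_request = full - summary
```

A few cents per request sounds small, but multiplied across every turn of every conversation, summarizing history is often the single biggest lever on a chatbot's token bill.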
Try It Yourself
Use our free token calculator to estimate your costs across all 33 models from 7 providers. Enter your expected input and output tokens, and see exactly what you’ll pay.
Or browse the full pricing comparison to find the model that fits your budget.