What Is a Token in AI? How Token Pricing Works (Plain English Guide)

Learn what AI tokens are, how they're counted, and why they matter for pricing. A simple guide for anyone using AI APIs like ChatGPT, Claude, or Gemini.

By AI Pricing Guru Editorial Team

AI Pricing Guru articles are maintained by the editorial workflow behind the site: daily pricing snapshots, provider source checks, and review passes for model launches, subscription limits, and billing changes.

Token pricing looks abstract until you map it to a real workflow. I use this page to keep the math visible: input, output, cached input, and the places where a small model can do the boring part first.

If you’re using AI APIs, or even just reading about AI pricing, you’ll see everything measured in “tokens.” But what actually is a token, and why should you care?

What Is a Token?

A token is the smallest unit of text that an AI model processes. Think of it as a piece of a word.

Examples:

  • “Hello” = 1 token
  • “artificial intelligence” = 2 tokens
  • “I love programming” = 3 tokens
  • “Pneumonoultramicroscopicsilicovolcanoconiosis” = 11 tokens (long words get split)

The rough rule: 1 token ≈ 4 characters of English text, or about ¾ of a word.

So 1,000 tokens ≈ 750 words, roughly one page of text.
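That rule of thumb is easy to sketch in code. The ratios below are rough heuristics for English text, not actual tokenizer output:

```python
# Rule of thumb: 1 token ~= 4 characters ~= 0.75 words of English text.

def estimate_tokens_from_words(words: int) -> int:
    # ~4 tokens for every 3 words (1 token ~= 0.75 words)
    return round(words / 0.75)

def estimate_tokens_from_chars(chars: int) -> int:
    # ~1 token per 4 characters
    return round(chars / 4)

print(estimate_tokens_from_words(750))   # roughly one page of text
print(estimate_tokens_from_chars(4000))
```

Both calls land at about 1,000 tokens, matching the "one page" figure above.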

Why Tokens Matter for Pricing

AI providers charge per token, not per word or per request. Every API call has:

  1. Input tokens: what you send (your prompt, system instructions, conversation history)
  2. Output tokens: what the model generates (the response)

These are priced separately, and output tokens almost always cost more than input tokens.

Example: Sending a Message to GPT-5.4

Let’s say you send a 500-word prompt and get a 200-word response:

  • Input: ~667 tokens × $2.50/1M = $0.0017
  • Output: ~267 tokens × $15.00/1M = $0.0040
  • Total: $0.0057 (less than a cent)

Sounds cheap, but at scale it adds up fast. A chatbot handling 100,000 conversations per day could cost thousands of dollars monthly.
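The arithmetic above, including the at-scale figure, can be checked in a few lines. Token counts come from the ~0.75 words-per-token rule, so these are estimates, not billed amounts:

```python
# Worked example: 500-word prompt, 200-word response at GPT-5.4 pricing
# ($2.50 input / $15.00 output per 1M tokens, per the example above).

INPUT_PRICE = 2.50 / 1_000_000    # $ per input token
OUTPUT_PRICE = 15.00 / 1_000_000  # $ per output token

input_tokens = round(500 / 0.75)   # ~667 tokens
output_tokens = round(200 / 0.75)  # ~267 tokens

per_call = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
print(f"per call: ${per_call:.4f}")  # ~$0.0057

# At scale: 100,000 conversations per day for 30 days
monthly = per_call * 100_000 * 30
print(f"monthly: ${monthly:,.0f}")
```

The monthly figure comes out above $17,000, which is where "thousands of dollars monthly" comes from.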

Input vs. Output: Why Output Costs More

Output tokens are usually several times more expensive than input tokens, anywhere from 1.5x to 8x depending on the provider. Why?

When a model generates output, it’s doing much more computational work. Each new token requires processing all previous tokens. Input tokens are processed in parallel; output tokens are generated one at a time.

Provider                     Input / 1M    Output / 1M    Output Premium
OpenAI GPT-5.4               $2.50         $15.00         6x
Anthropic Claude Opus 4.6    $5.00         $25.00         5x
Google Gemini 2.5 Pro        $1.25         $10.00         8x
DeepSeek V3.2                $0.28         $0.42          1.5x

Notice how DeepSeek bucks the trend with only a 1.5x premium. This is one reason it’s so popular for high-output workloads.
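One way to see the premium's impact is to blend input and output prices for an output-heavy workload. The 1:4 input-to-output token ratio below is an illustrative assumption; the prices come from the table above:

```python
# Blended $/1M tokens assuming 20% input, 80% output tokens per request.
# Prices per 1M tokens (input, output) from the comparison table above.

providers = {
    "GPT-5.4": (2.50, 15.00),
    "Claude Opus 4.6": (5.00, 25.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "DeepSeek V3.2": (0.28, 0.42),
}

blended = {
    name: round(0.2 * inp + 0.8 * out, 2)
    for name, (inp, out) in providers.items()
}

for name, price in blended.items():
    print(f"{name}: ${price:.2f} per 1M blended tokens")
```

On this mix, DeepSeek's low output premium keeps its blended rate under half a dollar per million tokens, while the 6-8x premiums dominate the other providers' effective price.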

What Are Cached Input Tokens?

When you send the same system prompt or context repeatedly (like in a chatbot), providers can cache that input and charge you less:

Provider     Regular Input    Cached Input    Savings
OpenAI       $2.50            $0.25           90%
Anthropic    $5.00            $0.50           90%
Google       $1.25            $0.13           90%
DeepSeek     $0.28            $0.028          90%

Pro tip: Design your system prompts to be stable (don’t change them per request), and you’ll benefit from cached pricing automatically.
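Here is a quick sketch of what caching saves a chatbot that resends a stable system prompt on every call. The prompt size and call volume are illustrative assumptions; the prices are OpenAI's from the table above:

```python
# A chatbot resends the same 2,000-token system prompt on every call.
# With a stable prompt, repeat calls bill it at the cached input rate.

REGULAR = 2.50 / 1_000_000  # $ per regular input token
CACHED = 0.25 / 1_000_000   # $ per cached input token

system_tokens = 2_000
calls = 1_000_000  # calls per month (assumed volume)

uncached = system_tokens * calls * REGULAR
cached = system_tokens * calls * CACHED
print(f"uncached: ${uncached:,.0f}, cached: ${cached:,.0f}")
print(f"savings: {1 - cached / uncached:.0%}")
```

At this volume the system prompt alone drops from about $5,000 to about $500 a month.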

How to Count Tokens

You can estimate tokens using these rules:

  • English text: 1 token ≈ 4 characters ≈ 0.75 words
  • Code: tends to use more tokens per line (special characters, syntax)
  • Non-English languages: CJK (Chinese, Japanese, Korean) use more tokens per character
  • Numbers: each digit is often its own token

For exact counts, use the provider's own tokenizer (for example, OpenAI publishes its tokenizer as the open-source tiktoken library).
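For rough planning, an estimator can apply a different ratio per content type. The characters-per-token ratios below are coarse assumptions drawn from the rules above, not measured values:

```python
# Rough characters-per-token ratios by content type (assumed heuristics).
# For billing-accurate counts, use the provider's tokenizer instead.

CHARS_PER_TOKEN = {
    "english": 4.0,  # ~4 characters per token
    "code": 3.0,     # denser: syntax and special characters split more
    "cjk": 1.0,      # CJK text often uses a token (or more) per character
}

def estimate_tokens(text: str, kind: str = "english") -> int:
    return max(1, round(len(text) / CHARS_PER_TOKEN[kind]))

print(estimate_tokens("Hello, world! This is a test."))
print(estimate_tokens("def f(x): return x * 2", "code"))
```

Treat the result as a budgeting estimate, not a billing prediction.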

What Is a Context Window?

The context window is the maximum number of tokens a model can handle in a single request (input + output combined).

Model              Context Window
Llama 4 Scout      10M tokens
xAI Grok 4.20      2M tokens
GPT-4.1            1M tokens
Claude Opus 4.6    200K tokens
GPT-5.4            270K tokens

A larger context window means you can send more information in a single prompt: entire codebases, long documents, extended conversations. But more tokens = higher cost.
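A simple fit check makes the constraint concrete. Window sizes come from the table above; the request sizes are illustrative:

```python
# A request fits only if input tokens plus requested output tokens
# stay inside the model's context window.

CONTEXT_WINDOWS = {
    "Claude Opus 4.6": 200_000,
    "GPT-4.1": 1_000_000,
}

def fits(model: str, input_tokens: int, max_output_tokens: int) -> bool:
    return input_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]

# A 150K-token codebase plus a 60K-token response overflows a 200K window
print(fits("Claude Opus 4.6", 150_000, 60_000))  # False
print(fits("GPT-4.1", 150_000, 60_000))          # True
```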

Token Pricing Tiers: What to Expect

Here’s what you’ll pay across the market in 2026:

Tier         Input / 1M     Output / 1M     Best For
Budget       $0.10-0.30     $0.15-0.60      High volume, simple tasks
Mid-range    $0.30-3.00     $1.00-15.00     Production workloads
Premium      $5.00-15.00    $25.00-75.00    Complex reasoning, coding

How to Reduce Your Token Costs

  1. Choose the right model: don't use Opus for tasks Haiku can handle
  2. Minimize system prompts: shorter prompts = fewer input tokens
  3. Use caching: reuse system prompts to get cached pricing
  4. Batch processing: OpenAI's Batch API gives 50% off
  5. Set max_tokens: limit output length to avoid paying for text you don't need
  6. Summarize context: instead of sending full conversation history, summarize it
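Tip 1 (model routing) is easy to quantify. This sketch assumes an illustrative 90/10 split between simple and complex tasks, with output prices taken from the tier table above:

```python
# Route simple tasks to a budget-tier model and reserve the premium model
# for the hard 10%. Task mix and per-task output size are assumptions.

BUDGET_PRICE = 0.60 / 1_000_000    # budget tier, $ per output token
PREMIUM_PRICE = 25.00 / 1_000_000  # premium tier, $ per output token

# 900 simple tasks and 100 complex tasks, ~500 output tokens each
tasks = [("simple", 500)] * 900 + [("complex", 500)] * 100

routed = sum(
    t * (PREMIUM_PRICE if kind == "complex" else BUDGET_PRICE)
    for kind, t in tasks
)
all_premium = sum(t * PREMIUM_PRICE for _, t in tasks)

print(f"all premium: ${all_premium:.2f}, routed: ${routed:.2f}")
```

On this mix, routing cuts the bill from $12.50 to $1.52, roughly an 8x saving, without touching the hard tasks' quality.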

Try It Yourself

Use our free token calculator to estimate your costs across all 33 models from 7 providers. Enter your expected input and output tokens, and see exactly what you’ll pay.

Or browse the full pricing comparison to find the model that fits your budget. Ready to start building? Try OpenAI →, Try Claude →, or Try Gemini → (free tier). For a deeper breakdown of which API is right for you, see our best AI API for developers guide and cheapest AI API ranking.