
What Is a Token in AI? How Token Pricing Works (Plain English Guide)

Learn what AI tokens are, how they're counted, and why they matter for pricing. A simple guide for anyone using AI APIs like ChatGPT, Claude, or Gemini.

If you’re using AI APIs — or even just reading about AI pricing — you’ll see everything measured in “tokens.” But what actually is a token, and why should you care?

What Is a Token?

A token is the smallest unit of text that an AI model processes. Think of it as a piece of a word.

Examples:

  • “Hello” = 1 token
  • “artificial intelligence” = 2 tokens
  • “I love programming” = 3 tokens
  • “Pneumonoultramicroscopicsilicovolcanoconiosis” = 11 tokens (long words get split)

The rough rule: 1 token ≈ 4 characters of English text, or about ¾ of a word.

So 1,000 tokens ≈ 750 words — roughly one page of text.
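The rule of thumb above is easy to turn into a quick estimator. This is a heuristic only, since real tokenizers vary by model:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text: ~4 characters per token."""
    return max(1, round(len(text) / 4))

def estimate_tokens_from_words(word_count: int) -> int:
    """Alternative estimate: 1 token is about 0.75 words."""
    return round(word_count / 0.75)

print(estimate_tokens("Hello"))           # 1 token
print(estimate_tokens_from_words(750))    # ~1,000 tokens
```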

Why Tokens Matter for Pricing

AI providers charge per token, not per word or per request. Every API call has:

  1. Input tokens — what you send (your prompt, system instructions, conversation history)
  2. Output tokens — what the model generates (the response)

These are priced separately, and output tokens almost always cost more than input tokens.

Example: Sending a Message to GPT-5.4

Let’s say you send a 500-word prompt and get a 200-word response:

  • Input: ~667 tokens × $2.50/1M = $0.0017
  • Output: ~267 tokens × $15.00/1M = $0.0040
  • Total: $0.0057 (less than a cent)

Sounds cheap, but at scale it adds up fast. A chatbot handling 100,000 conversations per day at that rate costs about $570 a day, or roughly $17,000 a month.
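The arithmetic above can be wrapped in a small helper. The prices here are the GPT-5.4 rates from the example, quoted per 1M tokens:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one API call, with prices quoted per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# The worked example: ~667 input and ~267 output tokens at $2.50 / $15.00 per 1M.
cost = request_cost(667, 267, 2.50, 15.00)
print(f"${cost:.4f} per request")                 # $0.0057

# At scale: 100,000 conversations per day for 30 days.
print(f"${cost * 100_000 * 30:,.0f} per month")   # $17,018
```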

Input vs. Output: Why Output Costs More

Output tokens typically cost several times more than input tokens — usually 3-8x, though the exact premium varies by provider. Why?

When a model generates output, it’s doing much more computational work. Each new token requires processing all previous tokens. Input tokens are processed in parallel; output tokens are generated one at a time.
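The difference can be sketched in a few lines. This is a toy model, not a real inference engine; `score_next_token` is a stand-in for the model's forward pass:

```python
def score_next_token(context: list[str]) -> str:
    # Stand-in for a model forward pass. Its cost grows with len(context),
    # because attention looks back at every previous token.
    return f"tok{len(context)}"

def generate(prompt_tokens: list[str], max_new_tokens: int) -> list[str]:
    tokens = list(prompt_tokens)
    # Input (prefill): the whole prompt is processed in one parallel pass.
    # Output (decode): each new token needs its own sequential pass over
    # everything before it -- this is why output tokens cost more.
    for _ in range(max_new_tokens):
        tokens.append(score_next_token(tokens))
    return tokens

print(generate(["Hello", ","], max_new_tokens=3))
```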

Provider                    Input / 1M   Output / 1M   Output Premium
OpenAI GPT-5.4              $2.50        $15.00        6x
Anthropic Claude Opus 4.6   $5.00        $25.00        5x
Google Gemini 2.5 Pro       $1.25        $10.00        8x
DeepSeek V3.2               $0.28        $0.42         1.5x

Notice how DeepSeek bucks the trend with only a 1.5x premium. This is one reason it’s so popular for high-output workloads.

What Are Cached Input Tokens?

When you send the same system prompt or context repeatedly (like in a chatbot), providers can cache that input and charge you less:

Provider    Regular Input   Cached Input   Savings
OpenAI      $2.50           $0.25          90%
Anthropic   $5.00           $0.50          90%
Google      $1.25           $0.13          90%
DeepSeek    $0.28           $0.028         90%

Pro tip: Design your system prompts to be stable (don’t change them per request), and you’ll benefit from cached pricing automatically.
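One way to put that tip into practice is to build requests so the stable content always comes first. This is an illustrative sketch with generic field names, not any specific provider's API:

```python
# Kept byte-for-byte identical across requests so the provider can
# recognize and cache it as a shared prefix.
SYSTEM_PROMPT = (
    "You are a helpful support assistant for Acme Corp. "
    "Answer politely and cite the relevant policy section."
)

def build_request(user_message: str, history: list[dict]) -> dict:
    # Stable content first, per-request content last: providers cache a
    # common prefix, so anything that changes should come at the end.
    return {
        "system": SYSTEM_PROMPT,
        "messages": history + [{"role": "user", "content": user_message}],
    }

req = build_request("Where is my order?", history=[])
print(req["messages"][-1]["content"])
```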

How to Count Tokens

You can estimate tokens using these rules:

  • English text: 1 token ≈ 4 characters ≈ 0.75 words
  • Code: tends to use more tokens per line (special characters, syntax)
  • Non-English languages: CJK (Chinese, Japanese, Korean) use more tokens per character
  • Numbers: each digit is often its own token

For exact counts, use your provider's official tokenizer — for example, OpenAI publishes the tiktoken library, and most providers offer a token-counting endpoint or web-based tokenizer tool.

What Is a Context Window?

The context window is the maximum number of tokens a model can handle in a single request (input + output combined).

Model             Context Window
Llama 4 Scout     10M tokens
xAI Grok 4.2      2M tokens
GPT-4.1           1M tokens
Claude Opus 4.6   1M tokens
GPT-5.4           270K tokens

A larger context window means you can send more information in a single prompt — entire codebases, long documents, extended conversations. But more tokens = higher cost.
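Because input and output share the window, it's worth checking that a prompt plus its output budget will fit before sending it. A minimal sketch, using GPT-5.4's 270K window from the table above:

```python
def fits_context(input_tokens: int, max_output_tokens: int,
                 context_window: int) -> bool:
    """True if the input plus the output budget fits in the model's window."""
    return input_tokens + max_output_tokens <= context_window

# GPT-5.4 has a 270K-token window:
print(fits_context(250_000, 16_000, 270_000))   # True
print(fits_context(260_000, 16_000, 270_000))   # False: trim the input
```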

Token Pricing Tiers: What to Expect

Here’s what you’ll pay across the market in 2026:

Tier        Input / 1M     Output / 1M     Best For
Budget      $0.10-0.30     $0.15-0.60      High volume, simple tasks
Mid-range   $0.30-3.00     $1.00-15.00     Production workloads
Premium     $5.00-15.00    $25.00-75.00    Complex reasoning, coding

How to Reduce Your Token Costs

  1. Choose the right model — don’t use Opus for tasks Haiku can handle
  2. Minimize system prompts — shorter prompts = fewer input tokens
  3. Use caching — reuse system prompts to get cached pricing
  4. Batch processing — OpenAI’s Batch API gives 50% off
  5. Set max_tokens — limit output length to avoid paying for text you don’t need
  6. Summarize context — instead of sending full conversation history, summarize it
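Tip 6 can be sketched as a small helper that keeps the recent turns verbatim and collapses everything older into a summary. The summarizer here is a placeholder; in practice you'd ask a cheap model to produce it:

```python
from typing import Callable

def trim_history(history: list[str], keep_last: int,
                 summarize: Callable[[list[str]], str]) -> list[str]:
    """Replace all but the last `keep_last` turns with a one-line summary."""
    if len(history) <= keep_last:
        return history
    older, recent = history[:-keep_last], history[-keep_last:]
    return [summarize(older)] + recent

# Placeholder summarizer -- a real one would call a cheap model.
fake_summary = lambda turns: f"[summary of {len(turns)} earlier turns]"

h = [f"turn {i}" for i in range(10)]
print(trim_history(h, keep_last=4, summarize=fake_summary))
# -> ['[summary of 6 earlier turns]', 'turn 6', 'turn 7', 'turn 8', 'turn 9']
```

Sending four recent turns plus a one-line summary instead of ten full turns cuts input tokens on every subsequent request.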

Try It Yourself

Use our free token calculator to estimate your costs across all 33 models from 7 providers. Enter your expected input and output tokens, and see exactly what you’ll pay.

Or browse the full pricing comparison to find the model that fits your budget.