Claude API Pricing Guide 2026: Models, Costs & Tips

Anthropic’s Claude API pricing in 2026 is simple on paper and easy to overspend on in production. The current active stack has three practical tiers: Claude Haiku 4.5 at $1/$5 per million tokens, Claude Sonnet 4.6 at $3/$15, and Claude Opus 4.7 at $5/$25.

That ladder is narrower than OpenAI’s model menu, but the buying decision is not automatically easier. Claude is often chosen for coding, writing, long document work, tool use, and agent reliability — workloads where output length and repeated context can quietly dominate the bill.

This guide breaks down current Claude API prices, which model to use for each workload, where hidden costs appear, and how to control spend with routing, prompt caching, and batch processing. For live tables, use our Anthropic pricing page. To compare Claude against OpenAI, Google, DeepSeek, and other providers, start with the full AI API pricing table or plug your own usage into the token cost calculator.

Claude API pricing: quick reference

All prices below are in USD per 1 million tokens.

Claude model	Status	Input	Cached input	Output	Best fit
Claude Opus 4.7	Active	$5.00	$0.50	$25.00	Premium coding, agentic workflows, complex reasoning, careful writing
Claude Sonnet 4.6	Active	$3.00	$0.30	$15.00	Default Claude production model for apps, support, RAG, code review
Claude Haiku 4.5	Active	$1.00	$0.10	$5.00	Fast classification, routing, extraction, short summaries, budget tasks
Claude Opus 4.6	Legacy	$5.00	$0.50	$25.00	Previous Opus tier, superseded by Opus 4.7 at same price
Claude Opus 4.5	Legacy	$5.00	$0.50	$25.00	Older Opus generation
Claude Sonnet 4.5	Legacy	$3.00	$0.30	$15.00	Older Sonnet generation
Claude Opus 4.1	Legacy	$15.00	$1.50	$75.00	Older premium tier; expensive versus current Opus
Claude Opus 4	Legacy	$15.00	$1.50	$75.00	Older premium tier; expensive versus current Opus
Claude Sonnet 4	Legacy	$3.00	$0.30	$15.00	Older Sonnet generation

The most important 2026 pricing change is that Opus is no longer a $15/$75 tier in the active Claude stack. Older Opus 4 and Opus 4.1 pricing was three times higher. Current Opus 4.7 sits at $5 input / $25 output, matching Opus 4.6 while delivering better coding, document, and agentic performance.

That makes Opus 4.7 a much more usable premium model than the old Opus pricing implied. It is still not cheap, but it is no longer a model you reserve only for rare executive-level prompts.

Which Claude model should you use?

Use Claude Haiku 4.5 for cheap utility calls

Claude Haiku 4.5 is the lowest-cost current Claude API model at $1.00 per million input tokens and $5.00 per million output tokens. It is the model to test first when the job is narrow, repeated, and easy to verify.

Good Haiku 4.5 workloads include:

intent routing before sending hard requests to Sonnet or Opus
short summarization
metadata extraction
language detection
customer support triage
moderation pre-checks
simple classification and tagging
transforming short snippets into structured JSON

The main cost trap with Haiku is output length. A cheap input rate does not help if every request asks the model to generate long prose. For utility jobs, keep outputs short and structured.

Use Claude Sonnet 4.6 as the default production tier

Claude Sonnet 4.6 is the practical default for many paid products. At $3.00 input / $15.00 output, it is three times the input price of Haiku and three times the output price, but it buys a large jump in writing quality, coding ability, reasoning stability, and tool-use reliability.

Use Sonnet 4.6 when:

the user sees the final response
code quality matters, but Opus is too expensive as a default
support answers need tone and judgment
RAG answers require synthesis, not just retrieval
document workflows need careful extraction and explanation
agents need to call tools reliably without escalating every request

For many apps, the winning architecture is Haiku for routing, Sonnet for most user-visible work, Opus for escalation. That keeps the default experience strong without paying premium rates on every turn.

Use Claude Opus 4.7 for premium work

Claude Opus 4.7 is Anthropic’s strongest current API model in our tracked stack. It costs $5.00 per million input tokens, $0.50 per million cached input tokens, and $25.00 per million output tokens.

Choose Opus 4.7 when quality is more valuable than raw token savings:

complex coding and debugging
multi-step agents with tool calls
long-form writing and editing where tone matters
contract, policy, or financial document analysis
high-stakes customer interactions
difficult reasoning that Sonnet fails in evaluation
final review before human sign-off

The key is to make Opus an escalation tier, not a reflex. If Sonnet passes your evals for 80% of requests, routing the remaining 20% to Opus can cut the blended bill sharply while keeping quality high where it matters.

For launch details and benchmarks, read our Claude Opus 4.7 pricing breakdown and the direct Claude Opus 4.7 vs 4.6 comparison.

Claude subscription pricing vs API pricing

Claude’s consumer plans and Claude API billing are separate buying paths.

Claude plan	Price	Best for	API replacement?
Claude Free	$0/month	Light personal use and testing	No
Claude Pro	$20/month	Individual writing, coding, document work	No
Claude Max 5x	$100/month	Heavy personal Claude use	No
Claude Max 20x	$200/month	Daily power users	No
Claude Team	$30/seat/month	Team workspace, admin, collaboration	No
Claude API	Usage-based	Products, automations, backend systems	Yes

The important point: a Claude Pro or Max subscription is not a substitute for production API billing. If you are building software, automations, or customer-facing features, budget with token rates instead of assuming a flat monthly plan covers usage.

This matters especially for developer tools and agent harnesses. Anthropic has already tightened how some third-party usage maps to Claude subscriptions, which we covered in Anthropic Stops Covering OpenClaw Usage Under Claude Pro and Max Plans. Treat subscriptions as user productivity products and API pricing as infrastructure pricing.

Hidden Claude API costs to watch

Output tokens cost more than input tokens

Claude’s output tokens cost 5x the input rate across the current active stack:

Haiku 4.5: $1 input vs $5 output
Sonnet 4.6: $3 input vs $15 output
Opus 4.7: $5 input vs $25 output

That means verbose prompts are not always the biggest problem. Long generated answers, chain-of-thought-like explanations, oversized JSON, repeated tool summaries, and unnecessary markdown can be more expensive than the input.

Set explicit response budgets. Ask for concise answers when possible. For extraction tasks, request only the fields you need.

Repeated context is expensive without caching

Claude workloads often include long system prompts, retrieved documents, style guides, code files, policies, or tool instructions. If you resend the same context repeatedly at full input price, your bill rises quickly.

Prompt caching changes that. Current Claude cached input rates are 90% cheaper than normal input rates:

Model	Normal input	Cached input	Savings
Haiku 4.5	$1.00/M	$0.10/M	90%
Sonnet 4.6	$3.00/M	$0.30/M	90%
Opus 4.7	$5.00/M	$0.50/M	90%

Caching is especially valuable for coding agents, RAG systems, internal knowledge assistants, legal review, compliance workflows, and any app with a large stable prefix.

For a deeper explanation, see Cached Tokens Explained: How to Save 50-90% on AI Costs.

Long context changes the cost shape

Claude is popular for long documents and agentic coding sessions, but long context does not mean free context. A 100K-token prompt to Sonnet 4.6 costs about $0.30 before output at standard input rates. Send that prompt 10,000 times and the input side alone becomes about $3,000.

That is still useful if the workflow saves human hours, but it should be intentional. Chunk documents, retrieve only relevant passages, cache stable material, and summarize session history when exact context is no longer needed.

For more on this tradeoff, read Understanding Context Windows and Their Cost Impact.

Batch processing can change the economics

For non-real-time workloads, Claude’s batch pricing can be materially cheaper than synchronous calls. If you are processing backlogs, nightly enrichment, offline evaluations, document batches, or test datasets, batch mode is often a better fit than paying interactive rates for every call.

The rule of thumb: use real-time API calls when a human is waiting; use batch processing when they are not.

Cost examples

Here are simple examples using active Claude API prices.

Monthly workload	Haiku 4.5	Sonnet 4.6	Opus 4.7
10M input + 2M output	$20	$60	$100
50M input + 10M output	$100	$300	$500
100M input + 20M output	$200	$600	$1,000
100M input + 100M output	$600	$1,800	$3,000

The last row shows why output control matters. When output volume matches input volume, Claude costs rise fast. For customer support, agents, coding copilots, and report generation, the generated text can be the main bill driver.

If your usage pattern is different, use the token cost calculator to model input, cached input, output, and monthly request volume.

Claude vs OpenAI and Google pricing

Claude is not usually the cheapest API stack. OpenAI has lower-cost options like GPT-4.1 nano, GPT-4.1 mini, and GPT-5.4 mini, while Google has aggressive Gemini tiers for high-volume workloads. See our OpenAI pricing guide and Google AI pricing page for the broader market.

Where Claude competes best is not the absolute floor price. It competes on:

coding and code review quality
long-form writing quality
tool-use reliability
careful instruction following
document reasoning
agent workflows where failure is expensive

That is why Claude often wins premium workflow evaluations even when a cheaper model wins the spreadsheet comparison. The right question is not “Which model has the lowest token price?” It is “Which model gives the lowest cost per successful task?”

For a direct head-to-head, read ChatGPT vs Claude Pricing 2026 and GPT-5.4 vs Claude Sonnet 4.6 Pricing.

Recommended Claude routing strategy

A cost-aware Claude setup usually looks like this:

Start with Haiku 4.5 for routing, classification, simple extraction, and short internal transformations.
Use Sonnet 4.6 as the default user-visible model when quality matters.
Escalate to Opus 4.7 for hard coding, complex reasoning, long document synthesis, and final review.
Cache stable context such as system prompts, coding rules, policy documents, and retrieved knowledge bases.
Batch offline jobs instead of running every task through real-time endpoints.
Measure cost per accepted answer, not cost per token alone.

If you are new to Claude API pricing, start with Sonnet 4.6 and run evals against Haiku and Opus. Haiku will tell you how low you can push cost. Opus will tell you how much quality you are leaving on the table. Sonnet is usually the middle that becomes production default.

FAQ

What is the cheapest Claude API model in 2026?

Claude Haiku 4.5 is the cheapest current Claude API model in our tracker at $1.00 per million input tokens and $5.00 per million output tokens. It is best for routing, classification, extraction, and short summaries.

Is Claude Opus 4.7 more expensive than Sonnet 4.6?

Yes. Opus 4.7 costs $5.00 per million input tokens and $25.00 per million output tokens, while Sonnet 4.6 costs $3.00 input and $15.00 output. Opus is about 67% more expensive than Sonnet on both input and output.

Did Opus 4.7 increase prices over Opus 4.6?

No. Opus 4.7 uses the same listed token pricing as Opus 4.6: $5.00 input, $0.50 cached input, and $25.00 output per million tokens. The upgrade is mainly a capability improvement at the same price.

Can I use Claude Pro instead of paying API prices?

Not for production software. Claude Pro, Max, and Team are subscription products for human use. Apps, automations, backend workflows, and customer-facing products should budget against Claude API token pricing.

How do I reduce Claude API costs?

Use Haiku for cheap utility calls, Sonnet as the default production tier, and Opus only for escalation. Keep outputs short, cache repeated context, batch offline work, and measure cost per successful task rather than just raw model price.