AI API Pricing Comparison 2026: 73 Models Ranked

I wrote this with the pricing table open, not as a generic AI-tools list. The useful question is simple: where does this choice change the bill, the cap, or the model-routing decision?

We track token pricing across every major AI API provider, updated daily. Here’s the complete comparison for May 2026, covering 73 active models from 11 providers (89 total including legacy releases).

2026 Price Trends: AI APIs Are 60-80% Cheaper Than 2025

The average cost per million tokens has dropped 60-80% since early 2025. Key milestones:

DeepSeek unified pricing at $0.28/M input, 90% below OpenAI
Google launched Gemini 3.1 Flash-Lite at $0.25/M, competing with open-source hosts
OpenAI released GPT-4.1 nano at $0.10/M, their cheapest model ever
Anthropic cut Opus pricing from $15/M (Opus 4.1) to $5/M in the current Opus 4.x line, a 67% reduction

Complete Pricing Table: Cheapest to Most Expensive

Budget Tier (Under $0.50/M Input)

Model	Provider	Input/1M	Output/1M
DeepSeek V3.2 (cached)	DeepSeek	$0.028	$0.42
GPT-4.1 nano	OpenAI	$0.10	$0.40
Gemini 3.1 Flash-Lite	Google	$0.25	$1.50
Gemini 2.5 Flash	Google	$0.30	$2.50
Mistral Small	Mistral	$0.10	$0.30
Groq Llama 4 Scout	Groq	$0.11	$0.34
GPT-4o mini	OpenAI	$0.15	$0.60
Llama 4 Scout	Meta	$0.15	$0.15
Jamba 2 Mini	AI21	$0.20	$0.40
Llama 4 Maverick	Meta	$0.20	$0.20
GPT-5.4 nano	OpenAI	$0.20	$1.25
Groq Llama 4 Maverick	Groq	$0.20	$0.60
Grok 4.1 Fast	xAI	$0.20	$0.50
Fireworks Llama 4 Maverick	Fireworks	$0.22	$0.88
Fireworks DeepSeek V3	Fireworks	$0.22	$0.88
Claude Haiku 3	Anthropic	$0.25	$1.25
Gemini 3.1 Flash-Lite	Google	$0.25	$1.50
DeepSeek V3.2 Chat	DeepSeek	$0.28	$0.42
Together Llama 4 Maverick	Together	$0.30	$0.90
Together DeepSeek V3	Together	$0.30	$0.90
Gemini 2.5 Flash	Google	$0.30	$2.50
Codestral	Mistral	$0.30	$0.90
GPT-4.1 mini	OpenAI	$0.40	$1.60

This is the sweet spot for high-volume applications. You can process millions of tokens for under $1.

Mid Tier ($0.50 - $3.00/M Input)

Model	Provider	Input/1M	Output/1M
Gemini 3 Flash	Google	$0.50	$3.00
Command R	Cohere	$0.50	$1.50
GPT-5.4 mini	OpenAI	$0.75	$4.50
Groq DeepSeek R1	Groq	$0.75	$0.99
Claude Haiku 3.5	Anthropic	$0.80	$4.00
Claude Haiku 4.5	Anthropic	$1.00	$5.00
Sonar	Perplexity	$1.00	$1.00
o4-mini	OpenAI	$1.10	$4.40
Gemini 2.5 Pro	Google	$1.25	$10.00
GPT-4.1	OpenAI	$2.00	$8.00
o3	OpenAI	$2.00	$8.00
Gemini 3.1 Pro	Google	$2.00	$12.00
Grok 4.20	xAI	$2.00	$6.00
Jamba 2 Large	AI21	$2.00	$8.00
Mistral Large	Mistral	$2.00	$6.00
GPT-5.4	OpenAI	$2.50	$15.00
GPT-4o	OpenAI	$2.50	$10.00
Command R+	Cohere	$2.50	$10.00
Command A	Cohere	$2.50	$10.00
Claude Sonnet 5	Anthropic	$2.00	$10.00
Sonar Pro	Perplexity	$2.00	$10.00

The mid tier is where most production apps live. GPT-5.4 mini and Gemini 2.5 Pro offer the best value here.

Premium Tier ($5.00+/M Input)

Model	Provider	Input/1M	Output/1M
Claude Opus 4.8	Anthropic	$5.00	$25.00
GPT-4 Turbo	OpenAI	$10.00	$30.00
o1	OpenAI	$15.00	$60.00
Claude Opus 4.1 (legacy)	Anthropic	$15.00	$75.00

The premium tier is shrinking. Most developers have migrated to cheaper, more capable newer models. There’s rarely a reason to use GPT-4 Turbo or Claude Opus 4.1 in 2026.

Provider Breakdown

OpenAI, Most Models, Best Coverage

13 models covering every price point. The GPT-5.4 and GPT-4.1 families give you granular control over the price-quality trade-off. Full OpenAI pricing → · Try OpenAI →

Anthropic, Premium Quality, Higher Prices

10 models. Claude Sonnet 5 is the coding champion, and prompt caching (90% savings) makes Anthropic competitive for repetitive workloads. Full Anthropic pricing → · Try Claude →

Google, Best Free Tier

8 models with generous free tiers. Gemini 2.5 Pro at $1.25/M is the cheapest premium model from a Tier 1 provider. Full Google pricing → · Try Gemini →

DeepSeek, Budget King

2 models that punch way above their price point. Automatic caching makes it even cheaper. Full DeepSeek pricing → · Try DeepSeek →

Inference Hosts (Groq, Together, Fireworks)

Run open-source models (Llama 4, DeepSeek) on optimized infrastructure. Groq offers the fastest inference at $0.05–$0.90/M input tokens; Together and Fireworks compete on price. Full Groq pricing → · Together AI pricing → · Try Groq →

Mistral AI, European Alternative

Seven models including Mistral Large 2 ($2/$6 per 1M) and the Ministral edge family ($0.04/$0.04 per 1M for 3B). Strong multilingual performance, GDPR-friendly hosting. Full Mistral pricing → · Try Mistral →

xAI (Grok), Real-Time-Web Reasoning

Grok 4.20 at $2/$6 per 1M and Grok 4.1 Fast at $0.20/$0.50 per 1M. Tight X/Twitter integration for real-time search. Full xAI Grok pricing → · Try Grok →

Cohere, Enterprise RAG Stack

Command R+ at $2.50/$10.00 per 1M plus Embed v3 and Rerank v3, a full RAG toolkit from a single vendor. Command R7B ships at $0.0375/$0.15 per 1M. Full Cohere pricing →

Meta Llama, Host-It-Anywhere Open Weights

Meta doesn’t sell an API, so Llama pricing is a cross-provider comparison. Llama 3.3 70B varies from $0.23/M (Deepinfra) to $0.90/M (Fireworks). Side-by-side comparison →

Perplexity, Web-Search-Native Sonar

Sonar Small at $0.20/M, Sonar Pro at $2/$10 through August 31 per 1M. Every response comes with citations from live web search. Full Perplexity pricing →

The Bottom Line

The AI API market in 2026 is a buyer’s paradise. Prices have plummeted, quality has surged, and you have more options than ever. The winners:

DeepSeek for raw cost efficiency
Google Gemini 2.5 Pro for best value flagship
OpenAI GPT-5.4 mini for best all-around production model
Claude Sonnet 5 for coding
Groq for fastest inference

Use our pricing comparison table to explore every tracked model, or try the token calculator to estimate costs for your specific workload. Need the fastest open-source inference? See Groq pricing. For deeper dives see our cheapest AI API ranking and best AI models of 2026 guide.

Adding generated speech, dubbing, or voice agents to your product? Compare API model costs above, then test ElevenLabs when voice quality matters more than the cheapest per-token text model.

Affiliate disclosure: this link may earn us a commission at no extra cost to you. It does not affect the pricing comparison above.

FAQ

How does AI API pricing compare across OpenAI, Claude, Gemini, DeepSeek, and Llama in 2026?

For input tokens, DeepSeek V4 leads at $0.28 per 1M, followed by Gemini 2.5 Flash at $0.30 per 1M. OpenAI GPT-5.4 mini sits at $0.40 per 1M, Claude Sonnet 5 is $2 per 1M during launch pricing, and Llama 3.3 70B starts at $0.23 per 1M on Deepinfra. Output tokens stretch the gap further: Opus 4.8 charges $25 per 1M output while DeepSeek and Gemini Flash stay under $2.50 per 1M.

Where is the most accurate AI API pricing comparison table for 2026?

Our pricing comparison table tracks 55 models across 13 providers with input, output, and cached-input prices, all sourced from each provider’s official pricing page and reconciled against at least two independent sources before publishing. The table updates automatically within four hours of any provider change.

How often do AI API prices change in 2026?

Most providers update prices once per quarter, but flagship launches ship with new pricing more often. Anthropic, OpenAI, and Google have each cut at least one model price in 2026 already. We track daily and log every change to a pricing-history file so you can audit when each model’s input or output rate moved.

What is the cheapest AI API per token in 2026?

Cohere Command R7B is the cheapest first-party API at $0.0375 input and $0.15 output per 1M tokens. For Llama hosting, Deepinfra serves Llama 3.3 70B at $0.23 per 1M. Mistral Ministral 3B and Groq Llama 3.1 8B Instant round out the budget tier under $0.10 per 1M input. See our cheapest AI API ranking for the full top-10 ranked by per-token cost.

Subscriptions make sense if you spend less than four hours per day in chat-style usage and don’t need automation. Once you script anything, the pay-per-token API is almost always cheaper because flat-rate plans cap usage at hidden message limits. Our subscription vs API calculator models the breakeven for your specific usage pattern.

Are these AI API price comparison numbers updated automatically?

Yes. The data on this page is rendered from our public pricing.json feed, which is regenerated every four hours by an autonomous pipeline that scrapes each provider’s pricing page, reconciles the readings against independent sources, and only publishes changes that two sources agree on within 1%.

Data updated daily. Last update: May 7, 2026.