Cheapest AI API in May 2026: Cohere R7B at $0.0375/1M Beats 9 Rivals
Cohere Command R7B is the cheapest AI API in May 2026 at $0.0375 per 1M input tokens. Full top-10 ranking with Mistral, Groq, Together, OpenAI, DeepSeek — verified daily.
By AI Pricing Guru Editorial Team
AI Pricing Guru articles are maintained by the editorial workflow behind the site: daily pricing snapshots, provider source checks, and review passes for model launches, subscription limits, and billing changes.
No two provider pricing pages present their numbers in the same shape. I keep this guide focused on the figures a buyer has to normalize before comparing OpenAI, Anthropic, Google, DeepSeek, and the rest side by side.
Looking for the cheapest AI API providers in May 2026? We’ve ranked every major model by real per-token cost so you can find the best deal for your workload, whether you’re optimizing for input-heavy classification, output-heavy generation, or long-context document tasks. The cheapest model right now is Cohere Command R7B at $0.0375 per 1M input tokens, about 1/65th the price of GPT-5.4 and 1/130th the price of Claude Opus 4.7.
All prices below are current as of May 2026 and auto-updated daily from provider pricing pages on our pricing comparison page.
Cheapest AI API Providers: Input Price Ranking (May 2026)
| Rank | Model | Provider | Input / 1M | Output / 1M |
|---|---|---|---|---|
| 1 | Command R7B | Cohere | $0.0375 | $0.15 |
| 2 | Ministral 3B | Mistral | $0.04 | $0.04 |
| 3 | Llama 3.1 8B Instant | Groq | $0.05 | $0.08 |
| 4 | GPT-OSS 20B | Together | $0.05 | $0.20 |
| 5 | GPT-OSS 20B | Groq | $0.075 | $0.30 |
| 6 | Llama 4 Scout | Meta | $0.08 | $0.30 |
| 7 | GPT-4.1 nano | OpenAI | $0.10 | $0.40 |
| 8 | Mistral Small | Mistral | $0.10 | $0.30 |
| 9 | Ministral 8B | Mistral | $0.10 | $0.10 |
| 10 | Grok 4.1 Fast | xAI | $0.20 | $0.50 |
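Input price alone can mislead when output tokens dominate your workload. One way to compare rows in this table is to blend the two rates by your own input:output mix. A minimal sketch using the figures above (the `blended_price` helper and the dictionary are ours for illustration, not any provider's SDK):

```python
# Blended $/1M tokens for a given share of input tokens, using the
# May 2026 rates from the ranking table above (illustrative helper).
PRICES = {  # model: (input $/1M, output $/1M)
    "Command R7B": (0.0375, 0.15),
    "Ministral 3B": (0.04, 0.04),
    "Llama 3.1 8B Instant": (0.05, 0.08),
    "GPT-4.1 nano": (0.10, 0.40),
}

def blended_price(model: str, input_share: float) -> float:
    """Effective $/1M tokens when `input_share` of traffic is input tokens."""
    inp, out = PRICES[model]
    return inp * input_share + out * (1.0 - input_share)

# A 90%-input classification load vs. a 50/50 chat mix:
for model in PRICES:
    print(f"{model}: 90/10 ${blended_price(model, 0.9):.4f} | "
          f"50/50 ${blended_price(model, 0.5):.4f}")
```

Note how Ministral 3B's symmetric pricing overtakes Command R7B once the mix tilts toward output: at 50/50, Ministral 3B blends to $0.04 per 1M against Command R7B's $0.094.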
The Ultra-Budget Tier ($0.10/1M Input and Under)
1. Cohere Command R7B, $0.0375 input / $0.15 output
The absolute cheapest general-purpose AI API in 2026. Cohere’s 7B parameter model handles classification, routing, and retrieval-augmented generation at less than 4¢ per million input tokens. Excellent for high-volume enterprise RAG workloads where cost dominates. Try Cohere →
2. Mistral Ministral 3B, $0.04 input / $0.04 output
Mistral’s edge-class model has identical input and output pricing, which is unusual and especially useful for output-heavy workloads like content generation. At 3B parameters it’s small, but strong enough for summarization, simple extraction, and on-device-style tasks. Try Mistral →
3. Groq Llama 3.1 8B Instant, $0.05 input / $0.08 output
Groq’s LPU-accelerated Llama 3.1 8B is the speed-per-dollar champion: blazing-fast inference at nickel-per-million-token pricing. Best choice when both latency and cost matter. Try Groq →
4. OpenAI GPT-4.1 nano, $0.10 input / $0.40 output
OpenAI’s cheapest production model with a 1M+ token context window. Great for classification, routing, and simple extraction where you need OpenAI’s ecosystem (function calling, structured outputs, wide SDK support). Try OpenAI →
Best Value: Price vs. Performance
Raw cost isn’t everything. Here are our value picks, weighing capability against price:
Best Overall Value: Gemini 3.1 Flash-Lite ($0.25 / $1.50)
Google’s Flash-Lite model is fast, supports text + image + video + audio input, and costs a fraction of premium multimodal competitors. For most applications needing modalities beyond text, this is the sweet spot.
Best for Coding: Grok 4.1 Fast ($0.20 / $0.50)
xAI’s fast model delivers strong coding performance at a fraction of Claude’s price. With a 2M-token context window, it handles large codebases easily.
Best for Reasoning: DeepSeek V4 Pro ($1.74 / $3.48)
DeepSeek’s flagship reasoning model delivers thinking-mode capabilities at roughly a quarter to half the cost of OpenAI’s o3 or Anthropic’s Opus, with a 1M-token context window.
Best for Long Context: Llama 4 Scout ($0.08 / $0.30)
10 million token context window at $0.08 input / $0.30 output per 1M. Nothing else comes close on context length per dollar, and Meta has cut the input rate from $0.15 to $0.08 since April.
Cheapest by Use Case
| Use Case | Best Model | Monthly Cost (10M input + 10M output) |
|---|---|---|
| Classification/routing | Cohere Command R7B | $1.88 |
| Chatbots | Mistral Ministral 3B | $0.80 |
| Fast low-latency serving | Groq Llama 3.1 8B | $1.30 |
| Code generation | Grok 4.1 Fast | $7.00 |
| Document analysis | Llama 4 Scout | $3.80 |
| Complex reasoning | DeepSeek V4 Pro | $52.20 |
| Multimodal | Gemini 3.1 Flash-Lite | $17.50 |
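The monthly figures are straight multiplication of the per-million rates by token volume. A minimal sketch, assuming 10M input and 10M output tokens per month (the `monthly_cost` helper is ours, not a provider SDK); it reproduces the Command R7B and DeepSeek V4 Pro rows:

```python
def monthly_cost(input_rate: float, output_rate: float,
                 m_in: float = 10.0, m_out: float = 10.0) -> float:
    """Monthly bill in $ for m_in / m_out million tokens at the given $/1M rates."""
    return input_rate * m_in + output_rate * m_out

# Command R7B: 10M input at $0.0375/1M + 10M output at $0.15/1M
print(f"${monthly_cost(0.0375, 0.15):.2f}")   # -> $1.88
# DeepSeek V4 Pro: 10M input at $1.74/1M + 10M output at $3.48/1M
print(f"${monthly_cost(1.74, 3.48):.2f}")     # -> $52.20
```

Swap in your own token volumes to see how quickly output-heavy traffic dominates the bill.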
How to Save Even More
- Use cached input pricing: most providers offer 80-90% discounts on repeated prompts (OpenAI, Anthropic, DeepSeek, and Google all support it)
- Use the Batch API: OpenAI offers 50% off for async processing with 24h turnaround
- Right-size your model: don’t use GPT-5.4 for tasks that Cohere R7B or Ministral 3B can handle
- Monitor your usage: use our token calculator to estimate costs before committing
The Expensive Tier (For Reference)
Not everything is about saving money. Here are the premium models and what you pay for:
| Model | Input / 1M | Output / 1M | Why Pay More? |
|---|---|---|---|
| Claude Opus 4.7 | $5.00 | $25.00 | Best agentic coding & reasoning |
| GPT-5.4 | $2.50 | $15.00 | Most capable general-purpose |
| Gemini 3.1 Pro | $2.00 | $12.00 | Latest Google, multimodal |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Balanced quality + price |
These models are worth it for complex tasks where quality matters more than cost. But for 80% of production workloads, the budget tier delivers good-enough results at one to two orders of magnitude lower cost.
Compare all models on our pricing comparison page, or calculate your specific costs with our token calculator. For deeper breakdowns see our full pricing comparison of 13 providers or our best AI models ranking.
If you need bulk content generation but don’t want to pay per token, Writesonic offers AI writing starting at $13/month with a free trial, a flat-rate alternative to metering raw API costs.
FAQ
What is the cheapest AI API in 2026?
Cohere’s Command R7B is the cheapest AI API in 2026 at $0.0375 per 1M input tokens and $0.15 per 1M output tokens. For balanced input/output costs, Mistral Ministral 3B at $0.04 / $0.04 per 1M tokens is the cheapest symmetric option, which makes it the best pick for output-heavy workloads like content generation.
How much does the cheapest AI API cost per million tokens?
The cheapest AI API providers in 2026 charge between $0.0375 and $0.10 per million input tokens, and between $0.04 and $0.40 per million output tokens. At those prices, processing 10 million mixed input + output tokens costs roughly $0.40 to $2.50 in total, under the cost of a single coffee.
Which AI API has the lowest per-token price?
Cohere Command R7B has the lowest per-token input price at $0.0375 per 1M tokens. Mistral Ministral 3B is the cheapest by combined input + output cost at $0.08 per 1M tokens. Groq’s Llama 3.1 8B Instant is the cheapest fast-serving option at $0.05 / $0.08 per 1M tokens, useful when latency and cost both matter.
Is DeepSeek the cheapest AI API?
DeepSeek is one of the cheapest providers for reasoning-grade models, but it isn’t the absolute cheapest API overall. DeepSeek V4 Pro starts at $1.74 per 1M input tokens, far cheaper than GPT-5.4 ($2.50) or Claude Sonnet 4.6 ($3.00), but still around 45× more expensive than Cohere Command R7B. DeepSeek’s value is in price-to-capability for harder reasoning tasks, not raw cheapness.
How can I cut my AI API costs further?
Four levers stack: (1) use cached input pricing for 80–90% off repeated prompts (OpenAI, Anthropic, DeepSeek, and Google all support it), (2) use the Batch API for 50% off async workloads with 24h turnaround, (3) right-size your model so GPT-5.4 isn’t doing work Cohere R7B or Ministral 3B can handle, and (4) monitor usage with our token calculator before committing to a provider.
Are these AI API prices updated automatically?
Yes. AI Pricing Guru’s autonomous pricing pipeline checks all major providers every few hours and only publishes price changes when at least two independent sources agree within 1%. The rankings and per-token figures on this page are auto-refreshed against our live pricing comparison data, so the “cheapest AI API” list always reflects today’s prices, not last quarter’s.