Groq API Pricing (April 2026) — Per-Token Cost Comparison
How much does the Groq API cost? Groq charges between $0.05 and $0.90 per million input tokens, with most models under $1.00 per 1M tokens. The flagship Llama 3.3 70B Versatile is $0.59 input / $0.79 output per 1M tokens. Groq runs open-source models (Llama, Mixtral, Gemma, DeepSeek R1 Distill, Qwen) on custom LPU chips that deliver 500+ tokens per second — the fastest inference available as of April 2026.
Key facts about Groq pricing
- Groq runs on LPU (Language Processing Unit) architecture — purpose-built silicon for LLM inference, not repurposed GPUs.
- Typical inference speed is 500+ tokens per second on Llama 3.1 8B — roughly 10x faster than comparable GPU hosts.
- Llama 3.1 8B Instant costs just $0.05 per million input tokens, making it one of the cheapest production APIs available.
- Llama 3.3 70B Versatile at $0.59 / $0.79 per 1M tokens undercuts GPT-4o mini on output cost while offering flagship-class capability.
- DeepSeek R1 Distill Llama 70B at $0.75 / $0.99 per 1M tokens gives you reasoning-model output at low-latency Groq speeds.
- Groq hosts only open-weight models — no proprietary GPT, Claude, or Gemini.
- Free developer tier available; paid tiers scale to production rate limits without monthly minimums.
- No prompt-caching discount (unlike Anthropic and DeepSeek), but raw per-token prices are already low enough that caching matters less.
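As a quick sanity check on the per-token rates above: a request's dollar cost is just the token count divided by one million, times the listed rate. A minimal sketch in Python (the example rates are Llama 3.3 70B Versatile's $0.59 / $0.79 from this page; the token counts are illustrative):

```python
def groq_cost_usd(input_tokens: int, output_tokens: int,
                  input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of one request at per-million-token rates."""
    return (input_tokens / 1_000_000) * input_per_m \
         + (output_tokens / 1_000_000) * output_per_m

# A 2,000-token prompt with a 500-token reply on Llama 3.3 70B Versatile:
cost = groq_cost_usd(2_000, 500, 0.59, 0.79)  # ≈ $0.001575
```

At these rates, even a million such requests per month stays under $1,600, which is why the missing prompt-caching discount rarely dominates the bill.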
How much does each Groq model cost per million tokens?
| Model | Provider | Input $/1M | Cached $/1M | Output $/1M |
|---|---|---|---|---|
| Llama 3.1 8B Instant | Groq | $0.05 | — | $0.08 |
| Llama 3.2 11B Vision | Groq | $0.18 | — | — |
| Llama 3.3 70B Versatile | Groq | $0.59 | — | $0.79 |
| DeepSeek R1 Distill Llama 70B | Groq | $0.75 | — | $0.99 |
| Llama 3.2 90B Vision | Groq | $0.90 | — | — |

Cached rates are marked — because Groq does not offer a prompt-caching discount.
Why choose Groq over other inference providers?
Groq's value proposition comes down to three things: speed, price, and developer experience. On speed, the custom LPU silicon delivers 500+ tokens per second on Llama 3.1 8B and 250+ tokens per second on Llama 3.3 70B. That's roughly 10x faster than Together AI or Fireworks running the same models on H100 GPUs. For voice agents, real-time search, and interactive tooling, that speed gap is the difference between a product that feels alive and one that feels sluggish.
On price, Groq is consistently among the cheapest hosts for open-source models. Llama 3.1 8B at $0.05/M input is 3x cheaper than GPT-4o mini ($0.15/M). Llama 3.3 70B at $0.59/M input beats most alternatives for frontier-class open models. The only providers that undercut Groq on headline price are DeepSeek (for their own models only) and Fireworks (on some SKUs).
On developer experience, Groq ships an OpenAI-compatible SDK so migration from OpenAI is usually a matter of changing a base URL. Free tier limits are generous enough to build a working prototype without a credit card. Rate limits scale with usage.
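The migration claim above can be made concrete. This is a hypothetical sketch, assuming Groq's OpenAI-compatible endpoint at https://api.groq.com/openai/v1 (its documented path) and a key stored in the GROQ_API_KEY environment variable:

```python
import os

# Groq's OpenAI-compatible endpoint (assumption: per Groq's docs).
GROQ_BASE_URL = "https://api.groq.com/openai/v1"

def groq_client_kwargs() -> dict:
    """Kwargs you'd pass to openai.OpenAI() to target Groq instead of OpenAI."""
    return {
        "base_url": GROQ_BASE_URL,
        "api_key": os.environ.get("GROQ_API_KEY", ""),
    }

# Usage with the official `openai` package (requires a GROQ_API_KEY):
#   from openai import OpenAI
#   client = OpenAI(**groq_client_kwargs())
#   resp = client.chat.completions.create(
#       model="llama-3.3-70b-versatile",
#       messages=[{"role": "user", "content": "Hello"}],
#   )
```

The only code-level changes from a stock OpenAI integration are the base URL, the API key, and the model name; request and response shapes stay the same.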
When Groq is not the right choice
Groq doesn't host GPT-5, Claude, or Gemini — if your app depends on those specific models, you need to stay with the original provider or use a router like OpenRouter. Groq also doesn't offer long-term fine-tuning or custom model hosting the way Together AI does. And if your workload benefits heavily from prompt caching (e.g., long system prompts), Anthropic's 90% cache discount or DeepSeek's automatic caching may beat Groq's raw per-token price.
Price History
Track how Groq API pricing has changed over time.
Price history tracking started April 2026. Charts will appear after the first price change is detected.
View pricing changelog →
Frequently asked questions
How much does the Groq API cost?
Groq pricing starts at $0.05 per million input tokens for Llama 3.1 8B Instant and goes up to $0.90 per million tokens for Llama 3.2 90B Vision. The flagship Llama 3.3 70B Versatile model costs $0.59 per million input tokens and $0.79 per million output tokens. All prices are per 1M tokens.
Is Groq cheaper than OpenAI?
Yes, for comparable open-source models Groq is dramatically cheaper than OpenAI. Llama 3.3 70B on Groq costs $0.59 per 1M input tokens versus GPT-5.4 at $2.50 per 1M input tokens — about 4x cheaper. However, Groq does not host GPT models and the model quality tiers differ.
What makes Groq inference so fast?
Groq runs on its custom LPU (Language Processing Unit) architecture — purpose-built silicon for LLM inference. Typical Groq throughput exceeds 500 tokens per second on Llama 3.1 8B, 10x faster than GPU-based inference. This matters for latency-sensitive applications like voice agents and real-time tools.
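The user-facing impact of throughput is simple arithmetic: time to stream a reply is output tokens divided by tokens per second. A rough illustration, using the page's 500 tok/s Groq figure and an assumed 50 tok/s GPU-host baseline:

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to stream a completion, ignoring network and prefill."""
    return output_tokens / tokens_per_second

# A 500-token reply:
groq_time = generation_seconds(500, 500)  # 1.0 s at Groq's quoted 500 tok/s
gpu_time = generation_seconds(500, 50)    # 10.0 s at an assumed 50 tok/s host
```

One second versus ten is the gap between a conversational voice agent and one the user talks over.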
Does Groq offer a free tier?
Yes, Groq provides a free developer tier with generous daily token limits for prototyping. Paid tiers unlock higher rate limits and production SLAs. Pricing is pay-as-you-go per token, with no monthly minimums.
Which Groq model is cheapest?
Llama 3.1 8B Instant is the cheapest at $0.05 per million input tokens and $0.08 per million output tokens. For vision tasks, Llama 3.2 11B Vision at $0.18 per million tokens is the most affordable multimodal option.
Can I use DeepSeek models on Groq?
Yes. Groq hosts DeepSeek R1 Distill Llama 70B at $0.75 per million input tokens and $0.99 per million output tokens. This gives you DeepSeek-style reasoning at Groq inference speeds — a good fit for chain-of-thought workloads that also need low latency.
Methodology
Pricing sourced from https://groq.com/pricing/. All prices are expressed in USD per 1 million tokens. We track the per-model input and output rates for each Groq-hosted model listed above.
Compare Groq to other providers
Further reading: Best AI API for developers · Best AI models of 2026.