Meta Llama Hosted Pricing Comparison (April 2026)
Where can I run Llama and what does it cost? Meta doesn't sell a Llama API directly — you pick a third-party host. As of April 2026, Deepinfra is the cheapest for Llama 3.3 70B at $0.23 / $0.40 per 1M tokens, and Groq is the fastest at $0.59 / $0.79 per 1M with 250+ tokens/second output. Together AI and Fireworks sit in the middle (~$0.88-$0.90 per 1M for 70B). Replicate is usually the priciest, especially on output-heavy workloads. Below: full cross-provider pricing for six Llama models.
Key facts about Meta Llama hosted pricing
- Meta does not sell a Llama API directly — all API access is via third-party hosts.
- Deepinfra is cheapest (or tied for cheapest) at every size we checked: $0.23 / $0.40 for 70B, $0.80 / $0.80 for 405B, $0.05 / $0.08 for 8B.
- Groq is the fastest at 250-500+ tokens per second on its custom LPU silicon.
- Together AI and Fireworks are within a few cents of each other: ~$0.88-$0.90 per 1M for Llama 3.3 70B.
- Llama 3.1 405B ranges from $0.80 / $0.80 (Deepinfra) to $9.50 / $9.50 (Replicate) per 1M tokens — a 12x spread.
- Llama 3.1 8B is the cheapest tier: $0.05 / $0.08 on Groq and Deepinfra — one of the cheapest production APIs available anywhere.
- Llama 3.2 Vision (11B and 90B) is available on Together, Fireworks, and Groq for multimodal workloads.
- Fine-tuning support varies: Together AI and Fireworks offer LoRA on all sizes; Deepinfra is serving-only.
How much does Llama 3.3 70B cost per million tokens?
| Host | Input / 1M | Output / 1M | Note |
|---|---|---|---|
| Together AI | $0.88 | $0.88 | |
| Fireworks | $0.90 | $0.90 | |
| Groq | $0.59 | $0.79 | fastest |
| Replicate | $0.65 | $2.75 | |
| Deepinfra | $0.23 | $0.40 | cheapest |
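To see how the input/output split plays out in practice, here is a small sketch that applies the 70B rates above to a hypothetical month of traffic (the 100M-input / 20M-output volume is an assumption for illustration, not a figure from this article):

```python
# Llama 3.3 70B rates from the table above, as (input $/1M, output $/1M).
PRICES_70B = {
    "Deepinfra":   (0.23, 0.40),
    "Groq":        (0.59, 0.79),
    "Together AI": (0.88, 0.88),
    "Fireworks":   (0.90, 0.90),
    "Replicate":   (0.65, 2.75),
}

def monthly_cost(input_tokens: float, output_tokens: float,
                 prices: tuple[float, float]) -> float:
    """USD cost for one month of traffic at per-1M-token rates."""
    in_rate, out_rate = prices
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Hypothetical workload: 100M input + 20M output tokens per month.
for host, prices in sorted(PRICES_70B.items(),
                           key=lambda kv: monthly_cost(100e6, 20e6, kv[1])):
    print(f"{host:12s} ${monthly_cost(100e6, 20e6, prices):>7.2f}")
```

Note how Replicate's $2.75 output rate makes it the priciest here even though its $0.65 input rate undercuts Together AI and Fireworks.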
How much does Llama 3.1 405B cost per million tokens?
| Host | Input / 1M | Output / 1M | Note |
|---|---|---|---|
| Together AI | $3.50 | $3.50 | |
| Fireworks | $3.00 | $3.00 | |
| Replicate | $9.50 | $9.50 | |
| Deepinfra | $0.80 | $0.80 | cheapest |
How much does Llama 3.1 70B cost per million tokens?
| Host | Input / 1M | Output / 1M | Note |
|---|---|---|---|
| Together AI | $0.88 | $0.88 | |
| Fireworks | $0.90 | $0.90 | |
| Groq | $0.59 | $0.79 | fastest |
| Deepinfra | $0.23 | $0.40 | cheapest |
How much does Llama 3.1 8B cost per million tokens?
| Host | Input / 1M | Output / 1M | Note |
|---|---|---|---|
| Together AI | $0.18 | $0.18 | |
| Fireworks | $0.20 | $0.20 | |
| Groq | $0.05 | $0.08 | cheapest, fastest |
| Deepinfra | $0.05 | $0.08 | cheapest |
How much does Llama 3.2 90B Vision cost per million tokens?
| Host | Input / 1M | Output / 1M | Note |
|---|---|---|---|
| Together AI | $1.20 | $1.20 | |
| Fireworks | $1.20 | $1.20 | |
| Groq | $0.90 | $0.90 | cheapest, fastest |
How much does Llama 3.2 11B Vision cost per million tokens?
| Host | Input / 1M | Output / 1M | Note |
|---|---|---|---|
| Together AI | $0.18 | $0.18 | cheapest |
| Fireworks | $0.20 | $0.20 | |
| Groq | $0.18 | $0.18 | cheapest, fastest |
Which host should I pick for Llama in production?
The decision is a three-way trade-off between cost, speed, and features. If cost dominates, Deepinfra wins or ties on every text-model tier: $0.23 / $0.40 per 1M for Llama 3.3 70B, $0.80 / $0.80 for the 405B flagship, and $0.05 / $0.08 for 8B (matched by Groq). That's roughly 3-4x cheaper than Together AI or Fireworks on 70B and 12x cheaper than Replicate on 405B.
If latency dominates, Groq is the fastest Llama host available — 500+ tokens per second on 8B and 250+ on 70B, thanks to custom LPU silicon that outperforms H100 GPUs by roughly 10x on throughput. The per-token price is higher than Deepinfra's ($0.59 vs $0.23 for 70B input) but the speed difference is transformative for voice agents and interactive tools.
If features dominate — fine-tuning, dedicated deployments, enterprise SLAs — Together AI and Fireworks are the better default. Both offer LoRA on all Llama sizes including 405B, BYOC deployment, and production-grade rate limits. Pricing is nearly identical ($0.88-$0.90 per 1M for 70B), so the choice usually comes down to tooling preferences.
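Switching between these hosts is relatively cheap to try, because all four expose OpenAI-compatible endpoints. The sketch below shows the shape of that switch; the base URLs and model ids are assumptions taken from each host's docs at the time of writing, so verify them before use:

```python
# Hedged sketch: host -> (assumed OpenAI-compatible base URL, assumed model id).
# Check each provider's documentation -- these values change over time.
HOSTS = {
    "together":  ("https://api.together.xyz/v1",
                  "meta-llama/Llama-3.3-70B-Instruct-Turbo"),
    "fireworks": ("https://api.fireworks.ai/inference/v1",
                  "accounts/fireworks/models/llama-v3p3-70b-instruct"),
    "groq":      ("https://api.groq.com/openai/v1",
                  "llama-3.3-70b-versatile"),
    "deepinfra": ("https://api.deepinfra.com/v1/openai",
                  "meta-llama/Llama-3.3-70B-Instruct"),
}

def client_config(host: str, api_key: str) -> dict:
    """Kwargs for openai.OpenAI(...) pointed at the chosen Llama host."""
    base_url, _model = HOSTS[host]
    return {"base_url": base_url, "api_key": api_key}

# Usage (requires the `openai` package and a key for the chosen host):
#   from openai import OpenAI
#   client = OpenAI(**client_config("groq", "YOUR_KEY"))
#   client.chat.completions.create(
#       model=HOSTS["groq"][1],
#       messages=[{"role": "user", "content": "hi"}])
```

Because the request shape is identical across hosts, A/B testing price vs speed is mostly a matter of swapping one dict entry.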
When to avoid Replicate for Llama
Replicate's Llama pricing is the highest in our comparison — $9.50 / $9.50 per 1M for Llama 3.1 405B vs $0.80 on Deepinfra. Replicate is optimized for Cog-packaged custom models, not commodity Llama serving. If you're running stock Llama without a custom training pipeline, one of the other four hosts is almost always a better fit.
Price History
Track how Meta Llama hosted pricing has changed over time.
Models tracked: Llama 4 Maverick · Llama 4 Scout
Price history tracking started April 2026. Charts will appear after the first price change is detected.
View pricing changelog →
Frequently asked questions
Where's the cheapest place to run Llama 3.3 70B?
Deepinfra, at $0.23 per million input tokens and $0.40 per million output tokens, is the cheapest host for Llama 3.3 70B as of April 2026 — roughly 4x cheaper on input than Together AI ($0.88 / $0.88) or Fireworks ($0.90 / $0.90). Groq is somewhat more expensive at $0.59 / $0.79 per 1M, but offers roughly 10x faster inference.
Does Meta sell Llama API directly?
No. Meta releases Llama weights under the Llama Community License but does not operate a first-party inference API. To use Llama in production you choose a third-party host: Together AI, Fireworks, Groq, Replicate, Deepinfra, or a cloud provider like AWS Bedrock or Azure AI. Prices and speed vary significantly across hosts.
Which provider hosts Llama fastest?
Groq is the fastest host for Llama models by a wide margin — typically 500+ tokens per second on Llama 3.1 8B and 250+ tokens per second on Llama 3.3 70B, thanks to its custom LPU silicon. Together AI and Fireworks run on H100/H200 GPUs and typically deliver 50-100 tokens per second. For latency-sensitive workloads Groq is usually the right default.
How does Together AI compare to Fireworks for Llama?
Together AI and Fireworks are priced within a few cents of each other across the Llama lineup: Llama 3.3 70B is $0.88 vs $0.90 per 1M, Llama 3.1 8B is $0.18 vs $0.20 per 1M. Fireworks is slightly cheaper on Llama 3.1 405B ($3.00 vs $3.50 per 1M). Both offer fine-tuning, dedicated deployments, and OpenAI-compatible SDKs.
Is Groq really 10x faster for Llama?
Yes, measured in tokens per second. Groq's LPU delivers 500+ TPS on Llama 3.1 8B and 250+ TPS on Llama 3.3 70B versus 50-100 TPS on GPU hosts. End-to-end latency advantage depends on network round-trip and first-token time, but for long generations (500+ output tokens) Groq is genuinely 5-10x faster than Together AI, Fireworks, Replicate, or Deepinfra.
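To translate those throughput figures into wall-clock time, a quick back-of-envelope helps (75 TPS is an assumed midpoint of the 50-100 TPS GPU range above; first-token and network latency are excluded):

```python
def seconds_for(tokens: int, tokens_per_sec: float) -> float:
    """Pure generation time, ignoring network round-trip and time-to-first-token."""
    return tokens / tokens_per_sec

# Throughput figures quoted above; 75 TPS is an assumed GPU-host midpoint.
for label, tps in [("Groq, 8B", 500), ("Groq, 70B", 250), ("GPU host, ~75 TPS", 75)]:
    print(f"{label:18s} 500 tokens in {seconds_for(500, tps):.1f} s")
```

A 500-token completion drops from roughly 6.7 s on a typical GPU host to 2.0 s on Groq's 70B endpoint — the gap this article calls transformative for voice agents.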
Can I fine-tune Llama 3.1 405B anywhere?
Yes. Together AI and Fireworks both support LoRA fine-tuning on Llama 3.1 405B, typically at $8-$12 per 1M training tokens plus serving costs. Deepinfra supports serving fine-tuned Llama but not training. Replicate supports training via Cog. For full fine-tuning on 405B, Together AI and Fireworks dedicated deployments are the main options.
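As a rough budget check on those training rates, here is a minimal sketch (the 50M-token corpus size is a made-up example; the $8-$12 / 1M range is the figure quoted above):

```python
def lora_cost(training_tokens: float, rate_per_1m: float) -> float:
    """USD training cost at a flat per-1M-training-token rate."""
    return training_tokens / 1e6 * rate_per_1m

# Hypothetical job: 50M training tokens at the quoted $8-$12 / 1M range.
low, high = lora_cost(50e6, 8.0), lora_cost(50e6, 12.0)
print(f"${low:,.0f} - ${high:,.0f}")  # serving costs come on top
```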
Methodology
Pricing sourced from the public pricing pages of Together AI, Fireworks, Groq, Replicate, and Deepinfra in April 2026. All prices are expressed in USD per 1 million tokens. We compare 6 Llama models across up to 5 hosted providers. Meta does not operate a first-party Llama API — see llama.com for the weights license.
Compare Llama hosts and providers
Further reading: Full AI API pricing comparison · Cheapest AI APIs in 2026.