Meta Llama Hosted Pricing Comparison (April 2026)
Where can I run Llama and what does it cost? Meta doesn't sell a Llama API directly; you pick a third-party host. As of April 2026, Deepinfra is the cheapest for Llama 3.3 70B at $0.23 / $0.40 per 1M tokens, and Groq is the fastest at $0.59 / $0.79 per 1M with 250+ tokens/second output.
Llama 4 Models (First-Party Pricing)
| Model | Provider | Input $/1M | Cached $/1M | Output $/1M |
|---|---|---|---|---|
| Llama 4 Scout | Meta | $0.15 | — | $0.15 |
How much does Llama 3.3 70B cost per million tokens?
| Host | Input / 1M | Output / 1M | Note |
|---|---|---|---|
| Together AI | $0.88 | $0.88 | |
| Fireworks | $0.90 | $0.90 | |
| Groq | $0.59 | $0.79 | fastest |
| Replicate | $0.65 | $2.75 | |
| Deepinfra | $0.23 | $0.40 | cheapest |
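A minimal sketch of how these per-token rates translate into a bill, using the Llama 3.3 70B prices from the table above. The workload figures (100M input and 20M output tokens per month) are illustrative assumptions, not numbers from any provider.

```python
# Per-1M-token prices for Llama 3.3 70B, as listed in the table above.
PRICES = {  # host: (input $/1M, output $/1M)
    "Together AI": (0.88, 0.88),
    "Fireworks":   (0.90, 0.90),
    "Groq":        (0.59, 0.79),
    "Replicate":   (0.65, 2.75),
    "Deepinfra":   (0.23, 0.40),
}

def monthly_cost(host: str, input_tokens: float, output_tokens: float) -> float:
    """USD cost for a month's traffic at the given host."""
    inp, out = PRICES[host]
    return inp * input_tokens / 1e6 + out * output_tokens / 1e6

# Illustrative workload: 100M input + 20M output tokens per month.
for host in sorted(PRICES, key=lambda h: monthly_cost(h, 100e6, 20e6)):
    print(f"{host}: ${monthly_cost(host, 100e6, 20e6):.2f}")
```

Because Replicate's output rate ($2.75/1M) is so much higher than its input rate, its ranking depends heavily on how output-heavy your traffic is; always cost out your own input/output mix rather than comparing a single headline number.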
How much does Llama 3.1 405B cost per million tokens?
| Host | Input / 1M | Output / 1M | Note |
|---|---|---|---|
| Together AI | $3.50 | $3.50 | |
| Fireworks | $3.00 | $3.00 | |
| Replicate | $9.50 | $9.50 | |
| Deepinfra | $0.80 | $0.80 | cheapest |
How much does Llama 3.1 8B cost per million tokens?
| Host | Input / 1M | Output / 1M | Note |
|---|---|---|---|
| Together AI | $0.18 | $0.18 | |
| Fireworks | $0.20 | $0.20 | |
| Groq | $0.05 | $0.08 | cheapest, fastest |
| Deepinfra | $0.05 | $0.08 | cheapest |
Price History
Track how Meta Llama hosted pricing has changed over time.
Tracked models: Llama 4 Maverick and Llama 4 Scout.
Price history tracking started April 2026. Charts will appear after the first price change is detected.
Frequently asked questions
Where's the cheapest place to run Llama 3.3 70B?
Deepinfra at $0.23 per million input tokens and $0.40 per million output tokens is the cheapest host for Llama 3.3 70B as of April 2026 — roughly 3x cheaper than Together AI ($0.88 / $0.88) and 4x cheaper than Fireworks ($0.90 / $0.90). Groq is slightly more expensive at $0.59 / $0.79 per 1M, but offers 10x faster inference.
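A quick check of the ratios behind the "roughly 3x" figure, using the Llama 3.3 70B prices quoted above. The exact multiple depends on your input/output mix: the gap is wider on input than on output.

```python
# Llama 3.3 70B prices ($/1M tokens) from the comparison table.
together = {"input": 0.88, "output": 0.88}
deepinfra = {"input": 0.23, "output": 0.40}

input_ratio = together["input"] / deepinfra["input"]     # ~3.8x
output_ratio = together["output"] / deepinfra["output"]  # 2.2x
print(f"input: {input_ratio:.1f}x, output: {output_ratio:.1f}x")
```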
Does Meta sell Llama API directly?
No. Meta releases Llama weights under the Llama Community License but does not operate a first-party inference API. To use Llama in production you choose a third-party host: Together AI, Fireworks, Groq, Replicate, Deepinfra, or a cloud provider like AWS Bedrock or Azure AI.
Which provider hosts Llama fastest?
Groq is the fastest host for Llama models by a wide margin — typically 500+ tokens per second on Llama 3.1 8B and 250+ tokens per second on Llama 3.3 70B, thanks to its custom LPU silicon.
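A back-of-the-envelope sketch of what those throughput numbers mean for response time. The ~50 tokens/second comparison figure is an illustrative assumption for a typical GPU-backed host, not a quoted benchmark, and the formula ignores network latency and time-to-first-token, so real responses take somewhat longer.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Pure generation time for a response, ignoring latency and TTFT."""
    return output_tokens / tokens_per_second

# A 1,000-token answer at Groq's quoted ~250 tok/s on Llama 3.3 70B:
print(generation_seconds(1000, 250))  # 4.0
# The same answer at an assumed ~50 tok/s on a typical GPU host:
print(generation_seconds(1000, 50))   # 20.0
```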
How does Together AI compare to Fireworks for Llama?
Together AI and Fireworks are priced within a few cents of each other across the Llama lineup: Llama 3.3 70B is $0.88 vs $0.90 per 1M, Llama 3.1 8B is $0.18 vs $0.20 per 1M. Fireworks is slightly cheaper on Llama 3.1 405B ($3.00 vs $3.50 per 1M).
Methodology
Pricing sourced from the public pricing pages of Together AI, Fireworks, Groq, Replicate, and Deepinfra. All prices are expressed in USD per 1 million tokens. We track 2 first-party Meta models and compare 3 Llama models across multiple hosted providers.
Further reading: Full AI API pricing comparison · Cheapest AI APIs in 2026.