Meta Llama Hosted Pricing Comparison (April 2026)

Where can I run Llama, and what does it cost? Meta doesn't sell a Llama API directly — you pick a third-party host. As of April 2026, Deepinfra is the cheapest host for Llama 3.3 70B at $0.23 / $0.40 per 1M tokens, and Groq is the fastest at $0.59 / $0.79 per 1M with 250+ tokens/second output.

Llama 4 Models (First-Party Pricing)

  • Llama 4 Scout (Llama 4, Meta): $0.15 input / $0.15 output per 1M tokens (cached-input rate not listed)

Showing 1 of 2 models · USD per 1M tokens
How much does Llama 3.3 70B cost per million tokens?

Host          Input / 1M   Output / 1M   Note
Together AI   $0.88        $0.88
Fireworks     $0.90        $0.90
Groq          $0.59        $0.79         fastest
Replicate     $0.65        $2.75
Deepinfra     $0.23        $0.40         cheapest
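Turning these rates into a bill is simple arithmetic: tokens divided by one million, times the per-1M price, summed for input and output. A quick sketch using the Llama 3.3 70B rates above (the 10M-input / 2M-output workload is a made-up illustration, not from this page):

```python
# Llama 3.3 70B rates from the table above, USD per 1M tokens (input, output).
RATES_70B = {
    "Together AI": (0.88, 0.88),
    "Fireworks":   (0.90, 0.90),
    "Groq":        (0.59, 0.79),
    "Replicate":   (0.65, 2.75),
    "Deepinfra":   (0.23, 0.40),
}

def cost_usd(host: str, input_tokens: int, output_tokens: int) -> float:
    """Workload cost: (tokens * price-per-1M) / 1M, input plus output."""
    in_rate, out_rate = RATES_70B[host]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Hypothetical monthly workload: 10M input tokens, 2M output tokens.
for host in sorted(RATES_70B, key=lambda h: cost_usd(h, 10_000_000, 2_000_000)):
    print(f"{host:12s} ${cost_usd(host, 10_000_000, 2_000_000):.2f}")
# Deepinfra comes out cheapest for this mix at $3.10.
```

Note that Replicate's high output rate ($2.75) matters more as your output share grows, so the ranking shifts with the workload mix.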

How much does Llama 3.1 405B cost per million tokens?

Host          Input / 1M   Output / 1M   Note
Together AI   $3.50        $3.50
Fireworks     $3.00        $3.00
Replicate     $9.50        $9.50
Deepinfra     $0.80        $0.80         cheapest

How much does Llama 3.1 8B cost per million tokens?

Host          Input / 1M   Output / 1M   Note
Together AI   $0.18        $0.18
Fireworks     $0.20        $0.20
Groq          $0.05        $0.08         cheapest (tied), fastest
Deepinfra     $0.05        $0.08         cheapest (tied)
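When input and output are priced differently, a single "blended" per-1M rate makes hosts easier to rank. A sketch assuming a 3:1 input:output token ratio (an illustrative assumption, not a figure from these tables):

```python
# Llama 3.1 8B rates from the table above, USD per 1M tokens (input, output).
RATES_8B = {
    "Together AI": (0.18, 0.18),
    "Fireworks":   (0.20, 0.20),
    "Groq":        (0.05, 0.08),
    "Deepinfra":   (0.05, 0.08),
}

def blended_per_1m(in_rate: float, out_rate: float, in_share: float = 0.75) -> float:
    """Weighted average price per 1M tokens for a given input share of traffic."""
    return in_rate * in_share + out_rate * (1 - in_share)

for host, (i, o) in sorted(RATES_8B.items(), key=lambda kv: blended_per_1m(*kv[1])):
    print(f"{host:12s} ${blended_per_1m(i, o):.4f} blended per 1M")
# Groq and Deepinfra tie at $0.0575 blended under this ratio.
```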

Price History

Track how Meta Llama hosted pricing has changed over time.

Llama 4 Maverick

Llama 4 Scout

Price history tracking started April 2026. Charts will appear after the first price change is detected.
View pricing changelog →

Frequently asked questions

Where's the cheapest place to run Llama 3.3 70B?

Deepinfra, at $0.23 per million input tokens and $0.40 per million output tokens, is the cheapest host for Llama 3.3 70B as of April 2026 — roughly 2-4x cheaper than Together AI ($0.88 / $0.88) and Fireworks ($0.90 / $0.90), depending on your input/output mix. Groq costs more at $0.59 / $0.79 per 1M, but offers roughly 10x faster inference.

Does Meta sell a Llama API directly?

No. Meta releases Llama weights under the Llama Community License but does not operate a first-party inference API. To use Llama in production you choose a third-party host: Together AI, Fireworks, Groq, Replicate, Deepinfra, or a cloud provider like AWS Bedrock or Azure AI.
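Most of these hosts expose an OpenAI-compatible chat-completions endpoint, so switching providers is largely a matter of changing the base URL, API key, and model ID. A minimal request-body sketch — the base URL and model ID below are illustrative assumptions, so check each provider's docs for the exact values:

```python
import json

# Hypothetical endpoint; each host publishes its own base URL.
BASE_URL = "https://api.example-host.com/v1/chat/completions"

# OpenAI-compatible chat-completions request body; the model ID
# format varies by host and is an assumption here.
payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "Summarize the Llama license."}],
    "max_tokens": 256,
}
body = json.dumps(payload)
print(body)
# To send: POST `body` to BASE_URL with an "Authorization: Bearer <key>" header.
```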

Which provider hosts Llama fastest?

Groq is the fastest host for Llama models by a wide margin — typically 500+ tokens per second on Llama 3.1 8B and 250+ tokens per second on Llama 3.3 70B, thanks to its custom LPU silicon.

How does Together AI compare to Fireworks for Llama?

Together AI and Fireworks are priced within a few cents of each other on the smaller Llama models: Llama 3.3 70B is $0.88 vs $0.90 per 1M, and Llama 3.1 8B is $0.18 vs $0.20 per 1M. The gap is wider on Llama 3.1 405B, where Fireworks is cheaper at $3.00 vs Together AI's $3.50 per 1M.

Methodology

Pricing sourced from the public pricing pages of Together AI, Fireworks, Groq, Replicate, and Deepinfra in April 2026. All prices are expressed in USD per 1 million tokens. We track 2 first-party Meta models and compare 3 Llama models across multiple hosted providers.