Together AI API Pricing (April 2026) — DeepSeek, Llama, Qwen
How much does Together AI cost? Together AI API pricing spans $0.88 to $9.00 per million tokens across its open-model catalog. DeepSeek V3 is $1.25 / $1.25 per 1M, DeepSeek R1 is $3.00 / $7.00, Llama 3.3 70B is $0.88 / $0.88, and Llama 3.1 405B is $3.50 / $3.50. Qwen 2.5 72B and Mixtral 8x22B both sit at $1.20 / $1.20 per 1M. Together AI is a neutral open-model host with fine-tuning, dedicated deployments, and an OpenAI-compatible SDK.
Key facts about Together AI pricing
- DeepSeek V3 flat-priced at $1.25 / $1.25 per 1M tokens with 128K context.
- DeepSeek R1 reasoning model at $3.00 / $7.00 per 1M tokens.
- Llama 3.3 70B at $0.88 / $0.88 per 1M tokens — 4x cheaper than GPT-5.4 on output.
- Llama 3.1 405B flagship at $3.50 / $3.50 per 1M tokens.
- Qwen 2.5 72B at $1.20 / $1.20 per 1M tokens — strong multilingual coverage.
- Mixtral 8x22B sparse MoE at $1.20 / $1.20 per 1M tokens.
- $5 free signup credit; pay-as-you-go beyond with no monthly minimums.
- LoRA fine-tuning supported on all major Llama, Mistral, and Qwen sizes including 405B.
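To see how these flat rates translate into per-request spend, here is a minimal sketch using the per-million prices quoted above (the model keys and helper function are illustrative, not part of any Together SDK):

```python
# Flat input/output rates quoted in this article, USD per 1M tokens.
PRICES = {
    "llama-3.3-70b": (0.88, 0.88),
    "qwen-2.5-72b": (1.20, 1.20),
    "mixtral-8x22b": (1.20, 1.20),
    "deepseek-v3": (1.25, 1.25),
    "deepseek-r1": (3.00, 7.00),
    "llama-3.1-405b": (3.50, 3.50),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the rates listed above."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a 10K-input / 2K-output call to DeepSeek R1
print(round(request_cost("deepseek-r1", 10_000, 2_000), 4))  # 0.044
```

At these rates a typical 10K/2K reasoning call on DeepSeek R1 costs under five cents, and the same call on Llama 3.3 70B about one cent.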
How much does each Together AI model cost per million tokens?
| Model | Provider | Input $/1M | Cached $/1M | Output $/1M |
|---|---|---|---|---|
| Llama 3.3 70B | Together | $0.88 | — | $0.88 |
| Qwen 2.5 72B | Together | $1.20 | — | $1.20 |
| Mixtral 8x22B | Together | $1.20 | — | $1.20 |
| DeepSeek V3 | Together | $1.25 | — | $1.25 |
| Mistral Large | Together | $3.00 | — | $9.00 |
| Llama 3.1 405B | Together | $3.50 | — | $3.50 |
Why choose Together AI over Groq or Fireworks?
Together AI's value is breadth, fine-tuning, and production features. The catalog covers Llama (all sizes), Mistral, Qwen, and DeepSeek, plus long-tail specialty models that Groq and Fireworks don't carry. Pricing is competitive: Llama 3.3 70B at $0.88 / $0.88 per 1M is within two cents of Fireworks, and DeepSeek V3 at $1.25 / $1.25 is roughly double DeepSeek's first-party price but gives you a US-based host with consistent latency.
LoRA fine-tuning is the killer feature. Together AI supports fine-tuning on every major Llama, Mistral, and Qwen size including the 405B flagship, at roughly $8-$12 per million training tokens. Inference on fine-tuned adapters runs at standard rates plus a small LoRA overhead, which is dramatically cheaper than dedicated deployments. For teams iterating on custom behavior, this is the cleanest path.
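Budgeting a fine-tuning run from the rough $8-$12 per million training tokens cited above comes down to tokens seen per epoch times epochs times rate. A sketch (dataset size and epoch count are illustrative):

```python
def finetune_cost(dataset_tokens: int, epochs: int, rate_per_million: float) -> float:
    """Total training spend: tokens seen per epoch x epochs x per-million rate."""
    return dataset_tokens * epochs * rate_per_million / 1_000_000

# Illustrative: a 5M-token dataset, 3 epochs, at the $10/M midpoint of the quoted range
print(finetune_cost(5_000_000, 3, 10.0))  # 150.0
```

Even at the top of the range a multi-epoch run on a mid-sized dataset stays in the low hundreds of dollars, which is why LoRA iteration compares so favorably to dedicated deployments.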
Production features include dedicated deployments (guaranteed throughput + lower latency), BYOC options on AWS, and an OpenAI-compatible SDK so migration from OpenAI is usually a base-URL change.
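Concretely, OpenAI compatibility means an existing chat-completions payload works unchanged; only the base URL and API key differ. A minimal stdlib sketch that builds (but does not send) such a request, assuming the conventional `https://api.together.xyz/v1` base URL:

```python
import json
import urllib.request

# Swapped in for https://api.openai.com/v1 — the payload shape stays the same.
TOGETHER_BASE_URL = "https://api.together.xyz/v1"

def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request against Together's endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{TOGETHER_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("YOUR_API_KEY", "meta-llama/Llama-3.3-70B-Instruct-Turbo", "Hello")
print(req.full_url)  # https://api.together.xyz/v1/chat/completions
```

The same construction works with the official OpenAI SDK by passing `base_url` and a Together API key to the client constructor.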
When Together is not the right choice
Groq is 5-10x faster on the same Llama models — if latency is the critical path, Groq wins. Fireworks is slightly cheaper on Llama 3.1 405B. And for frontier reasoning and code quality, GPT-5.4 and Claude 3.5 still lead by a clear margin.
Price History
Track how Together AI API pricing has changed over time.
DeepSeek R1 (Together)
No history yet — first snapshot 2026-04-16. Price trends will appear here as data accumulates.
DeepSeek V3 (Together)
No history yet — first snapshot 2026-04-16. Price trends will appear here as data accumulates.
Llama 3.3 70B (Together)
No history yet — first snapshot 2026-04-16. Price trends will appear here as data accumulates.
Llama 3.1 405B (Together)
No history yet — first snapshot 2026-04-16. Price trends will appear here as data accumulates.
Qwen 2.5 72B (Together)
No history yet — first snapshot 2026-04-16. Price trends will appear here as data accumulates.
Mixtral 8x22B (Together)
No history yet — first snapshot 2026-04-16. Price trends will appear here as data accumulates.
Mistral Large (Together)
No history yet — first snapshot 2026-04-16. Price trends will appear here as data accumulates.
Price history tracking started April 2026. Charts will appear after the first price change is detected.
View pricing changelog →
Frequently asked questions
How much does Together AI charge per token?
Together AI pricing ranges from $0.88 per million input tokens for Llama 3.3 70B up to $9.00 per million output tokens for Mistral Large. DeepSeek V3 is flat-priced at $1.25 / $1.25 per 1M, Llama 3.1 405B at $3.50 / $3.50, and DeepSeek R1 at $3.00 / $7.00 per 1M. Together AI uses flat input/output pricing on most SKUs with no cached-input discount.
Does Together AI have a free tier?
Yes. Together AI offers $5 in free credits on signup (typically enough for several million tokens of prototyping on smaller Llama or Qwen models). Beyond that, pricing is pay-as-you-go per token with no monthly minimums. Dedicated deployments have their own pricing tier.
How does Together compare to Fireworks?
Together AI and Fireworks are the two main "neutral" open-model hosts and price within a few cents of each other on most SKUs. Llama 3.3 70B is $0.88 on Together vs $0.90 on Fireworks. Fireworks is slightly cheaper on Llama 3.1 405B ($3.00 vs $3.50). Together generally has a broader catalog of smaller and specialty models; Fireworks emphasizes inference speed and function calling.
What's Together's cheapest Llama option?
Llama 3.1 8B is Together AI's cheapest Llama at $0.18 per million input and output tokens. For the 70B tier, Llama 3.3 70B at $0.88 / $0.88 per 1M offers the best cost/quality balance. Mixtral 8x22B at $1.20 / $1.20 is a common sparse MoE alternative if you want higher quality than a dense 70B at a modest price premium.
Does Together support fine-tuning?
Yes. Together AI supports LoRA fine-tuning on most Llama, Mistral, and Qwen models including Llama 3.1 405B. Typical training cost is around $8-$12 per million training tokens depending on model size, with hosted serving of the fine-tuned model at standard rates plus a small LoRA overhead.
What's DeepSeek R1's context window on Together?
DeepSeek R1 on Together AI supports a 64K-token context window at $3.00 / $7.00 per 1M tokens. DeepSeek V3 ships with 128K context at a flat $1.25 per 1M. If you need longer context, DeepSeek's own API sometimes offers extended context on preview endpoints at a lower price than Together AI.
Methodology
Pricing sourced from https://api.together.ai/models. All prices are expressed in USD per 1 million tokens. We track 7 Together AI models spanning the DeepSeek, Llama, Qwen, Mixtral, and Mistral catalogs.
Compare Together AI to other providers
Further reading: Full AI API pricing comparison · Best AI models of 2026.