Together AI API Pricing (April 2026) — DeepSeek, Llama, Qwen

How much does Together AI cost? Together AI API pricing spans $0.88 to $9.00 per million tokens across its open-model catalog. DeepSeek V3 is $1.25 / $1.25 per 1M, DeepSeek R1 is $3.00 / $7.00, Llama 3.3 70B is $0.88 / $0.88, and Llama 3.1 405B is $3.50 / $3.50. Qwen 2.5 72B and Mixtral 8x22B both sit at $1.20 / $1.20 per 1M. Together AI is a neutral open-model host with fine-tuning, dedicated deployments, and an OpenAI-compatible SDK.

Key facts about Together AI pricing

  • DeepSeek V3 flat-priced at $1.25 / $1.25 per 1M tokens with 128K context.
  • DeepSeek R1 reasoning model at $3.00 / $7.00 per 1M tokens.
  • Llama 3.3 70B at $0.88 / $0.88 per 1M tokens — 4x cheaper than GPT-5.4 on output.
  • Llama 3.1 405B flagship at $3.50 / $3.50 per 1M tokens.
  • Qwen 2.5 72B at $1.20 / $1.20 per 1M tokens — strong multilingual coverage.
  • Mixtral 8x22B sparse MoE at $1.20 / $1.20 per 1M tokens.
  • $5 free signup credit; pay-as-you-go beyond with no monthly minimums.
  • LoRA fine-tuning supported on all major Llama, Mistral, and Qwen sizes including 405B.
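The per-request math behind these rates is straightforward to sketch. The price table below is copied from the rates listed on this page; the dictionary keys are illustrative shorthand, not Together's exact API model IDs.

```python
# USD per 1M tokens (input, output), from the rates listed above.
# Keys are illustrative shorthand, not Together's exact model IDs.
PRICES = {
    "llama-3.3-70b": (0.88, 0.88),
    "qwen-2.5-72b": (1.20, 1.20),
    "mixtral-8x22b": (1.20, 1.20),
    "deepseek-v3": (1.25, 1.25),
    "deepseek-r1": (3.00, 7.00),
    "llama-3.1-405b": (3.50, 3.50),
    "mistral-large": (3.00, 9.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed pay-as-you-go rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 2,000-token prompt with a 500-token completion on DeepSeek R1:
# 2000 * 3.00/1e6 + 500 * 7.00/1e6 = $0.0095
```

Note how R1's asymmetric pricing ($3.00 in / $7.00 out) makes output length, not prompt length, the dominant cost driver for reasoning workloads.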

How much does each Together AI model cost per million tokens?

  • Llama 3.3 70B (Together): $0.88 input / $0.88 output
  • Qwen 2.5 72B (Together): $1.20 input / $1.20 output
  • Mixtral 8x22B (Together): $1.20 input / $1.20 output
  • DeepSeek V3 (Together): $1.25 input / $1.25 output
  • Mistral Large (Together): $3.00 input / $9.00 output
  • Llama 3.1 405B (Together): $3.50 input / $3.50 output
Showing 6 of 7 models · USD per 1M tokens · no cached-input discount on these SKUs
Why choose Together AI over Groq or Fireworks?

Together AI's value is breadth, fine-tuning, and production features. The catalog covers Llama (all sizes), Mistral, Qwen, DeepSeek, plus long-tail specialty models that Groq and Fireworks don't carry. Pricing is competitive: Llama 3.3 70B at $0.88 / $0.88 per 1M matches Fireworks within a cent, and DeepSeek V3 at $1.25 / $1.25 is roughly twice DeepSeek's own API price, but buys you a US-based host with consistent latency.

LoRA fine-tuning is the killer feature. Together AI supports fine-tuning on every major Llama, Mistral, and Qwen size including the 405B flagship, at roughly $8-$12 per million training tokens. Inference on fine-tuned adapters runs at standard rates plus a small LoRA overhead, which is dramatically cheaper than dedicated deployments. For teams iterating on custom behavior, this is the cleanest path.
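As a back-of-envelope check on those numbers, training cost scales linearly with tokens seen. The sketch below assumes a $10 rate, the midpoint of the $8-$12 range quoted above; actual rates vary by model size.

```python
def lora_training_cost(dataset_tokens: int, epochs: int,
                       rate_per_m: float = 10.0) -> float:
    """Estimated LoRA training cost in USD. rate_per_m is an assumed
    midpoint of the $8-$12 per million training tokens range quoted
    above; actual rates vary by model size."""
    return dataset_tokens * epochs * rate_per_m / 1_000_000

# A 50M-token dataset trained for 3 epochs at the midpoint rate:
# 50_000_000 * 3 * 10.0 / 1e6 = $1,500
```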

Production features include dedicated deployments (guaranteed throughput + lower latency), BYOC options on AWS, and an OpenAI-compatible SDK so migration from OpenAI is usually a base-URL change.
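A minimal sketch of that migration, assuming Together's OpenAI-compatible endpoint lives at api.together.xyz/v1 and using an illustrative Llama model ID; verify both against Together's docs before use.

```python
import os

# Assumed OpenAI-compatible endpoint and an illustrative model ID;
# check Together's model list for exact names.
TOGETHER_BASE_URL = "https://api.together.xyz/v1"
MODEL_ID = "meta-llama/Llama-3.3-70B-Instruct-Turbo"

def chat(prompt: str) -> str:
    # Requires `pip install openai` and a TOGETHER_API_KEY env var;
    # the deferred import keeps this sketch importable without the package.
    from openai import OpenAI
    client = OpenAI(base_url=TOGETHER_BASE_URL,
                    api_key=os.environ["TOGETHER_API_KEY"])
    resp = client.chat.completions.create(
        model=MODEL_ID,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

Point `base_url` back at OpenAI (or drop it) and the same code runs against OpenAI's API, which is the point of the compatibility layer.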

When Together is not the right choice

Groq is 5-10x faster on the same Llama models — if latency is the critical path, Groq wins. Fireworks is slightly cheaper on Llama 3.1 405B. And for frontier reasoning and code quality, GPT-5.4 and Claude 3.5 still lead by a clear margin.

Price History

Track how Together AI API pricing has changed over time.

No history yet for any tracked model: first snapshots for DeepSeek R1, DeepSeek V3, Llama 3.3 70B, Llama 3.1 405B, Qwen 2.5 72B, Mixtral 8x22B, and Mistral Large were taken 2026-04-16. Price trends will appear here as data accumulates.

Price history tracking started April 2026. Charts will appear after the first price change is detected.

Frequently asked questions

How much does Together AI charge per token?

Together AI pricing ranges from $0.88 per million input tokens for Llama 3.3 70B up to $9.00 per million output tokens for Mistral Large. DeepSeek V3 is flat-priced at $1.25 / $1.25 per 1M, Llama 3.1 405B at $3.50 / $3.50, and DeepSeek R1 at $3.00 / $7.00 per 1M. Together AI uses flat input/output pricing on most SKUs with no cached-input discount.

Does Together AI have a free tier?

Yes. Together AI offers $5 in free credits on signup (typically enough for several million tokens of prototyping on smaller Llama or Qwen models). Beyond that, pricing is pay-as-you-go per token with no monthly minimums. Dedicated deployments have their own pricing tier.
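To gauge how far that credit stretches, divide it by the flat per-1M rate. The sketch below uses Llama 3.3 70B's $0.88 rate from this page.

```python
CREDIT_USD = 5.00
RATE_PER_M = 0.88  # Llama 3.3 70B, same rate for input and output

def tokens_covered(credit_usd: float, rate_per_m: float) -> int:
    """Total tokens (input + output combined) a credit covers at a flat rate."""
    return int(credit_usd / rate_per_m * 1_000_000)

# $5 at $0.88 per 1M covers roughly 5.68M tokens
```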

How does Together compare to Fireworks?

Together AI and Fireworks are the two main "neutral" open-model hosts and price within a few cents of each other on most SKUs. Llama 3.3 70B is $0.88 on Together vs $0.90 on Fireworks. Fireworks is slightly cheaper on Llama 3.1 405B ($3.00 vs $3.50). Together generally has a broader catalog of smaller and specialty models; Fireworks emphasizes inference speed and function calling.
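At volume, the 405B gap above is concrete money. A quick sketch at an assumed 200M tokens per month (a hypothetical workload, not a benchmark figure):

```python
def monthly_cost(tokens: int, rate_per_m: float) -> float:
    """USD per month at a flat per-1M-token rate."""
    return tokens * rate_per_m / 1_000_000

TOKENS = 200_000_000  # assumed monthly volume, input + output combined
together = monthly_cost(TOKENS, 3.50)   # Llama 3.1 405B on Together
fireworks = monthly_cost(TOKENS, 3.00)  # same model on Fireworks
# $700 vs $600: a $100/month gap at this volume
```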

What's Together's cheapest Llama option?

Llama 3.1 8B is Together AI's cheapest Llama at $0.18 per million input and output tokens. For the 70B tier, Llama 3.3 70B at $0.88 / $0.88 per 1M offers the best cost/quality. Mixtral 8x22B at $1.20 / $1.20 is a common sparse MoE step up if you want quality above a dense 70B at a modest per-token premium.

Does Together support fine-tuning?

Yes. Together AI supports LoRA fine-tuning on most Llama, Mistral, and Qwen models including Llama 3.1 405B. Typical training cost is around $8-$12 per million training tokens depending on model size, with hosted serving of the fine-tuned model at standard rates plus a small LoRA overhead.

What's DeepSeek R1's context window on Together?

DeepSeek R1 on Together AI supports a 64K-token context window at $3.00 / $7.00 per 1M tokens. DeepSeek V3 ships with 128K context at a flat $1.25 per 1M. If you need longer context, DeepSeek's own API sometimes offers extended-context preview endpoints at lower cost than Together.
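A simple guard for those limits, with context sizes taken from the figures above (assuming "64K" means 64,000 tokens; model keys are illustrative shorthand):

```python
# Context windows on Together, per the figures above (tokens).
CONTEXT_LIMITS = {"deepseek-r1": 64_000, "deepseek-v3": 128_000}

def fits_context(model: str, prompt_tokens: int, max_output_tokens: int) -> bool:
    """True if prompt plus requested completion fits the model's window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_LIMITS[model]

# A 60K prompt + 4K completion just fits R1's 64K window;
# the same prompt easily fits V3's 128K window.
```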

Methodology

Pricing sourced from https://api.together.ai/models. All prices are expressed in USD per 1 million tokens. We track 7 Together AI models spanning the DeepSeek, Llama, Qwen, Mixtral, and Mistral catalogs.