Together.ai API Pricing (May 2026) — DeepSeek V3.1 $0.60, GPT-OSS 20B $0.05, Kimi K2.6 $1.20 / 1M tokens

Last updated: 2026-07-17

How much does Together AI cost? Together AI API pricing spans $0.05 to $9.00 per million tokens across its open-model catalog. The current flagship is DeepSeek V3.1 at $0.60 / $1.70 per 1M; reasoning workloads run on DeepSeek V4 Pro at $2.10 / $4.40 or DeepSeek R1 at $3.00 / $7.00. The cheapest hosted model is GPT-OSS 20B at $0.05 / $0.20 per 1M, with GPT-OSS 120B at $0.15 / $0.60. New 2026 additions include Kimi K2.6 ($1.20 / $4.50), GLM 5.1 ($1.40 / $4.40), Qwen 3.6 Plus ($0.50 / $3.00), and MiniMax M2.7 ($0.30 / $1.20). Together AI is a neutral open-model host with LoRA fine-tuning, dedicated deployments, and an OpenAI-compatible SDK.

Key facts about Together AI pricing

DeepSeek V3.1 — current flagship at $0.60 / $1.70 per 1M tokens (replaces legacy V3 at $1.25 flat).
DeepSeek V4 Pro — 2026 reasoning-grade release at $2.10 / $4.40 per 1M tokens.
DeepSeek R1 reasoning model at $3.00 / $7.00 per 1M tokens.
GPT-OSS 20B — cheapest hosted model at $0.05 / $0.20 per 1M tokens.
GPT-OSS 120B at $0.15 / $0.60 per 1M tokens — premium open-weight at sub-Llama cost.
Kimi K2.6 long-context model at $1.20 / $4.50 per 1M tokens.
GLM 5.1 at $1.40 / $4.40 per 1M tokens; Qwen 3.6 Plus at $0.50 / $3.00; MiniMax M2.7 at $0.30 / $1.20.
Llama 3.3 70B still available at $0.88 / $0.88 per 1M tokens — multiple times cheaper than frontier closed models on output.
$5 free signup credit; pay-as-you-go beyond, no monthly minimums.
LoRA fine-tuning supported on Llama, Mistral, Qwen and DeepSeek families.

How much does each Together AI model cost per million tokens?

Comparing open-model hosts? Novita is a managed multi-model option to quote alongside Together AI, not a replacement for checking the exact model rate below. For GPU rental math, see the self-hosting break-even guide.

Affiliate disclosure: this sponsored link may earn us a commission. It does not affect Together AI table order or pricing data.

Show legacy models

Showing 23 current models. .

Model↕	Provider↕	Tier↕	Input $/1M↑	Cached $/1M↕	Output $/1M↕
GPT-OSS 20B (Together) GPT-OSS	together	Low	$0.05	—	$0.20
Gemma 3n E4B Instruct (Together) Gemma 3n	together	Low	$0.06	—	$0.12
Together Llama 3 8B Instruct Lite Llama 3	together	Low	$0.14	—	$0.14
GPT-OSS 120B (Together) GPT-OSS	together	Low	$0.15	—	$0.60
Rnj-1 Instruct (Together) Rnj	together	Low	$0.15	—	$0.15
Qwen3.5 9B (Together) Qwen3.5	together	Low	$0.17	—	$0.25
Together Qwen3 235B A22B Instruct 2507 Qwen3	together	Low	$0.20	—	$0.60
Together Gemma 4 31B IT Pearl Gemma 4	together	Low	$0.28	—	$0.86
MiniMax M2.7 (Together) MiniMax M2	together	Low	$0.30	$0.06	$1.20
Together MiniMax M3 MiniMax M3	together	Low	$0.30	$0.06	$1.20
Together MiniMax M2.5 MiniMax M2.5	together	Low	$0.30	$0.06	$1.20
Together Qwen2.5 7B Instruct Turbo Qwen2.5	together	Low	$0.30	—	$0.30
Together Qwen3.7 Plus Qwen3.7	together	Low	$0.32	—	$1.28
Together Gemma 4 31B IT Gemma 4	together	Low	$0.39	—	$0.97
Qwen3.5 397B A17B (Together) Qwen3.5	together	Low	$0.60	$0.35	$3.60
Together Nemotron 3 Ultra 550B A55B Nemotron 3	together	Low	$0.60	$0.20	$3.60
Together Kimi K2.7 Code Kimi K2.7	together	Low	$0.95	$0.19	$4.00
Llama 3.3 70B (Together) Llama 3.3	together	Mid	$1.04	—	$1.04
Kimi K2.6 (Together) Kimi K2	together	Mid	$1.20	$0.20	$4.50
Qwen3.7-Max (Together) Qwen3.7	together	Mid	$1.25	$0.13	$3.75
Cogito v2.1 671B (Together) Cogito	together	Mid	$1.25	—	$1.25
GLM-5.1 (Together) GLM-5	together	Mid	$1.40	$0.26	$4.40
Together DeepSeek V4 Pro DeepSeek V4	together	High	$1.74	$0.20	$3.48

Showing 23 of 33 models · All prices in USD per 1M tokensLast synced: 2026-07-17

GPT-OSS 20B (Together)
GPT-OSS
togetherLow
Input
$0.05
Cached
—
Output
$0.20
Gemma 3n E4B Instruct (Together)
Gemma 3n
togetherLow
Input
$0.06
Cached
—
Output
$0.12
Together Llama 3 8B Instruct Lite
Llama 3
togetherLow
Input
$0.14
Cached
—
Output
$0.14
GPT-OSS 120B (Together)
GPT-OSS
togetherLow
Input
$0.15
Cached
—
Output
$0.60
Rnj-1 Instruct (Together)
Rnj
togetherLow
Input
$0.15
Cached
—
Output
$0.15
Qwen3.5 9B (Together)
Qwen3.5
togetherLow
Input
$0.17
Cached
—
Output
$0.25
Together Qwen3 235B A22B Instruct 2507
Qwen3
togetherLow
Input
$0.20
Cached
—
Output
$0.60
Together Gemma 4 31B IT Pearl
Gemma 4
togetherLow
Input
$0.28
Cached
—
Output
$0.86
MiniMax M2.7 (Together)
MiniMax M2
togetherLow
Input
$0.30
Cached
$0.06
Output
$1.20
Together MiniMax M3
MiniMax M3
togetherLow
Input
$0.30
Cached
$0.06
Output
$1.20
Together MiniMax M2.5
MiniMax M2.5
togetherLow
Input
$0.30
Cached
$0.06
Output
$1.20
Together Qwen2.5 7B Instruct Turbo
Qwen2.5
togetherLow
Input
$0.30
Cached
—
Output
$0.30
Together Qwen3.7 Plus
Qwen3.7
togetherLow
Input
$0.32
Cached
—
Output
$1.28
Together Gemma 4 31B IT
Gemma 4
togetherLow
Input
$0.39
Cached
—
Output
$0.97
Qwen3.5 397B A17B (Together)
Qwen3.5
togetherLow
Input
$0.60
Cached
$0.35
Output
$3.60
Together Nemotron 3 Ultra 550B A55B
Nemotron 3
togetherLow
Input
$0.60
Cached
$0.20
Output
$3.60
Together Kimi K2.7 Code
Kimi K2.7
togetherLow
Input
$0.95
Cached
$0.19
Output
$4.00
Llama 3.3 70B (Together)
Llama 3.3
togetherMid
Input
$1.04
Cached
—
Output
$1.04
Kimi K2.6 (Together)
Kimi K2
togetherMid
Input
$1.20
Cached
$0.20
Output
$4.50
Qwen3.7-Max (Together)
Qwen3.7
togetherMid
Input
$1.25
Cached
$0.13
Output
$3.75
Cogito v2.1 671B (Together)
Cogito
togetherMid
Input
$1.25
Cached
—
Output
$1.25
GLM-5.1 (Together)
GLM-5
togetherMid
Input
$1.40
Cached
$0.26
Output
$4.40
Together DeepSeek V4 Pro
DeepSeek V4
togetherHigh
Input
$1.74
Cached
$0.20
Output
$3.48

Showing 23 of 33 models · USD per 1M tokens

Last synced: 2026-07-17

Why choose Together AI over Groq or Fireworks?

Together AI's value is breadth, fine-tuning, and production features. The catalog covers Llama (all sizes), Mistral, Qwen, DeepSeek, plus long-tail specialty models that Groq and Fireworks don't carry. Pricing is competitive: Llama 3.3 70B at $0.88 / $0.88 per 1M matches Fireworks within a cent, and DeepSeek V3 at $1.25 / $1.25 is about 2x more than DeepSeek's own API but gives you a US-based host with consistent latency.

LoRA fine-tuning is the killer feature. Together AI supports fine-tuning on every major Llama, Mistral, and Qwen size including the 405B flagship, at roughly $8-$12 per million training tokens. Inference on fine-tuned adapters runs at standard rates plus a small LoRA overhead, which is dramatically cheaper than dedicated deployments. For teams iterating on custom behavior, this is the cleanest path.

Production features include dedicated deployments (guaranteed throughput + lower latency), BYOC options on AWS, and an OpenAI-compatible SDK so migration from OpenAI is usually a base-URL change.

When Together is not the right choice

Groq is 5-10x faster on the same Llama models — if latency is the critical path, Groq wins. Fireworks is slightly cheaper on Llama 3.1 405B. And for frontier reasoning and code quality, GPT-5.4 and Claude 3.5 still lead by a clear margin.

Price History

Track how Together AI API pricing has changed over time.

DeepSeek R1 (Together)

DeepSeek V3 (Together)

DeepSeek V3.1 (Together)

Llama 3.1 405B (Together)

Llama 3.3 70B (Together)

Mistral Large (Together)

Mixtral 8x22B (Together)

Qwen 2.5 72B (Together)

GPT-OSS 120B (Together)

GPT-OSS 20B (Together)

Together DeepSeek V4 Pro

MiniMax M2.7 (Together)

Kimi K2.6 (Together)

GLM-5.1 (Together)

Qwen3.6-Plus (Together)

Qwen3.7-Max (Together)

LFM2 24B A2B (Together)

Qwen3.5 397B A17B (Together)

GLM-5 (Together)

Qwen3.5 9B (Together)

Cogito v2.1 671B (Together)

Rnj-1 Instruct (Together)

Gemma 3n E4B Instruct (Together)

Together MiniMax M3

Together MiniMax M2.5

Together Kimi K2.7 Code

Together Qwen3.7 Plus

Together Gemma 4 31B IT

Together Gemma 4 31B IT Pearl

Together Nemotron 3 Ultra 550B A55B

Together Qwen3 235B A22B Instruct 2507

Together Qwen2.5 7B Instruct Turbo

Together Llama 3 8B Instruct Lite

Price history tracking started April 2026. Charts will appear after the first price change is detected.
View pricing changelog →

Frequently asked questions

How much does Together AI charge per token?

Together AI pricing spans $0.05 to $9.00 per million tokens across its open-model catalog. The cheapest model is GPT-OSS 20B at $0.05 / $0.20 per 1M. The current DeepSeek flagship V3.1 is $0.60 / $1.70 per 1M; DeepSeek V4 Pro is $2.10 / $4.40; DeepSeek R1 is $3.00 / $7.00. Llama 3.3 70B is $0.88 flat per 1M. New 2026 additions like Kimi K2.6 ($1.20 / $4.50), GLM 5.1 ($1.40 / $4.40), Qwen 3.6 Plus ($0.50 / $3.00) and MiniMax M2.7 ($0.30 / $1.20) are all available. Together AI uses flat per-token pricing with no cached-input discount.

Does Together AI have a free tier?

Yes. Together AI offers $5 in free credits on signup (typically enough for several million tokens of prototyping on smaller Llama or Qwen models). Beyond that, pricing is pay-as-you-go per token with no monthly minimums. Dedicated deployments have their own pricing tier.

How does Together compare to Fireworks?

Together AI and Fireworks are the two main "neutral" open-model hosts and price within a few cents of each other on most SKUs. Llama 3.3 70B is $0.88 on Together vs $0.90 on Fireworks. Fireworks is slightly cheaper on Llama 3.1 405B ($3.00 vs $3.50). Together generally has a broader catalog of smaller and specialty models; Fireworks emphasizes inference speed and function calling.

What's Together's cheapest Llama option?

Llama 3.1 8B is Together AI's cheapest Llama at $0.18 per million input and output tokens. For the 70B tier, Llama 3.3 70B at $0.88 / $0.88 per 1M is the best cost/quality. Mixtral 8x22B at $1.20 / $1.20 is a common sparse MoE alternative if you want lower per-token cost at higher quality than dense 70B.

Does Together support fine-tuning?

Yes. Together AI supports LoRA fine-tuning on most Llama, Mistral, and Qwen models including Llama 3.1 405B. Typical training cost is around $8-$12 per million training tokens depending on model size, with hosted serving of the fine-tuned model at standard rates plus a small LoRA overhead.

What's DeepSeek R1's context window on Together?

DeepSeek R1 on Together AI supports a 64K token context window at $3.00 / $7.00 per 1M tokens. DeepSeek V3 ships with 128K context at $1.25 flat per 1M. If you need longer context, DeepSeek's own API sometimes ships extended context on preview endpoints cheaper than Together AI.

Methodology

Pricing sourced from https://api.together.ai/models on 2026-07-17. All prices expressed in USD per 1 million tokens. We track 33 Together AI models spanning DeepSeek (V3.1, V4 Pro, R1), GPT-OSS (20B, 120B), Llama 3.x, Qwen 3.6, Kimi K2.6, GLM 5.1, MiniMax, Mixtral and Mistral catalogs.

Compare Together AI to other providers

View All Models Groq Pricing Meta Llama Pricing DeepSeek Pricing Token Calculator

Further reading: Full AI API pricing comparison · Best AI models of 2026.