Together.ai API Pricing (May 2026) — DeepSeek V3.1 $0.60, GPT-OSS 20B $0.05, Kimi K2.6 $1.20 / 1M tokens
Last updated:
How much does Together AI cost? Together AI API pricing spans $0.05 to $9.00 per million tokens across its open-model catalog. The current flagship is DeepSeek V3.1 at $0.60 / $1.70 per 1M; reasoning workloads run on DeepSeek V4 Pro at $2.10 / $4.40 or DeepSeek R1 at $3.00 / $7.00. The cheapest hosted model is GPT-OSS 20B at $0.05 / $0.20 per 1M, with GPT-OSS 120B at $0.15 / $0.60. New 2026 additions include Kimi K2.6 ($1.20 / $4.50), GLM 5.1 ($1.40 / $4.40), Qwen 3.6 Plus ($0.50 / $3.00), and MiniMax M2.7 ($0.30 / $1.20). Together AI is a neutral open-model host with LoRA fine-tuning, dedicated deployments, and an OpenAI-compatible SDK.
Key facts about Together AI pricing
- DeepSeek V3.1 — current flagship at $0.60 / $1.70 per 1M tokens (replaces legacy V3 at $1.25 flat).
- DeepSeek V4 Pro — 2026 reasoning-grade release at $2.10 / $4.40 per 1M tokens.
- DeepSeek R1 reasoning model at $3.00 / $7.00 per 1M tokens.
- GPT-OSS 20B — cheapest hosted model at $0.05 / $0.20 per 1M tokens.
- GPT-OSS 120B at $0.15 / $0.60 per 1M tokens — premium open-weight at sub-Llama cost.
- Kimi K2.6 long-context model at $1.20 / $4.50 per 1M tokens.
- GLM 5.1 at $1.40 / $4.40 per 1M tokens; Qwen 3.6 Plus at $0.50 / $3.00; MiniMax M2.7 at $0.30 / $1.20.
- Llama 3.3 70B still available at $0.88 / $0.88 per 1M tokens — multiple times cheaper than frontier closed models on output.
- $5 free signup credit; pay-as-you-go beyond, no monthly minimums.
- LoRA fine-tuning supported on Llama, Mistral, Qwen and DeepSeek families.
How much does each Together AI model cost per million tokens?
| Model↕ | Provider↕ | Input $/1M↑ | Cached $/1M↕ | Output $/1M↕ |
|---|---|---|---|---|
GPT-OSS 120B (Together) GPT-OSS | together | $0.15 | — | $0.60 |
MiniMax M2.7 (Together) MiniMax M2 | together | $0.30 | $0.06 | $1.20 |
Llama 3.3 70B (Together) Llama 3.3 | together | $0.88 | — | $0.88 |
Kimi K2.6 (Together) Kimi K2 | together | $1.20 | $0.20 | $4.50 |
Qwen3.7-Max (Together) Qwen3.7 | together | $1.25 | $0.13 | $3.75 |
GLM-5.1 (Together) GLM-5 | together | $1.40 | — | $4.40 |
DeepSeek V4 Pro (Together) DeepSeek V4 | together | $2.10 | $0.20 | $4.40 |
- togetherGPT-OSS 120B (Together)GPT-OSS
- Input
- $0.15
- Cached
- —
- Output
- $0.60
- togetherMiniMax M2.7 (Together)MiniMax M2
- Input
- $0.30
- Cached
- $0.06
- Output
- $1.20
- togetherLlama 3.3 70B (Together)Llama 3.3
- Input
- $0.88
- Cached
- —
- Output
- $0.88
- togetherKimi K2.6 (Together)Kimi K2
- Input
- $1.20
- Cached
- $0.20
- Output
- $4.50
- togetherQwen3.7-Max (Together)Qwen3.7
- Input
- $1.25
- Cached
- $0.13
- Output
- $3.75
- togetherGLM-5.1 (Together)GLM-5
- Input
- $1.40
- Cached
- —
- Output
- $4.40
- togetherDeepSeek V4 Pro (Together)DeepSeek V4
- Input
- $2.10
- Cached
- $0.20
- Output
- $4.40
Last synced:
Why choose Together AI over Groq or Fireworks?
Together AI's value is breadth, fine-tuning, and production features. The catalog covers Llama (all sizes), Mistral, Qwen, DeepSeek, plus long-tail specialty models that Groq and Fireworks don't carry. Pricing is competitive: Llama 3.3 70B at $0.88 / $0.88 per 1M matches Fireworks within a cent, and DeepSeek V3 at $1.25 / $1.25 is about 2x more than DeepSeek's own API but gives you a US-based host with consistent latency.
LoRA fine-tuning is the killer feature. Together AI supports fine-tuning on every major Llama, Mistral, and Qwen size including the 405B flagship, at roughly $8-$12 per million training tokens. Inference on fine-tuned adapters runs at standard rates plus a small LoRA overhead, which is dramatically cheaper than dedicated deployments. For teams iterating on custom behavior, this is the cleanest path.
Production features include dedicated deployments (guaranteed throughput + lower latency), BYOC options on AWS, and an OpenAI-compatible SDK so migration from OpenAI is usually a base-URL change.
When Together is not the right choice
Groq is 5-10x faster on the same Llama models — if latency is the critical path, Groq wins. Fireworks is slightly cheaper on Llama 3.1 405B. And for frontier reasoning and code quality, GPT-5.4 and Claude 3.5 still lead by a clear margin.
Price History
Track how Together AI API pricing has changed over time.
DeepSeek R1 (Together)
DeepSeek V3 (Together)
DeepSeek V3.1 (Together)
Llama 3.1 405B (Together)
Llama 3.3 70B (Together)
Mistral Large (Together)
Mixtral 8x22B (Together)
Qwen 2.5 72B (Together)
GPT-OSS 120B (Together)
GPT-OSS 20B (Together)
DeepSeek V4 Pro (Together)
MiniMax M2.7 (Together)
Kimi K2.6 (Together)
GLM-5.1 (Together)
Qwen3.6-Plus (Together)
Qwen3.7-Max (Together)
Price history tracking started April 2026. Charts will appear after the first price change is detected.
View pricing changelog →
Frequently asked questions
How much does Together AI charge per token?
Together AI pricing spans $0.05 to $9.00 per million tokens across its open-model catalog. The cheapest model is GPT-OSS 20B at $0.05 / $0.20 per 1M. The current DeepSeek flagship V3.1 is $0.60 / $1.70 per 1M; DeepSeek V4 Pro is $2.10 / $4.40; DeepSeek R1 is $3.00 / $7.00. Llama 3.3 70B is $0.88 flat per 1M. New 2026 additions like Kimi K2.6 ($1.20 / $4.50), GLM 5.1 ($1.40 / $4.40), Qwen 3.6 Plus ($0.50 / $3.00) and MiniMax M2.7 ($0.30 / $1.20) are all available. Together AI uses flat per-token pricing with no cached-input discount.
Does Together AI have a free tier?
Yes. Together AI offers $5 in free credits on signup (typically enough for several million tokens of prototyping on smaller Llama or Qwen models). Beyond that, pricing is pay-as-you-go per token with no monthly minimums. Dedicated deployments have their own pricing tier.
How does Together compare to Fireworks?
Together AI and Fireworks are the two main "neutral" open-model hosts and price within a few cents of each other on most SKUs. Llama 3.3 70B is $0.88 on Together vs $0.90 on Fireworks. Fireworks is slightly cheaper on Llama 3.1 405B ($3.00 vs $3.50). Together generally has a broader catalog of smaller and specialty models; Fireworks emphasizes inference speed and function calling.
What's Together's cheapest Llama option?
Llama 3.1 8B is Together AI's cheapest Llama at $0.18 per million input and output tokens. For the 70B tier, Llama 3.3 70B at $0.88 / $0.88 per 1M is the best cost/quality. Mixtral 8x22B at $1.20 / $1.20 is a common sparse MoE alternative if you want lower per-token cost at higher quality than dense 70B.
Does Together support fine-tuning?
Yes. Together AI supports LoRA fine-tuning on most Llama, Mistral, and Qwen models including Llama 3.1 405B. Typical training cost is around $8-$12 per million training tokens depending on model size, with hosted serving of the fine-tuned model at standard rates plus a small LoRA overhead.
What's DeepSeek R1's context window on Together?
DeepSeek R1 on Together AI supports a 64K token context window at $3.00 / $7.00 per 1M tokens. DeepSeek V3 ships with 128K context at $1.25 flat per 1M. If you need longer context, DeepSeek's own API sometimes ships extended context on preview endpoints cheaper than Together AI.
Methodology
Pricing sourced from https://api.together.ai/models on . All prices expressed in USD per 1 million tokens. We track 16 Together AI models spanning DeepSeek (V3.1, V4 Pro, R1), GPT-OSS (20B, 120B), Llama 3.x, Qwen 3.6, Kimi K2.6, GLM 5.1, MiniMax, Mixtral and Mistral catalogs.
Compare Together AI to other providers
Further reading: Full AI API pricing comparison · Best AI models of 2026.