Together AI API Pricing (April 2026) — DeepSeek, Llama, Qwen

How much does Together AI cost? Together AI API pricing spans $0.88 to $9.00 per million tokens across its open-model catalog. DeepSeek V3 is $1.25 / $1.25 per 1M, DeepSeek R1 is $3.00 / $7.00, Llama 3.3 70B is $0.88 / $0.88, and Llama 3.1 405B is $3.50 / $3.50. Qwen 2.5 72B and Mixtral 8x22B both sit at $1.20 / $1.20 per 1M. Together AI is a neutral open-model host with fine-tuning, dedicated deployments, and an OpenAI-compatible SDK.

Key facts about Together AI pricing

  • DeepSeek V3 flat-priced at $1.25 / $1.25 per 1M tokens with 128K context.
  • DeepSeek R1 reasoning model at $3.00 / $7.00 per 1M tokens.
  • Llama 3.3 70B at $0.88 / $0.88 per 1M tokens — 4x cheaper than GPT-5.4 on output.
  • Llama 3.1 405B flagship at $3.50 / $3.50 per 1M tokens.
  • Qwen 2.5 72B at $1.20 / $1.20 per 1M tokens — strong multilingual coverage.
  • Mixtral 8x22B sparse MoE at $1.20 / $1.20 per 1M tokens.
  • $5 free signup credit; pay-as-you-go beyond with no monthly minimums.
  • LoRA fine-tuning supported on all major Llama, Mistral, and Qwen sizes including 405B.
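The per-request math behind these rates is straightforward to sketch. The price table below is copied from the rates listed on this page; the dictionary keys are illustrative shorthand, not Together's exact API model IDs.

```python
# USD per 1M tokens (input, output), from the rates listed above.
# Keys are illustrative shorthand, not Together's exact model IDs.
PRICES = {
    "llama-3.3-70b": (0.88, 0.88),
    "qwen-2.5-72b": (1.20, 1.20),
    "mixtral-8x22b": (1.20, 1.20),
    "deepseek-v3": (1.25, 1.25),
    "deepseek-r1": (3.00, 7.00),
    "llama-3.1-405b": (3.50, 3.50),
    "mistral-large": (3.00, 9.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed pay-as-you-go rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 2,000-token prompt with a 500-token completion on DeepSeek R1:
# 2000 * 3.00/1e6 + 500 * 7.00/1e6 = $0.0095
```

Note how R1's asymmetric pricing ($3.00 in / $7.00 out) makes output length, not prompt length, the dominant cost driver for reasoning workloads.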

How much does each Together AI model cost per million tokens?

  • Llama 3.3 70B (Together): $0.88 input / $0.88 output
  • Qwen 2.5 72B (Together): $1.20 input / $1.20 output
  • Mixtral 8x22B (Together): $1.20 input / $1.20 output
  • DeepSeek V3 (Together): $1.25 input / $1.25 output
  • Mistral Large (Together): $3.00 input / $9.00 output
  • Llama 3.1 405B (Together): $3.50 input / $3.50 output
Showing 6 of 7 models · USD per 1M tokens · no cached-input discount on these SKUs
Why choose Together AI over Groq or Fireworks?

Together AI's value is breadth, fine-tuning, and production features. The catalog covers Llama (all sizes), Mistral, Qwen, DeepSeek, plus long-tail specialty models that Groq and Fireworks don't carry. Pricing is competitive: Llama 3.3 70B at $0.88 / $0.88 per 1M matches Fireworks within a cent, and DeepSeek V3 at $1.25 / $1.25 is roughly twice DeepSeek's own API price, but buys you a US-based host with consistent latency.

LoRA fine-tuning is the killer feature. Together AI supports fine-tuning on every major Llama, Mistral, and Qwen size including the 405B flagship, at roughly $8-$12 per million training tokens. Inference on fine-tuned adapters runs at standard rates plus a small LoRA overhead, which is dramatically cheaper than dedicated deployments. For teams iterating on custom behavior, this is the cleanest path.
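As a back-of-envelope check on those numbers, training cost scales linearly with tokens seen. The sketch below assumes a $10 rate, the midpoint of the $8-$12 range quoted above; actual rates vary by model size.

```python
def lora_training_cost(dataset_tokens: int, epochs: int,
                       rate_per_m: float = 10.0) -> float:
    """Estimated LoRA training cost in USD. rate_per_m is an assumed
    midpoint of the $8-$12 per million training tokens range quoted
    above; actual rates vary by model size."""
    return dataset_tokens * epochs * rate_per_m / 1_000_000

# A 50M-token dataset trained for 3 epochs at the midpoint rate:
# 50_000_000 * 3 * 10.0 / 1e6 = $1,500
```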

Production features include dedicated deployments (guaranteed throughput + lower latency), BYOC options on AWS, and an OpenAI-compatible SDK so migration from OpenAI is usually a base-URL change.
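A minimal sketch of that migration, assuming Together's OpenAI-compatible endpoint lives at api.together.xyz/v1 and using an illustrative Llama model ID; verify both against Together's docs before use.

```python
import os

# Assumed OpenAI-compatible endpoint and an illustrative model ID;
# check Together's model list for exact names.
TOGETHER_BASE_URL = "https://api.together.xyz/v1"
MODEL_ID = "meta-llama/Llama-3.3-70B-Instruct-Turbo"

def chat(prompt: str) -> str:
    # Requires `pip install openai` and a TOGETHER_API_KEY env var;
    # the deferred import keeps this sketch importable without the package.
    from openai import OpenAI
    client = OpenAI(base_url=TOGETHER_BASE_URL,
                    api_key=os.environ["TOGETHER_API_KEY"])
    resp = client.chat.completions.create(
        model=MODEL_ID,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

Point `base_url` back at OpenAI (or drop it) and the same code runs against OpenAI's API, which is the point of the compatibility layer.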

When Together is not the right choice

Groq is 5-10x faster on the same Llama models — if latency is the critical path, Groq wins. Fireworks is slightly cheaper on Llama 3.1 405B. And for frontier reasoning and code quality, GPT-5.4 and Claude 3.5 still lead by a clear margin.

Price History

Track how Together AI API pricing has changed over time.

No history yet for any tracked model: first snapshots for DeepSeek R1, DeepSeek V3, Llama 3.3 70B, Llama 3.1 405B, Qwen 2.5 72B, Mixtral 8x22B, and Mistral Large were taken 2026-04-16. Price trends will appear here as data accumulates.

Price history tracking started April 2026. Charts will appear after the first price change is detected.

Frequently asked questions

How much does Together AI charge per token?

Together AI pricing ranges from $0.88 per million input tokens for Llama 3.3 70B up to $9.00 per million output tokens for Mistral Large. DeepSeek V3 is flat-priced at $1.25 / $1.25 per 1M, Llama 3.1 405B at $3.50 / $3.50, and DeepSeek R1 at $3.00 / $7.00 per 1M. Together AI uses flat input/output pricing on most SKUs with no cached-input discount.

Does Together AI have a free tier?

Yes. Together AI offers $5 in free credits on signup (typically enough for several million tokens of prototyping on smaller Llama or Qwen models). Beyond that, pricing is pay-as-you-go per token with no monthly minimums. Dedicated deployments have their own pricing tier.
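To gauge how far that credit stretches, divide it by the flat per-1M rate. The sketch below uses Llama 3.3 70B's $0.88 rate from this page.

```python
CREDIT_USD = 5.00
RATE_PER_M = 0.88  # Llama 3.3 70B, same rate for input and output

def tokens_covered(credit_usd: float, rate_per_m: float) -> int:
    """Total tokens (input + output combined) a credit covers at a flat rate."""
    return int(credit_usd / rate_per_m * 1_000_000)

# $5 at $0.88 per 1M covers roughly 5.68M tokens
```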

How does Together compare to Fireworks?

Together AI and Fireworks are the two main "neutral" open-model hosts and price within a few cents of each other on most SKUs. Llama 3.3 70B is $0.88 on Together vs $0.90 on Fireworks. Fireworks is slightly cheaper on Llama 3.1 405B ($3.00 vs $3.50). Together generally has a broader catalog of smaller and specialty models; Fireworks emphasizes inference speed and function calling.
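At volume, the 405B gap above is concrete money. A quick sketch at an assumed 200M tokens per month (a hypothetical workload, not a benchmark figure):

```python
def monthly_cost(tokens: int, rate_per_m: float) -> float:
    """USD per month at a flat per-1M-token rate."""
    return tokens * rate_per_m / 1_000_000

TOKENS = 200_000_000  # assumed monthly volume, input + output combined
together = monthly_cost(TOKENS, 3.50)   # Llama 3.1 405B on Together
fireworks = monthly_cost(TOKENS, 3.00)  # same model on Fireworks
# $700 vs $600: a $100/month gap at this volume
```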

What's Together's cheapest Llama option?

Llama 3.1 8B is Together AI's cheapest Llama at $0.18 per million input and output tokens. For the 70B tier, Llama 3.3 70B at $0.88 / $0.88 per 1M offers the best cost/quality. Mixtral 8x22B at $1.20 / $1.20 is a common sparse MoE step up if you want quality above a dense 70B at a modest per-token premium.

Does Together support fine-tuning?

Yes. Together AI supports LoRA fine-tuning on most Llama, Mistral, and Qwen models including Llama 3.1 405B. Typical training cost is around $8-$12 per million training tokens depending on model size, with hosted serving of the fine-tuned model at standard rates plus a small LoRA overhead.

What's DeepSeek R1's context window on Together?

DeepSeek R1 on Together AI supports a 64K-token context window at $3.00 / $7.00 per 1M tokens. DeepSeek V3 ships with 128K context at a flat $1.25 per 1M. If you need longer context, DeepSeek's own API sometimes offers extended-context preview endpoints at lower cost than Together.
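A simple guard for those limits, with context sizes taken from the figures above (assuming "64K" means 64,000 tokens; model keys are illustrative shorthand):

```python
# Context windows on Together, per the figures above (tokens).
CONTEXT_LIMITS = {"deepseek-r1": 64_000, "deepseek-v3": 128_000}

def fits_context(model: str, prompt_tokens: int, max_output_tokens: int) -> bool:
    """True if prompt plus requested completion fits the model's window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_LIMITS[model]

# A 60K prompt + 4K completion just fits R1's 64K window;
# the same prompt easily fits V3's 128K window.
```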

Methodology

Pricing sourced from https://api.together.ai/models. All prices are expressed in USD per 1 million tokens. We track 7 Together AI models spanning the DeepSeek, Llama, Qwen, Mixtral, and Mistral catalogs.