Cohere API Pricing 2026: Command R+, Command R, R7B, Embed v3, Rerank v3

Last updated:

How much does Cohere cost? Cohere API pricing spans $0.0375 to $10.00 per million tokens. Command R+ 08-2024 flagship is $2.50 / $10.00 per 1M, Command R 08-2024 is $0.15 / $0.60, and Command R7B is just $0.0375 / $0.15 per 1M. Embed v3 (English and Multilingual) is $0.10 per 1M input tokens, and Rerank v3 is priced at $2.00 per 1M tokens of search input. Cohere's focus is enterprise RAG — all Command models ship with 128K context and strong long-document recall.

Key facts about Cohere pricing

  • Command R+ 08-2024 flagship at $2.50 / $10.00 per 1M tokens — matches GPT-5.4 on input, 33% cheaper on output.
  • Command R 08-2024 at $0.15 / $0.60 per 1M tokens, RAG-optimized mid-tier.
  • Command R7B at $0.0375 / $0.15 per 1M tokens — one of the cheapest production APIs available.
  • Embed v3 English and Embed v3 Multilingual at $0.10 per 1M input tokens; multilingual covers 100+ languages.
  • Rerank v3 at $2.00 per 1M tokens — unique neural reranker for boosting vector-search precision.
  • All Command models ship with a 128K token context window tuned for long-document RAG.
  • Free developer tier with generous rate limits; no monthly minimums on paid tiers.
  • Enterprise deployments available on AWS, Azure, Oracle, and self-hosted (Cohere for AI).

How much does each Cohere model cost per million tokens?

  • Command R7B
    Command R
    cohere
    Input
    $0.0375
    Cached
    Output
    $0.15
  • Embed v3 Multilingual
    Embed
    cohere
    Input
    $0.10
    Cached
    Output
    $0.00
  • Rerank v3
    Rerank
    cohere
    Input
    $2.00
    Cached
    Output
    $0.00
  • Command A
    Command A
    cohere
    Input
    $2.50
    Cached
    Output
    $10.00
Showing 4 of 7 models · USD per 1M tokens
Last synced:

Last synced:

Why choose Cohere over OpenAI or Anthropic?

Cohere's positioning is narrower than OpenAI or Anthropic: enterprise retrieval-augmented generation. The Command, Embed, and Rerank stack is purpose-built to work together — Embed v3 feeds a vector store, Rerank v3 boosts top-K precision, and Command R/R+ generate grounded answers with inline citations. At $0.10 / 1M for embeddings, $2 / 1M for reranking, and $2.50 / $10 per 1M for flagship generation, the end-to-end RAG cost is often lower than equivalent GPT-5.4 + OpenAI embeddings pipelines.

Command R7B at $0.0375 / $0.15 per 1M tokens is genuinely one of the cheapest production APIs available in May 2026 — 4x cheaper than GPT-5.4 nano on input. For high-volume classification, routing, or lightweight summarization it's a strong default. Cohere also offers enterprise deployments on AWS, Azure, and Oracle with BYOK and private endpoints, which matters in regulated industries.

When Cohere is not the right choice

Command R+ trails GPT-5.4 and Claude 3.5 Sonnet on frontier reasoning, code generation, and tool use. If your workload is agentic or code-heavy, OpenAI or Anthropic are still the better default. Cohere also doesn't ship a vision model in the Command family, so multimodal workloads need a different provider.

Price History

Track how Cohere API pricing has changed over time.

Command R 08-2024

Command R+ 08-2024

Command R7B

Embed v3 English

Embed v3 Multilingual

Rerank v3

Command A

Price history tracking started April 2026. Charts will appear after the first price change is detected.
View pricing changelog →

Frequently asked questions

How much does Cohere Command R+ cost?

Command R+ 08-2024 is priced at $2.50 per million input tokens and $10.00 per million output tokens. That is roughly the same as GPT-5.4 on input and 33% cheaper on output, positioning Command R+ as a competitive flagship option — especially for retrieval-augmented generation where Cohere is purpose-built.

Does Cohere have a free tier?

Yes. Cohere offers a generous free trial tier with rate-limited access to Command, Embed, and Rerank endpoints for prototyping. Production pay-as-you-go pricing kicks in once you exceed trial limits or request production keys — there are no monthly minimums.

How does Cohere compare to OpenAI?

Cohere is narrower in focus than OpenAI but often better at what it does. Command R+ at $2.50 / $10.00 per 1M tokens matches GPT-5.4 on input cost and undercuts by 33% on output. Embed v3 at $0.10 per 1M tokens is 33% cheaper than OpenAI text-embedding-3-large ($0.13). Rerank v3 is unique — OpenAI does not ship a dedicated reranker.

What are Cohere's embedding model prices?

Embed v3 English and Embed v3 Multilingual are both priced at $0.10 per million input tokens. The multilingual variant supports over 100 languages with near-English quality. Embeddings are input-only, so there is no output token cost.

What's Rerank v3 priced at?

Rerank v3 costs $2.00 per million tokens of search queries + documents combined. Reranking is typically used after vector retrieval to boost top-K precision — you send 100-1000 candidate documents per query and Rerank scores them. At $2 / 1M, a typical 10-query / 100-doc-each workload costs about $0.02.

What's Command R+'s context window?

Command R+ 08-2024 supports a 128K token context window, matching GPT-5.4. Command R 08-2024 also ships with 128K context. Cohere tunes these models aggressively for long-document RAG, with strong needle-in-a-haystack recall at depths up to 100K tokens.

What are Cohere's API rate limits for the free trial and production tiers?

Trial (evaluation) keys are capped at 1,000 API calls per month total, with per-endpoint per-minute limits: Chat (Command A, R+, R, R7B) 20 req/min, Rerank 10 req/min, Embed 2,000 inputs/min, EmbedJob 5 req/min, Audio Transcriptions 5 req/min. Production keys lift Chat to 500 req/min, Rerank to 1,000 req/min, EmbedJob to 50 req/min, and remove the monthly cap. Newer models like Command A Reasoning, Command A Translate, and Command A Vision require contacting <a href="mailto:sales@cohere.com">sales@cohere.com</a> for production access — trial keys work for evaluation at 20 req/min.

When was Command R+ last updated and is it still the flagship?

The current Command R+ generation is the 08-2024 release (Command R+ 08-2024 at $2.50/$10 per 1M tokens), still actively served. Cohere has since shipped Command A as a higher-end variant with reasoning, vision, and translation specializations — those are gated to evaluation use until you contact sales for production keys. For general-purpose RAG and chat workloads, Command R+ 08-2024 remains the flagship priced for self-serve production use.

What's changed in Cohere's pricing and lineup in 2026?

The active 2026 Cohere lineup on self-serve production keys is Command R+ ($2.50/$10), Command R ($0.15/$0.60), Command R7B ($0.0375/$0.15 — Cohere's cheapest chat model), Embed v3 English and Multilingual ($0.10 input-only), and Rerank v3 ($2.00 per 1M tokens of query + documents). Newer Command A variants (Reasoning, Translate, Vision) are evaluation-only at the moment. Audio Transcriptions joined the rate-limit table in early 2026 at 5 req/min on trial keys. Prices on Command R+/R/R7B have held steady since the 08-2024 release — Cohere's competitive lever has been adding capabilities (Command A reasoning, image inputs, audio) rather than cutting per-token prices.

Methodology

Pricing sourced from https://cohere.com/pricing on . All prices expressed in USD per 1 million tokens. We track 7 Cohere models covering Command (generation), Embed (vectors), and Rerank (neural reranking) endpoints.