Xiaomi MiMo Price Cut: Pricing Impact (May 2026)

Xiaomi has permanently cut API pricing for the MiMo-v2.5 model series, with the largest published reduction reaching roughly 99% on cache-hit input pricing. The change takes effect globally at 00:00 Beijing time on May 27, 2026.

The headline number is aggressive, but the practical story is simple: Xiaomi is trying to make MiMo-v2.5 a serious low-cost API option, not just a promotional model launch. Overseas pay-as-you-go pricing now lists mimo-v2.5 at $0.14 per 1 million cache-miss input tokens and $0.28 per 1 million output tokens. mimo-v2.5-pro lists at $0.435 per 1 million cache-miss input tokens and $0.87 per 1 million output tokens.

For current mainstream alternatives, keep our DeepSeek pricing, OpenAI pricing, and AI token cost calculator open while you model your own workload. This cut puts Xiaomi much closer to the budget API conversation covered in our DeepSeek vs OpenAI pricing comparison.

What changed

Xiaomi announced four related changes:

Permanent API price reductions for the MiMo-v2.5 series.
No more input-length price split for MiMo-v2.5 pricing.
Token Plan quota economics improved by 5x to 8x without increasing plan prices.
Existing active Token Plan user quotas reset under the new billing rules.

The company also said its 100 trillion token creator incentive program has ended after all tokens were distributed ahead of schedule. That matters because this is not merely the end of a free-token promotion. Xiaomi is replacing the promotion with lower standing API prices.

New overseas MiMo API pricing

All prices below are USD per 1 million tokens from Xiaomi’s overseas pricing page.

Model	Cache-hit input	Cache-miss input	Output	Notes
mimo-v2.5	$0.0028	$0.14	$0.28	1M context, multimodal understanding, deep thinking, tool use
mimo-v2.5-pro	$0.0036	$0.435	$0.87	1M context, deep thinking, structured output, web search

The new rates are especially notable on cached input. A cache-hit rate of $0.0028/M on mimo-v2.5 is effectively negligible for repeated system prompts, tool schemas, policy text, and static retrieval context.
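To see how much the cached rate matters, it helps to blend the two input prices by the share of input tokens that hit the cache. The sketch below is illustrative only: the function name and the sample hit ratios are assumptions, and it assumes billing is a simple per-token split between the cache-hit and cache-miss rates listed above.

```python
def blended_input_price(hit_price: float, miss_price: float, hit_ratio: float) -> float:
    """Effective USD per 1M input tokens when `hit_ratio` of input tokens hit the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_price + (1.0 - hit_ratio) * miss_price


# mimo-v2.5 rates from the table above (USD per 1M tokens).
MIMO_HIT, MIMO_MISS = 0.0028, 0.14

# Hypothetical cache-hit ratios; measure your own workload before relying on these.
for ratio in (0.0, 0.5, 0.9):
    price = blended_input_price(MIMO_HIT, MIMO_MISS, ratio)
    print(f"{ratio:.0%} of input cached: ${price:.4f} per 1M input tokens")
```

At a 90% hit ratio the effective input price falls to under two cents per million tokens, which is why repeated system prompts and tool schemas are the workloads most affected by this cut.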

Xiaomi also says MiMo-v2.5 removes the old input-length pricing distinction. Under the older MiMo-v2 Pro table, prices changed above 256K input tokens. For long-context applications, removing that split makes budgeting easier and lowers the risk of accidentally moving into a more expensive band.

Old vs new pricing impact

No comparison is perfectly clean because the model names changed, but the direction is clear. The older overseas MiMo-v2 Pro pricing reached $2/M cache-miss input and $6/M output for the 256K to 1M input band. The new MiMo-v2.5 Pro rate is $0.435/M cache-miss input and $0.87/M output.

Comparison	Old overseas rate	New overseas rate	Approx. reduction
Pro cache-hit input, long-context band	$0.40/M	$0.0036/M	99.1%
Pro cache-miss input, long-context band	$2.00/M	$0.435/M	78.3%
Pro output, long-context band	$6.00/M	$0.87/M	85.5%
Omni-style cache-hit input	$0.08/M	$0.0028/M	96.5%
Omni-style cache-miss input	$0.40/M	$0.14/M	65.0%
Omni-style output	$2.00/M	$0.28/M	86.0%

That explains the “up to 99%” claim: the maximum reduction is on cached input, not every token category. For most buyers, output pricing and cache-miss input pricing will matter more than the headline maximum. Those cuts are still large enough to change routing decisions.
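The reduction column can be reproduced from the old and new rates. The snippet below is a small check, with the rates copied from the comparison table; the function name is just an illustration.

```python
def reduction_pct(old_rate: float, new_rate: float) -> float:
    """Percentage price reduction from old_rate to new_rate."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return (1.0 - new_rate / old_rate) * 100.0


# (old, new) overseas rates in USD per 1M tokens, from the comparison table.
COMPARISON = {
    "Pro cache-hit input": (0.40, 0.0036),
    "Pro cache-miss input": (2.00, 0.435),
    "Pro output": (6.00, 0.87),
    "Omni-style cache-hit input": (0.08, 0.0028),
    "Omni-style cache-miss input": (0.40, 0.14),
    "Omni-style output": (2.00, 0.28),
}

for label, (old, new) in COMPARISON.items():
    print(f"{label}: {reduction_pct(old, new):.2f}% lower")
```

Only the Pro cache-hit row clears 99%; the cache-miss row works out to 78.25%, which the table shows rounded to 78.3%.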

How Xiaomi compares with budget APIs

Using the tracked rates in our pricing data (USD per 1 million tokens), MiMo-v2.5 now lands in the same raw token-price neighborhood as the cheapest serious text models.

Model	Input	Cached input	Output	Practical read
Xiaomi mimo-v2.5	$0.14	$0.0028	$0.28	New budget MiMo baseline
Xiaomi mimo-v2.5-pro	$0.435	$0.0036	$0.87	Low-cost reasoning and long-context option
DeepSeek V4 Flash	$0.14	$0.0028	$0.28	Current budget benchmark in our data
Gemini 2.5 Flash	$0.30	n/a	$2.50	More expensive output, broader Google ecosystem
GPT-5.4 mini	$0.75	$0.075	$4.50	Mature OpenAI small-model route
Claude Haiku 4.5	$1.00	$0.10	$5.00	Anthropic low-cost Claude tier

The striking point is that Xiaomi’s base MiMo-v2.5 overseas rate matches DeepSeek V4 Flash in our current pricing data: $0.14/M input, $0.0028/M cached input, and $0.28/M output.

That does not mean the models are interchangeable. Developers still need to test quality, latency, tool-use reliability, output formatting, regional availability, terms, support, and data-handling requirements. But on raw token price, Xiaomi is no longer priced like a niche long-context provider.

Why Xiaomi says costs fell

Xiaomi attributes the cut to inference-system improvements rather than a short-lived marketing discount.

The company says it now supports Sliding Window Attention based on SGLang HiCache, reducing KV cache data transfer across GPU memory, CPU memory, and SSD to nearly one-seventh of the previous amount. It also says cacheable tokens increased to nearly 5x the previous level, improving cache-hit rates and inference efficiency.

Xiaomi also points to expert parallelism and input-length bucketing optimizations. In plain English: the provider says it can serve long prompts and repeated context more efficiently, so it is passing some of that cost reduction into permanent API prices.

That is the part buyers should watch. If the underlying serving improvements hold under real traffic, MiMo-v2.5 could be especially attractive for agent, coding, document analysis, and retrieval-heavy workloads where context repeats often.

Who benefits

High-volume developers are the obvious winners. A workload with 100 million input tokens and 40 million output tokens would cost about $25.20 on mimo-v2.5 before web search or other extras, assuming every input token is billed at the cache-miss rate. The same token mix on GPT-5.4 mini would cost about $255, and on Gemini 2.5 Flash about $130.
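The arithmetic behind those figures is easy to rerun with your own token volumes. This is a minimal sketch that uses the uncached input and output rates from the comparison table and ignores caching, web search, and other extras; the function and dictionary names are illustrative.

```python
def workload_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Total USD cost, with rates given in USD per 1M tokens."""
    return input_tokens / 1e6 * input_rate + output_tokens / 1e6 * output_rate


# (uncached input, output) rates in USD per 1M tokens, from the table above.
RATES = {
    "mimo-v2.5": (0.14, 0.28),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "GPT-5.4 mini": (0.75, 4.50),
}

INPUT_TOKENS = 100_000_000
OUTPUT_TOKENS = 40_000_000

for model, (in_rate, out_rate) in RATES.items():
    total = workload_cost(INPUT_TOKENS, OUTPUT_TOKENS, in_rate, out_rate)
    print(f"{model}: ${total:,.2f}")
```

Swap in your own monthly volumes to see where output tokens, rather than input tokens, dominate the bill.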

Long-context teams also benefit because the new pricing no longer separates shorter and longer input bands for MiMo-v2.5. If your app moves between short chats, large documents, and 1M-token context windows, simpler pricing reduces budgeting surprises.

Existing Token Plan users get a second benefit: Xiaomi says active quotas will be fully reset at the effective time and then governed by the new billing rules. Teams that had already consumed quota should check their dashboard after the reset.

Who should be careful

Do not migrate only because the table is cheap. Xiaomi MiMo is a newer API platform for many Western buyers, and procurement risk is not captured in a token price.

Teams should verify API compatibility, rate limits, uptime, support expectations, model behavior, and data-governance requirements before shifting customer-facing traffic. The model page lists limits of 100 requests per minute (RPM) and 10 million tokens per minute (TPM) for the v2.5 Pro and TTS entries, so high-concurrency workloads still need capacity planning and retry logic.
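For a first-pass capacity check, the two listed limits can be combined into one sustainable request rate. The sketch below assumes the TPM limit counts input and output tokens together, which the source does not confirm, and the average request sizes are hypothetical.

```python
def sustainable_rpm(rpm_limit: int, tpm_limit: int, avg_tokens_per_request: int) -> float:
    """Requests per minute that stay within both the RPM and TPM caps."""
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    return min(float(rpm_limit), tpm_limit / avg_tokens_per_request)


# Limits listed for the v2.5 Pro entry on the model page.
RPM_LIMIT = 100
TPM_LIMIT = 10_000_000

# Hypothetical average total tokens per request (assumed to include output).
for avg_tokens in (4_000, 100_000, 500_000):
    rate = sustainable_rpm(RPM_LIMIT, TPM_LIMIT, avg_tokens)
    print(f"{avg_tokens:>7,} tokens/request -> {rate:.0f} requests/minute")
```

Under these assumptions the request cap binds first for typical chat-sized calls, while the token cap only takes over once requests average more than 100,000 tokens.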

Also separate token pricing from web search pricing. Xiaomi lists overseas internet connectivity service at $5 per 1,000 calls. If your workflow uses web search heavily, that can become a meaningful line item.
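To put the search fee in token terms, the sketch below converts the listed $5 per 1,000 calls into a per-call cost and compares it with the mimo-v2.5 output rate. The call volume is a hypothetical example, and the helper names are illustrative.

```python
SEARCH_USD_PER_CALL = 5.0 / 1_000  # Xiaomi's listed overseas rate: $5 per 1,000 calls


def search_cost(calls: int) -> float:
    """USD cost of a number of web search calls."""
    return calls * SEARCH_USD_PER_CALL


def equivalent_output_tokens(output_rate_per_million: float) -> float:
    """Output tokens that cost the same as a single search call."""
    return SEARCH_USD_PER_CALL / output_rate_per_million * 1e6


MIMO_OUTPUT_RATE = 0.28  # mimo-v2.5 output, USD per 1M tokens

# Hypothetical volume: 200,000 search calls in a month.
print(f"200,000 searches: ${search_cost(200_000):,.2f}")
print(f"One search costs about as much as "
      f"{equivalent_output_tokens(MIMO_OUTPUT_RATE):,.0f} mimo-v2.5 output tokens")
```

At these rates a single search call costs roughly as much as 17,900 mimo-v2.5 output tokens, so a search-heavy agent can easily spend more on search than on generation.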

What to do now

If you already tested MiMo-v2 or used the creator incentive program, rerun your cost model using the new MiMo-v2.5 rates. Pay special attention to output-heavy tasks, long-context tasks, and cacheable agent context.

If you are comparing budget APIs, benchmark MiMo-v2.5 directly against DeepSeek V4 Flash, Gemini 2.5 Flash, GPT-5.4 mini, and your current default. Use accepted-answer cost, not just token cost. A cheaper model loses its advantage if it needs more retries or heavier prompt scaffolding.
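One simple way to model accepted-answer cost is to divide the cost of a single attempt by the share of attempts that pass your quality bar. The sketch below assumes retries are independent and repeated until one answer is accepted; the token counts and acceptance rates are hypothetical placeholders, not measurements of any model.

```python
def cost_per_attempt(input_tokens: int, output_tokens: int,
                     input_rate: float, output_rate: float) -> float:
    """USD cost of one request, with rates in USD per 1M tokens."""
    return input_tokens / 1e6 * input_rate + output_tokens / 1e6 * output_rate


def cost_per_accepted(attempt_cost: float, acceptance_rate: float) -> float:
    """Expected USD per accepted answer, assuming independent retries."""
    if not 0.0 < acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in (0, 1]")
    return attempt_cost / acceptance_rate


# Hypothetical request shape: 2,000 input tokens and 500 output tokens.
# Acceptance rates below are placeholders; replace them with your benchmark results.
candidates = {
    "mimo-v2.5": (0.14, 0.28, 0.70),
    "GPT-5.4 mini": (0.75, 4.50, 0.95),
}

for model, (in_rate, out_rate, accept) in candidates.items():
    attempt = cost_per_attempt(2_000, 500, in_rate, out_rate)
    print(f"{model}: ${cost_per_accepted(attempt, accept):.6f} per accepted answer")
```

The cheaper model keeps its lead here even with a much lower acceptance rate, but the gap narrows quickly if retries also require longer prompts or human review.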

If your workload repeats large static context, test prompt caching early. Xiaomi’s cache-hit prices are the biggest part of the price story, and the announced HiCache work suggests the provider wants to compete hard on repeated-context economics.

My read

This is one of the sharper AI API pricing moves of May 2026. Xiaomi is moving MiMo-v2.5 into the low-cost frontier conversation with rates that directly challenge DeepSeek-style budget pricing while still advertising 1M context and advanced tool capabilities.

The right response is not an immediate production migration. It is a fast benchmark. If MiMo-v2.5 passes your quality and reliability bar, the new price table is low enough to justify adding Xiaomi to routing tests for summarization, coding assistants, research agents, support drafts, and long-context document workflows.

Sources: Xiaomi MiMo-v2.5 price adjustment announcement, Xiaomi MiMo API pricing, and Xiaomi MiMo model limits.