
Google Gemini API Pricing Guide 2026

Google Gemini API pricing guide for 2026: Gemini 3.1 Pro, 3 Flash, 2.5 Pro, Flash-Lite, cache discounts, free tier notes, and cost tips.

By AI Pricing Guru Editorial Team

AI Pricing Guru articles are maintained through the site's editorial workflow: daily pricing snapshots, provider source checks, and review passes for model launches, subscription limits, and billing changes.

Provider pricing pages are not structured the same way, so this guide focuses on the numbers a buyer has to normalize before comparing OpenAI, Anthropic, Google, DeepSeek, and the rest side by side.

Google Gemini API pricing in 2026 is one of the more attractive options for teams that want large context windows, multimodal support, and lower premium token rates than OpenAI’s newest flagship models. The current tracked Gemini stack ranges from Gemini 3.1 Flash-Lite at $0.25 input / $1.50 output per million tokens to Gemini 3 Pro and Gemini 3.1 Pro at $2 input / $12 output.

The catch is that Google pricing is easy to misread. Gemini has several active and preview tiers, Pro models have changed how free access works, and long-context workloads can behave very differently from short chat prompts. A team that only looks at the cheapest Flash-Lite price may under-budget for premium output. A team that only looks at Gemini 3 Pro may miss the fact that Gemini 2.5 Pro is still a very cost-effective production option when it passes evals.

This guide breaks down the current Gemini API price ladder, where hidden costs appear, and which Gemini model to start with for common workloads. For live data, use our Google AI pricing page, the full AI API pricing table, and the token cost calculator. For a direct head-to-head, see Gemini vs GPT-5.4 pricing.

Gemini API Pricing: Quick Reference

All prices below are in USD per 1 million tokens, based on the current pricing data tracked by AI Pricing Guru.

| Gemini model | Status | Input | Cached input | Output | Best fit |
| --- | --- | --- | --- | --- | --- |
| Gemini 3.1 Pro | Preview | $2.00 | $0.20 | $12.00 | Premium reasoning, multimodal analysis, long-context apps |
| Gemini 3 Pro | Preview | $2.00 | $0.20 | $12.00 | Newer premium Google model, GPT-5.4 alternative |
| Gemini 3 Flash | Preview | $0.50 | $0.05 | $3.00 | Fast user-facing apps, support, extraction, RAG |
| Gemini 3.1 Flash-Lite | Preview | $0.25 | $0.025 | $1.50 | Cheapest Gemini 3.x tier, high-volume utility calls |
| Gemini 2.5 Pro | Active | $1.25 | $0.125 | $10.00 | Lower-cost premium work when quality is sufficient |
| Gemini 2.5 Pro (>200K tokens) | Active | $2.50 | $0.25 | $15.00 | Large-context Gemini 2.5 Pro workloads |
| Gemini 2.5 Flash | Active | $0.30 | $0.03 | $2.50 | Mature fast tier, budget production apps |

The headline takeaways:

  • Gemini 3 Pro and Gemini 3.1 Pro are cheaper than GPT-5.4 on raw token price in the current tracked table: $2/$12 versus GPT-5.4 at $2.50/$15.
  • Gemini 2.5 Pro is still a strong value model at $1.25/$10, as long as your prompts don’t push into the higher long-context tier.
  • Gemini Flash and Flash-Lite are where Google gets very cost-competitive for high-volume apps.
  • Cached input is 90% cheaper across the current tracked Gemini models, which is a major lever for RAG, agents, and repeated system prompts.
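The quick-reference rates above can be turned into a simple per-request estimator. A minimal sketch in Python, using this guide's tracked per-million-token rates (the model keys and function name are illustrative, not a real SDK API):

```python
# Tracked Gemini rates, USD per 1M tokens: (input, cached input, output).
RATES = {
    "gemini-3.1-pro":        (2.00, 0.20, 12.00),
    "gemini-3-pro":          (2.00, 0.20, 12.00),
    "gemini-3-flash":        (0.50, 0.05, 3.00),
    "gemini-3.1-flash-lite": (0.25, 0.025, 1.50),
    "gemini-2.5-pro":        (1.25, 0.125, 10.00),
    "gemini-2.5-flash":      (0.30, 0.03, 2.50),
}

def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate USD cost of one call: uncached input + cached input + output."""
    in_rate, cache_rate, out_rate = RATES[model]
    return (
        (input_tokens - cached_tokens) * in_rate
        + cached_tokens * cache_rate
        + output_tokens * out_rate
    ) / 1_000_000

# A 4,000-token prompt with a 500-token answer on Gemini 3 Pro:
print(round(request_cost("gemini-3-pro", 4_000, 500), 6))  # 0.014
```

Multiplying a per-request figure like this by expected daily volume is usually a more honest budget than eyeballing the headline rates.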

Which Gemini model should you use?

Use Gemini 3.1 Flash-Lite for high-volume utility work

Gemini 3.1 Flash-Lite is the cheapest current Gemini 3.x model in our tracker at $0.25 per million input tokens, $0.025 per million cached input tokens, and $1.50 per million output tokens.

That makes it a strong starting point for narrow, repeated jobs:

  • classification and tagging
  • intent routing
  • language detection
  • basic extraction to JSON
  • short summaries
  • search result cleanup
  • moderation pre-checks
  • lightweight RAG answers where quality requirements are modest

Flash-Lite isn’t the model to pick for your hardest reasoning tasks. Its job is to make the cheap layer cheap enough that you don’t waste Pro tokens on infrastructure calls. If a request can be verified mechanically or escalated when confidence is low, start with Flash-Lite.
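The verify-then-escalate pattern can be sketched as a small wrapper. This is a hypothetical sketch: `call_model` stands in for whatever Gemini client wrapper you use, and the prompt shape and schema check are illustrative, not a real API.

```python
import json

def extract_with_escalation(call_model, text, schema_keys):
    """Try Flash-Lite first; escalate to Pro only if the cheap answer
    fails a mechanical check (valid JSON with the expected keys)."""
    prompt = f"Return JSON with keys {schema_keys} for: {text}"
    cheap = call_model("gemini-3.1-flash-lite", prompt)
    try:
        data = json.loads(cheap)
        if isinstance(data, dict) and all(k in data for k in schema_keys):
            return data, "flash-lite"   # mechanical check passed, stop here
    except json.JSONDecodeError:
        pass                            # malformed JSON -> escalate
    premium = call_model("gemini-3-pro", prompt)
    return json.loads(premium), "pro"
```

The point of the design is that every request that passes the cheap check is billed at Flash-Lite rates, and only the failures pay Pro prices.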

Use Gemini 3 Flash for fast production defaults

Gemini 3 Flash costs $0.50 input / $3.00 output per million tokens, with cached input at $0.05. It’s twice the input price of Flash-Lite and twice the output price, but still far below the Pro tiers.

Use Gemini 3 Flash when the user sees the answer and you need a better quality/latency balance than Flash-Lite:

  • customer support drafts
  • product Q&A
  • ecommerce assistants
  • multimodal app flows
  • extraction plus explanation
  • internal copilots
  • routine RAG over clean documentation

For many products, Flash is the default route and Pro is the escalation route. That architecture keeps response quality acceptable while avoiding a premium bill on every request.

Gemini 2.5 Flash remains active at $0.30 input / $2.50 output. It can still be economical if your evals already pass on 2.5 Flash or if you want a mature, lower-cost Flash tier. New builds should benchmark both 2.5 Flash and 3 Flash because the better choice depends on quality, latency, and retry rate.

Use Gemini 2.5 Pro when you want premium quality at a discount

Gemini 2.5 Pro is easy to overlook because Gemini 3.x gets more attention. That would be a mistake. At $1.25 input / $10 output, Gemini 2.5 Pro is one of the better premium-value models in the current market.

It’s especially attractive for:

  • document Q&A under normal context sizes
  • research synthesis
  • business analysis
  • long-form summarization
  • higher-quality RAG answers
  • coding assistance where Gemini passes your evals
  • multimodal tasks that don’t require the newest preview model

The main caveat is long context. The tracked table includes a higher Gemini 2.5 Pro tier for prompts above 200K tokens at $2.50 input / $15 output. That’s still useful, but it changes the comparison: the long-context 2.5 Pro tier is no longer a discount versus GPT-5.4.

So the rule is simple: Gemini 2.5 Pro is a bargain when your prompts stay in the lower tier. For very large prompts, model the cost before assuming it’s cheaper.
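Modeling that threshold takes only a few lines. A sketch under one assumption: the whole prompt is billed at the tier rate the table implies for its size (check Google's billing docs for the exact tier semantics before relying on this).

```python
# Gemini 2.5 Pro has two tracked tiers: $1.25/$10 at or below 200K prompt
# tokens, $2.50/$15 above. Assumed: the entire call is billed at one tier.
def gemini_25_pro_cost(input_tokens, output_tokens):
    """USD cost for one Gemini 2.5 Pro call, picking the tier by prompt size."""
    if input_tokens > 200_000:
        in_rate, out_rate = 2.50, 15.00   # long-context tier
    else:
        in_rate, out_rate = 1.25, 10.00   # normal tier
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Same 2K-token answer, prompt just below vs. above the threshold:
print(gemini_25_pro_cost(200_000, 2_000))  # normal tier: 0.27
print(gemini_25_pro_cost(250_000, 2_000))  # long-context tier: 0.655
```

A 25% larger prompt more than doubles the per-request cost in this example, which is exactly why the threshold deserves its own line in the budget.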

Use Gemini 3 Pro or 3.1 Pro for premium Google workloads

Gemini 3 Pro and Gemini 3.1 Pro both sit at $2.00 input / $12.00 output, with cached input at $0.20 in the current tracker. These are the models to test when you want Google’s premium generation and reasoning behavior.

Use them for:

  • high-value customer-facing answers
  • complex multimodal reasoning
  • long document analysis
  • difficult RAG synthesis
  • executive research workflows
  • agent tasks that fail on Flash
  • provider diversification away from OpenAI or Anthropic

The most direct pricing comparison is OpenAI GPT-5.4. GPT-5.4 costs $2.50 input / $15 output, so Gemini 3 Pro is about 20% cheaper on both sides of the token bill. Against Anthropic Claude Opus 4.7, Gemini 3 Pro is much cheaper on listed token price, although Claude may still win specific coding, writing, or agentic evals.

Hidden Gemini API costs to watch

1. Output tokens can dominate the bill

Gemini’s output prices are lower than many competing premium models, but output still matters. A model that costs $12 per million output tokens can get expensive if every request produces long answers, markdown tables, code blocks, or verbose explanations.

Consider a monthly workload with 50 million input tokens and 20 million output tokens:

| Model | Input cost | Output cost | Total |
| --- | --- | --- | --- |
| Gemini 3.1 Flash-Lite | $12.50 | $30.00 | $42.50 |
| Gemini 3 Flash | $25.00 | $60.00 | $85.00 |
| Gemini 2.5 Pro | $62.50 | $200.00 | $262.50 |
| Gemini 3 Pro | $100.00 | $240.00 | $340.00 |

The output side is larger than the input side for every model in that example. Keep responses concise, avoid unnecessary chain-of-thought-style explanations, and don’t ask a Pro model to generate bulk text if Flash passes quality checks.
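The arithmetic behind that table is easy to reproduce, which also makes it easy to rerun with your own token volumes:

```python
# Worked example above: 50M input and 20M output tokens per month,
# at this guide's tracked rates (USD per 1M tokens: input, output).
RATES = {
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Gemini 3 Flash":        (0.50, 3.00),
    "Gemini 2.5 Pro":        (1.25, 10.00),
    "Gemini 3 Pro":          (2.00, 12.00),
}

def monthly_cost(in_rate, out_rate, input_m=50, output_m=20):
    """Return (input cost, output cost) in USD for the monthly volumes."""
    return input_m * in_rate, output_m * out_rate

for model, (in_rate, out_rate) in RATES.items():
    inp, out = monthly_cost(in_rate, out_rate)
    print(f"{model}: in ${inp:.2f} + out ${out:.2f} = ${inp + out:.2f}")
```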

2. Cached input is one of Google’s biggest savings levers

Gemini cached input rates are roughly 90% lower than normal input rates in the tracked data:

| Model | Normal input | Cached input | Savings |
| --- | --- | --- | --- |
| Gemini 3.1 Flash-Lite | $0.25/M | $0.025/M | 90% |
| Gemini 3 Flash | $0.50/M | $0.05/M | 90% |
| Gemini 2.5 Pro | $1.25/M | $0.125/M | 90% |
| Gemini 3 Pro | $2.00/M | $0.20/M | 90% |

Caching matters when you repeatedly send the same system prompt, policy document, style guide, tool schema, repository context, or knowledge base material. It’s especially valuable for RAG apps, coding agents, compliance assistants, support bots, and any app with large stable prefixes.
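The savings scale with how much of each prompt is a stable, cacheable prefix. A sketch of the effective blended input rate (the hit-share model here is a simplification; real cache behavior depends on TTLs and prefix matching):

```python
def effective_input_rate(normal_rate, cached_rate, cache_hit_share):
    """Blended USD-per-1M input rate, where cache_hit_share is the
    fraction of input tokens served from the cache."""
    return normal_rate * (1 - cache_hit_share) + cached_rate * cache_hit_share

# Gemini 3 Pro with 80% of each prompt being a stable, cached prefix:
rate = effective_input_rate(2.00, 0.20, 0.80)
print(round(rate, 4))  # 0.56 -> 72% below the $2.00 list rate
```

At an 80% hit share, the Pro input side costs less than Gemini 3 Flash's uncached rate, which is why caching can matter more than model choice for prefix-heavy apps.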

For the broader mechanics, read Cached Tokens Explained: Save 50-90% on AI Costs.

3. Long-context pricing changes the model choice

Gemini is famous for long context, but long context isn’t free. The clearest example in the current tracker is Gemini 2.5 Pro: the normal tier is $1.25/$10, while the >200K-token tier rises to $2.50/$15.

That means a workload with 100 million large-context input tokens and 20 million output tokens costs:

| Model | Estimated monthly token cost |
| --- | --- |
| Gemini 2.5 Pro normal tier | $325 |
| Gemini 2.5 Pro >200K tier | $550 |
| Gemini 3 Pro | $440 |

If you need large prompts often, don’t assume the cheapest Pro model stays cheapest. Split documents, retrieve only relevant chunks, cache stable context, and use the token calculator with realistic prompt sizes before launch.

4. Free tier access is no longer a planning substitute

Google’s free tier has been useful for prototyping, but production planning should use paid token rates. In April 2026, Google removed free Gemini Pro API access; free access to the Flash family remained, and is mainly relevant for lightweight testing. We covered the change in Google Ends Free Gemini Pro API Access.

The practical advice: use the free tier for experiments, not budget forecasts. If a product depends on Gemini Pro quality, model it with paid rates from day one.

Best Gemini model by use case

| Use case | Recommended starting model | Why |
| --- | --- | --- |
| Intent routing and tagging | Gemini 3.1 Flash-Lite | Lowest Gemini 3.x cost, easy to verify |
| Customer support chatbot | Gemini 3 Flash | Better quality than Lite without Pro pricing |
| High-volume RAG over clean docs | Gemini 3 Flash or 2.5 Flash | Good cost/latency balance |
| Premium RAG and research | Gemini 2.5 Pro, then Gemini 3 Pro | Start with lower premium price, escalate if needed |
| Long-context document review | Gemini 3 Pro or modeled 2.5 Pro tier | Long-context economics can change the winner |
| Multimodal application | Gemini 3 Flash for default, Gemini 3 Pro for hard cases | Route by difficulty and user value |
| Bulk extraction | Gemini 3.1 Flash-Lite | Keep outputs structured and short |
| Provider diversification | Gemini 3 Pro / 3.1 Pro | Strong OpenAI alternative with lower listed token rates |

The most cost-effective Gemini setup is usually a router:

  1. Flash-Lite for cheap utility work.
  2. Flash for default user-visible responses.
  3. 2.5 Pro for premium work where it passes evals.
  4. 3 Pro or 3.1 Pro for hard, high-value, or newer-model tasks.
  5. Caching wherever stable context repeats.
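The five-step router reduces to a small dispatch table plus an escalation hook. The task labels and fallback choice below are illustrative assumptions, not a fixed taxonomy:

```python
# Tiered routing: map task types to this guide's recommended starting models.
ROUTES = {
    "tagging":    "gemini-3.1-flash-lite",  # step 1: cheap utility work
    "support":    "gemini-3-flash",         # step 2: user-visible default
    "research":   "gemini-2.5-pro",         # step 3: premium value tier
    "hard_agent": "gemini-3-pro",           # step 4: high-end escalation
}

def pick_model(task_type, escalated=False):
    """Route by task type; send anything that failed a cheaper tier to Pro."""
    if escalated:
        return "gemini-3-pro"
    return ROUTES.get(task_type, "gemini-3-flash")  # assumed safe default
```

Keeping the mapping in data rather than scattered `if` statements makes it cheap to re-point a tier when prices or eval results change.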

Gemini vs OpenAI, Claude, and DeepSeek

Compared with OpenAI pricing, Gemini’s strongest advantage is premium token price. Gemini 3 Pro at $2/$12 undercuts GPT-5.4 at $2.50/$15, and Gemini 2.5 Pro can be cheaper still when the normal context tier applies. OpenAI still has a broader model ladder, strong developer tooling, and very cheap nano/mini options, so the winner depends on evals and integration cost.

Compared with Anthropic pricing, Gemini is usually cheaper on raw tokens. Claude’s advantage isn’t the spreadsheet price; it’s task quality in certain coding, writing, reasoning, and agent workflows. If Claude succeeds where Gemini needs retries, Claude can still be cheaper per accepted answer.

Compared with DeepSeek pricing, Gemini isn’t the cheapest. DeepSeek can win raw token-cost comparisons, especially for budget workloads. Gemini’s advantage is Google’s infrastructure, multimodal stack, enterprise procurement, Vertex AI fit, and a stronger Tier-1 provider story.

The right comparison isn’t just cost per token. It’s cost per successful task.

Practical cost-saving tips

  1. Don’t default everything to Pro. Start with Flash or Flash-Lite, then escalate based on confidence or task type.
  2. Benchmark Gemini 2.5 Pro before jumping to 3 Pro. If quality is close, 2.5 Pro may save meaningful money.
  3. Watch the >200K-token threshold. Long context can erase 2.5 Pro’s normal price advantage.
  4. Cache stable context. A 90% cached-input discount can matter more than switching providers.
  5. Cap output length. Output is usually the largest line item in chat, support, coding, and report generation.
  6. Use batch or offline processing where latency doesn’t matter. Don’t pay interactive economics for back-office workloads.
  7. Run evals with real prompts. Gemini often looks cheaper on paper, but retry rate and prompt scaffolding decide the true bill.

FAQ

What is the cheapest Gemini API model?

In the current tracked data, Gemini 3.1 Flash-Lite is the cheapest Gemini model at $0.25 per million input tokens, $0.025 per million cached input tokens, and $1.50 per million output tokens.

How much does Gemini 3 Pro cost?

Gemini 3 Pro costs $2.00 per million input tokens, $0.20 per million cached input tokens, and $12.00 per million output tokens in the current AI Pricing Guru tracker.

Is Gemini cheaper than GPT-5.4?

On listed token price, yes for the current premium comparison. Gemini 3 Pro is $2/$12, while GPT-5.4 is $2.50/$15. Gemini 2.5 Pro can be cheaper at $1.25/$10, unless your workload triggers the higher long-context tier.

Does Gemini have cached input discounts?

Yes. The tracked Gemini models show cached input rates roughly 90% lower than normal input rates. That’s valuable for repeated system prompts, tool schemas, documentation, RAG context, and agent memory.

Which Gemini model should I start with?

Start with Gemini 3 Flash for user-visible production work, Gemini 3.1 Flash-Lite for cheap utility calls, and Gemini 2.5 Pro for premium work where you want a lower-cost alternative to Gemini 3 Pro. Escalate to Gemini 3 Pro or 3.1 Pro when quality justifies the higher rate.

Bottom Line

Google Gemini API pricing rewards teams that route intelligently. Gemini 3.1 Flash-Lite is the cheap utility layer, Gemini 3 Flash is the practical default, Gemini 2.5 Pro is the premium-value tier, and Gemini 3 Pro / 3.1 Pro are the high-end Google options.

If your app is provider-neutral, Gemini deserves serious testing against OpenAI and Claude. The token prices are competitive, cached input discounts are strong, and the Flash tiers can dramatically reduce high-volume costs. Just model real prompts, especially long-context prompts, before assuming the lowest headline price will be your production price.