AI Voice & TTS API Pricing

Compare developer pricing for text-to-speech, realtime voice, dubbing, translation, and speech APIs across ElevenLabs, Speechify, OpenAI, Google Cloud, and Amazon Polly. Last checked 2026-06-15.

Quick answer: for simple narration, Amazon Polly Standard and Google Standard/WaveNet are cheapest at $4 per 1M characters. Speechify lists $10 per 1M characters. ElevenLabs is priced for higher-end AI voice quality at $0.05-$0.10 per 1K characters ($50-$100 per 1M characters). OpenAI is strongest when the job is realtime voice, translation, or transcription rather than static narration.

Character-based rates are normalized to 1M input characters. They do not include free tiers, taxes, committed-use discounts, or enterprise contracts.
Realtime, dubbing, and transcription products are not directly comparable to static TTS. OpenAI and ElevenLabs publish minute- or token-based rates for live audio products.
Speech duration estimates assume English narration at roughly 700-900 characters per finished minute. Actual minutes vary by language, pacing, punctuation, and SSML.
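
The normalization to 1M characters can be sketched in a few lines. This is a minimal illustration, not any provider's billing logic; the helper name `per_million` is hypothetical, and the two sample rates are the ElevenLabs per-1K prices listed on this page.

```python
from decimal import Decimal


def per_million(rate: str, unit_chars: int) -> Decimal:
    """Convert a price quoted per `unit_chars` characters to a price per 1M characters."""
    return Decimal(rate) * Decimal(1_000_000) / Decimal(unit_chars)


# ElevenLabs TTS rates are quoted per 1K characters on this page.
print("Flash / Turbo per 1M:", per_million("0.05", 1_000))
print("Multilingual v2/v3 per 1M:", per_million("0.10", 1_000))
```

Decimal arithmetic is used here only to avoid floating-point noise in currency values.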

Provider	Pricing unit	Best fit	Published rate	Rate details
ElevenLabs API	Per 1K characters for TTS; per minute or hour for other audio products	High-quality voice generation, cloning, dubbing, and realtime agents	$0.05-$0.10 per 1K TTS characters	Flash / Turbo TTS: $0.05 per 1K characters; Multilingual v2/v3 TTS: $0.10 per 1K characters; Speech Engine agents: $0.08 per minute, burst at $0.16 per minute; Dubbing: $0.33 per source minute with watermark, $0.50 without watermark
Speechify API	Pay-as-you-go per character	Readable narration, accessibility, e-learning, and product voice features	$10 per 1M characters	Public API page advertises $10 per 1M characters; free tier available for testing; enterprise and on-premise deployment available by quote
OpenAI TTS / Realtime audio	Audio tokens or per minute, depending on the audio product	Static TTS, realtime voice agents, speech translation, and streaming transcription	Realtime agents use audio tokens; translation starts at $0.034/min	GPT-Realtime-2 audio: $32 input / $64 output per 1M audio tokens; GPT-Realtime-2 cached audio input: $0.40 per 1M audio tokens; GPT-Realtime-Translate: $0.034 per minute; GPT-Realtime-Whisper: $0.017 per minute; text side of GPT-Realtime-2: $4 input / $24 output per 1M text tokens
Google Cloud Text-to-Speech	Per character, with some newer speech generation priced by token	Cloud-native apps, large language coverage, and Google Cloud workloads	$4-$160 per 1M characters	Standard and WaveNet voices: $4 per 1M characters; Neural2 voices: $16 per 1M characters; Chirp 3 HD voices: $30 per 1M characters; Studio voices: $160 per 1M characters; Instant custom voice: $60 per 1M characters; Gemini TTS models are token-priced, not character-priced
Amazon Polly	Per 1M characters	AWS workloads, low-cost standard voices, and speech marks	$4-$100 per 1M characters	Standard voices: $4 per 1M characters; Neural voices: $16 per 1M characters; Generative voices: $30 per 1M characters; Long-Form voices: $100 per 1M characters

Cost per 1M characters

Character-based APIs are the easiest to compare directly. The figures below are published list prices for a 1,000,000-character TTS workload. They exclude free tiers, taxes, discounts, and enterprise commitments, and they leave out token-priced realtime audio products.

Provider	Tier	List price per 1M characters
Speechify API	Published API rate	$10
ElevenLabs	Flash / Turbo	$50
ElevenLabs	Multilingual v2/v3	$100
Google Cloud TTS	Standard / WaveNet	$4
Google Cloud TTS	Neural2	$16
Google Cloud TTS	Chirp 3 HD	$30
Google Cloud TTS	Studio	$160
Amazon Polly	Standard	$4
Amazon Polly	Neural	$16
Amazon Polly	Generative	$30
Amazon Polly	Long-Form	$100
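
To apply these list prices to a workload of a different size, a small calculator is enough. The sketch below is illustrative only: the rates are copied from this page, the function names `workload_cost` and `rank_by_cost` are hypothetical, and free tiers, taxes, and discounts are ignored.

```python
# List prices in USD per 1M characters, copied from this page.
RATES_PER_1M = {
    "Speechify API": 10,
    "ElevenLabs Flash / Turbo": 50,
    "ElevenLabs Multilingual v2/v3": 100,
    "Google Cloud TTS Standard / WaveNet": 4,
    "Google Cloud TTS Neural2": 16,
    "Google Cloud TTS Chirp 3 HD": 30,
    "Google Cloud TTS Studio": 160,
    "Amazon Polly Standard": 4,
    "Amazon Polly Neural": 16,
    "Amazon Polly Generative": 30,
    "Amazon Polly Long-Form": 100,
}


def workload_cost(characters: int, rate_per_1m: float) -> float:
    """List-price cost in USD for `characters` input characters."""
    return characters / 1_000_000 * rate_per_1m


def rank_by_cost(characters: int) -> list[tuple[str, float]]:
    """Return (tier, cost) pairs sorted from cheapest to most expensive."""
    costs = [(name, workload_cost(characters, rate)) for name, rate in RATES_PER_1M.items()]
    return sorted(costs, key=lambda pair: pair[1])


# Example: a 2.5M-character workload, cheapest tiers first.
for name, cost in rank_by_cost(2_500_000):
    print(f"{name}: ${cost:,.2f}")
```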

Per-minute voice rates

Live voice, translation, transcription, and dubbing products often price by audio minute instead of input characters. These rates are easier to compare by minute and by hour.

Provider	Product	Per minute	Per hour
OpenAI	GPT-Realtime-Whisper transcription	$0.017	$1.02
OpenAI	GPT-Realtime-Translate	$0.034	$2.04
ElevenLabs	Speech Engine agents	$0.08	$4.80
ElevenLabs	Speech Engine burst	$0.16	$9.60
ElevenLabs	Dubbing with watermark	$0.33	$19.80
ElevenLabs	Dubbing without watermark	$0.50	$30.00
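
The per-hour column is the per-minute rate multiplied by 60. A minimal sketch of that conversion, before any rounding, using the per-minute rates from this table (the helper name `per_hour` is hypothetical):

```python
from decimal import Decimal

# Per-minute list prices in USD, copied from the table on this page.
PER_MINUTE = {
    "OpenAI GPT-Realtime-Whisper transcription": "0.017",
    "OpenAI GPT-Realtime-Translate": "0.034",
    "ElevenLabs Speech Engine agents": "0.08",
    "ElevenLabs Speech Engine burst": "0.16",
    "ElevenLabs Dubbing with watermark": "0.33",
    "ElevenLabs Dubbing without watermark": "0.50",
}


def per_hour(per_minute: str) -> Decimal:
    """Convert a per-minute price to a per-hour price (60 billed minutes)."""
    return Decimal(per_minute) * 60


for product, rate in PER_MINUTE.items():
    print(f"{product}: ${per_hour(rate):.2f} per hour")
```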

How to choose

Pick Google Cloud Text-to-Speech or Amazon Polly when the main requirement is cheap, reliable narration at scale. Pick Speechify when you want a simple published API rate and a voice product tuned for readable narration. Pick ElevenLabs when voice quality, voice cloning, dubbing, agent voice, and expressiveness matter more than the absolute lowest character price.

OpenAI's pricing has a different shape. Static TTS falls under generated audio output pricing, while GPT-Realtime-2 is a live multimodal model with audio token pricing. GPT-Realtime-Translate and GPT-Realtime-Whisper publish per-minute rates. Use OpenAI when the product is a realtime voice interface, live translation layer, or streaming transcription workflow, not just a batch TTS job.
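
Audio-token pricing can be estimated once token counts are known. The sketch below uses the GPT-Realtime-2 audio rates listed on this page. The function name and the token counts in the example are hypothetical, `input_tokens` is assumed to mean uncached audio input only, and text-token charges are left out. This page does not state how many audio tokens a minute of speech consumes, so measure that from real usage rather than assuming a figure.

```python
# List prices in USD per 1M audio tokens, copied from the GPT-Realtime-2 row on this page.
AUDIO_INPUT_PER_1M = 32.00
AUDIO_OUTPUT_PER_1M = 64.00
CACHED_AUDIO_INPUT_PER_1M = 0.40


def realtime_audio_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """Estimate audio-token cost in USD for one session or one billing period.

    Assumption: `input_tokens` counts uncached audio input only, and cached
    tokens are billed at the cached rate instead of the full input rate.
    """
    total = (
        input_tokens * AUDIO_INPUT_PER_1M
        + output_tokens * AUDIO_OUTPUT_PER_1M
        + cached_input_tokens * CACHED_AUDIO_INPUT_PER_1M
    )
    return total / 1_000_000


# Hypothetical usage figures, for illustration only.
print(f"${realtime_audio_cost(1_000_000, 500_000, cached_input_tokens=2_000_000):.2f}")
```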

For text-token model costs, use the AI API pricing table and token cost calculator. For voice workloads, estimate character volume, audio minutes, and concurrency, and check caching rights and whether you need cloning or speech marks.

FAQ

Which TTS API is cheapest?

For basic cloud TTS, Google Cloud Standard/WaveNet and Amazon Polly Standard are both $4 per 1M characters. Speechify lists $10 per 1M characters. ElevenLabs starts higher at $50 per 1M characters for Flash/Turbo, but targets more expressive AI voice output.

Why is ElevenLabs more expensive than Polly or Google Standard voices?

ElevenLabs is optimized for expressive, low-latency AI voice generation, voice cloning, dubbing, and agent voice. Polly and Google Standard are cheaper for straightforward narration at scale.

How do character prices map to minutes of audio?

A rough English narration estimate is 700-900 characters per minute, depending on punctuation, speed, and language. At that pace, one million characters comes to roughly 18.5-24 hours of finished speech (1,000,000 / 900 ≈ 1,111 minutes; 1,000,000 / 700 ≈ 1,429 minutes).
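
The same estimate as a short sketch. The 700-900 characters-per-minute range is the rough English figure from this page, the function names are hypothetical, and real output varies with language, pacing, and SSML.

```python
def speech_hours(characters: int, chars_per_minute: float) -> float:
    """Estimated hours of finished speech at a given narration pace."""
    return characters / chars_per_minute / 60


def speech_hours_range(characters: int, min_cpm: float = 700, max_cpm: float = 900) -> tuple[float, float]:
    """Return (shortest, longest) duration in hours; a faster pace gives fewer hours."""
    return speech_hours(characters, max_cpm), speech_hours(characters, min_cpm)


low, high = speech_hours_range(1_000_000)
print(f"1M characters: about {low:.1f} to {high:.1f} hours of speech")
```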

Should voice agents use per-character TTS or realtime audio pricing?

If the app is turn-based narration, per-character TTS is easier to forecast. If users interrupt, talk over the system, or need live translation/transcription, realtime per-minute or audio-token pricing is the better model.
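
When weighing the two pricing models, it can help to express a per-character rate as an approximate cost per finished minute of speech. This is a rough sketch under the 700-900 characters-per-minute assumption used on this page, and the function name is hypothetical. The result covers synthesized speech only, so it is not a like-for-like match for realtime products that bundle listening and turn-taking.

```python
def tts_cost_per_minute(rate_per_1m_chars: float, chars_per_minute: float) -> float:
    """Approximate USD cost of one finished minute of speech for a per-character TTS rate."""
    return rate_per_1m_chars * chars_per_minute / 1_000_000


# Example: ElevenLabs Flash / Turbo at $50 per 1M characters (rate from this page).
for cpm in (700, 900):
    print(f"{cpm} chars/min: ${tts_cost_per_minute(50, cpm):.3f} per minute")
```

At $50 per 1M characters, that works out to about $0.035-$0.045 per finished minute of speech, against the $0.08 per minute listed for ElevenLabs Speech Engine agents.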

Sources