AI Voice & TTS API Pricing
Compare developer pricing for text-to-speech, realtime voice, dubbing, translation, and speech APIs across ElevenLabs, Speechify, OpenAI, Google Cloud, and Amazon Polly.
Quick answer: for simple narration, Amazon Polly Standard and Google Standard/WaveNet are cheapest at $4 per 1M characters. Speechify lists $10 per 1M characters. ElevenLabs is priced for higher-end AI voice quality at $0.05-$0.10 per 1K characters, which is $50-$100 per 1M characters. OpenAI is strongest when the job is realtime voice, translation, or transcription rather than static narration.
- Character-based rates are normalized to 1M input characters. They do not include free tiers, taxes, committed-use discounts, or enterprise contracts.
- Realtime, dubbing, and transcription products are not directly comparable to static TTS. OpenAI and ElevenLabs publish minute- or token-based rates for live audio products.
- The speech duration estimate assumes English narration at roughly 700-900 characters per finished minute. Actual minutes vary by language, pacing, punctuation, and SSML.
| Provider | Pricing unit | Best fit | Published rate | Action |
|---|---|---|---|---|
| ElevenLabs API | Per 1K characters for TTS; per minute or hour for other audio products | High-quality voice generation, cloning, dubbing, and realtime agents | $0.05-$0.10 per 1K TTS characters | Try ElevenLabs |
| Speechify API | Pay-as-you-go per character | Readable narration, accessibility, e-learning, and product voice features | $10 per 1M characters | Try Speechify |
| OpenAI TTS / Realtime audio | Audio tokens or per minute, depending on the audio product | Static TTS, realtime voice agents, speech translation, and streaming transcription | Realtime agents use audio tokens; translation starts at $0.034/min | View OpenAI pricing |
| Google Cloud Text-to-Speech | Per character, with some newer speech generation priced by token | Cloud-native apps, large language coverage, and Google Cloud workloads | $4-$160 per 1M characters | View Google AI pricing |
| Amazon Polly | Per 1M characters | AWS workloads, low-cost standard voices, and speech marks | $4-$100 per 1M characters | Open AWS Polly |
Cost per 1M characters
Character-based APIs are the easiest to compare directly. The figures on this page are published list prices for a 1,000,000-character TTS workload, before free tiers, taxes, discounts, or enterprise commits, and they exclude token-priced realtime audio products.
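The normalization can be sketched in a few lines of Python. This is a minimal illustration, not an official calculator: the rates are the list-price ranges quoted on this page, and the names `PUBLISHED_RATES`, `cost_for_characters`, and `cost_table` are hypothetical helpers introduced for the example.

```python
# Published list-price ranges quoted on this page, in USD.
# "unit_chars" is how many characters each quoted price covers.
# Assumption: free tiers, discounts, and taxes are ignored.
PUBLISHED_RATES = {
    "Amazon Polly": {"low": 4.00, "high": 100.00, "unit_chars": 1_000_000},
    "Google Cloud Text-to-Speech": {"low": 4.00, "high": 160.00, "unit_chars": 1_000_000},
    "Speechify": {"low": 10.00, "high": 10.00, "unit_chars": 1_000_000},
    "ElevenLabs": {"low": 0.05, "high": 0.10, "unit_chars": 1_000},
}


def cost_for_characters(rate, characters):
    """Return the (low, high) USD cost of a workload at one provider's rate."""
    scale = characters / rate["unit_chars"]
    return (round(rate["low"] * scale, 2), round(rate["high"] * scale, 2))


def cost_table(characters=1_000_000):
    """Return provider -> (low, high) cost, sorted from cheapest low end up."""
    rows = {
        name: cost_for_characters(rate, characters)
        for name, rate in PUBLISHED_RATES.items()
    }
    return dict(sorted(rows.items(), key=lambda item: item[1]))


if __name__ == "__main__":
    for provider, (low, high) in cost_table().items():
        print(f"{provider}: ${low:,.2f} to ${high:,.2f} per 1M characters")
```

Keeping the quoted unit next to each price avoids mixing per-1K and per-1M figures by hand, which is the most common mistake when comparing ElevenLabs with the cloud providers.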
Per-minute voice rates
Live voice, translation, transcription, and dubbing products often price by audio minute instead of input characters. These rates are easier to compare by minute and by hour.
| Provider | Product | Per minute | Per hour |
|---|---|---|---|
| OpenAI | GPT-Realtime-Whisper transcription | $0.017 | $1 |
| OpenAI | GPT-Realtime-Translate | $0.034 | $2 |
| ElevenLabs | Speech Engine agents | $0.08 | $5 |
| ElevenLabs | Speech Engine burst | $0.16 | $10 |
| ElevenLabs | Dubbing with watermark | $0.33 | $20 |
| ElevenLabs | Dubbing without watermark | $0.50 | $30 |
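To turn these rates into a budget, multiply the per-minute figure by the expected audio minutes. The sketch below uses the per-minute column above; the dictionary keys and the `audio_cost` helper are hypothetical names for illustration. Both columns in the table appear to be rounded, so multiplying a per-minute figure by 60 will not always reproduce the per-hour figure exactly (for example, $0.08 x 60 is $4.80, while the table lists $5).

```python
# Per-minute rates copied from the table above, in USD.
# Assumption: usage is billed linearly per audio minute with no minimums.
PER_MINUTE_USD = {
    "GPT-Realtime-Whisper transcription": 0.017,
    "GPT-Realtime-Translate": 0.034,
    "ElevenLabs Speech Engine agents": 0.08,
    "ElevenLabs Speech Engine burst": 0.16,
    "ElevenLabs Dubbing with watermark": 0.33,
    "ElevenLabs Dubbing without watermark": 0.50,
}


def audio_cost(product, minutes):
    """Estimate the USD cost of `minutes` of audio for one product."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return round(PER_MINUTE_USD[product] * minutes, 2)


if __name__ == "__main__":
    # Example: 500 minutes of live translation in a month.
    print(audio_cost("GPT-Realtime-Translate", 500))
```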
How to choose
Pick Google Cloud Text-to-Speech or Amazon Polly when the main requirement is cheap, reliable narration at scale. Pick Speechify when you want a simple published API rate and a voice product tuned for readable narration. Pick ElevenLabs when voice quality, voice cloning, dubbing, agent voice, and expressiveness matter more than the absolute lowest character price.
OpenAI's pricing has a different shape. Static TTS is billed as generated audio output, while GPT-Realtime-2 is a live multimodal model with audio token pricing. GPT-Realtime-Translate and GPT-Realtime-Whisper publish per-minute rates. Use OpenAI when the product is a realtime voice interface, live translation layer, or streaming transcription workflow, not just a batch TTS job.
For text-token model costs, use the AI API pricing table and token cost calculator. For voice workloads, estimate characters, audio minutes, and concurrency, then check caching rights and whether you need cloning or speech marks.
FAQ
Which TTS API is cheapest?
For basic cloud TTS, Google Cloud Standard/WaveNet and Amazon Polly Standard are both $4 per 1M characters. Speechify lists $10 per 1M characters. ElevenLabs starts higher at $50 per 1M characters for Flash/Turbo, but targets more expressive AI voice output.
Why is ElevenLabs more expensive than Polly or Google Standard voices?
ElevenLabs is optimized for expressive, low-latency AI voice generation, voice cloning, dubbing, and agent voice. Polly and Google Standard are cheaper for straightforward narration at scale.
How do character prices map to minutes of audio?
A rough English narration estimate is 700-900 characters per minute, depending on punctuation, speed, and language. One million characters often lands near 18-24 hours of finished speech.
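The 18-24 hour figure follows directly from the 700-900 characters-per-minute assumption. A minimal sketch of the arithmetic, with `narration_hours` and `cost_per_finished_hour` as hypothetical helper names:

```python
def narration_hours(characters, chars_per_minute_low=700, chars_per_minute_high=900):
    """Return (shortest, longest) estimated hours of finished speech.

    Faster narration (more characters per minute) yields fewer hours.
    Assumption: English narration at 700-900 characters per minute.
    """
    shortest = characters / chars_per_minute_high / 60
    longest = characters / chars_per_minute_low / 60
    return (round(shortest, 1), round(longest, 1))


def cost_per_finished_hour(rate_per_million_usd, chars_per_minute_low=700,
                           chars_per_minute_high=900):
    """Return (low, high) USD per finished hour for a per-1M-character rate."""
    low = rate_per_million_usd * chars_per_minute_low * 60 / 1_000_000
    high = rate_per_million_usd * chars_per_minute_high * 60 / 1_000_000
    return (round(low, 3), round(high, 3))


if __name__ == "__main__":
    print(narration_hours(1_000_000))    # (18.5, 23.8)
    print(cost_per_finished_hour(4.00))  # (0.168, 0.216)
```

At the $4 per 1M character rate, that works out to roughly $0.17-$0.22 per finished hour of English narration under the same assumption.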
Should voice agents use per-character TTS or realtime audio pricing?
If the app is turn-based narration, per-character TTS is easier to forecast. If users interrupt, talk over the system, or need live translation/transcription, realtime per-minute or audio-token pricing is the better model.