TTS: languages, voices & MP3

Languages and voices

WebVoice groups voices by language so you can pick the locale that matches your content. Coverage runs from widely used languages to several regional accents, with a choice of voice gender where the underlying engine exposes it. You are not limited to a single default voice: you can switch voices per project, or per sentence when experimenting in the studio.

The exact catalogue depends on the active TTS backend (for example, high-quality neural runtimes based on ONNX, or cloud providers). The dashboard shows only the voices available for your account, with readable names so you can compare timbre and style.

Speed, clarity and workflow

You can adjust the speech rate within the supported range so that announcements sound natural, tutorials stay understandable, and short prompts remain punchy. This helps with accessibility, localization (some languages sound better slightly slower), and matching the pacing of a video or IVR menu.

MP3 output and download

In many flows, generated audio is produced as compact MP3 by default, which is ideal for sharing, embedding in presentations, or sending by email without huge file sizes. From your generation history you can download a file again later, so you do not need to regenerate the same clip as long as the item, with the credits logged against it, is still in your history.

For developers, the REST API returns Base64-encoded audio in JSON responses, so you can decode to MP3 bytes on the server or client and save or stream as you prefer.
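The decoding step can be sketched in a few lines of Python. This is a minimal illustration, not the documented client: the JSON key `audio_base64` is a hypothetical field name, so check the API documentation for the actual key.

```python
import base64
import json


def extract_mp3_bytes(response_body: str, audio_field: str = "audio_base64") -> bytes:
    """Decode the Base64 audio field of a JSON response into raw MP3 bytes.

    `audio_field` is a hypothetical key name used for illustration only.
    """
    payload = json.loads(response_body)
    # validate=True raises an error on characters outside the Base64 alphabet
    # instead of silently discarding them.
    return base64.b64decode(payload[audio_field], validate=True)


def save_mp3(response_body: str, path: str) -> int:
    """Write the decoded audio to `path` and return the number of bytes written."""
    audio = extract_mp3_bytes(response_body)
    with open(path, "wb") as fh:
        return fh.write(audio)
```

Calling `save_mp3(body, "clip.mp3")` on a response body writes a file that can be played or streamed like any other MP3.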

API and automation

The same TTS engine is exposed through authenticated API calls: you send the text, voice ID, language and speed, and receive the duration and the audio payload. That lets you batch newsletters, voice menus, or app notifications without using the web UI.
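As a rough sketch of such a call, the Python below builds a request for POST /api/v1/tts/ (the path named in the FAQ) using only the standard library. The host, the JSON field names, and the Bearer authorization scheme are assumptions made for illustration; the API documentation is the authority on all three.

```python
import json
import urllib.request

# Hypothetical host; replace with the one given in the API documentation.
BASE_URL = "https://example.invalid"


def build_tts_request(api_key, text, voice_id, language, speed=1.0, provider=None):
    """Build (but do not send) a POST /api/v1/tts/ request.

    The body field names and the Authorization scheme are assumed, not documented.
    """
    body = {"text": text, "voice": voice_id, "language": language, "speed": speed}
    if provider is not None:
        body["provider"] = provider
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/tts/",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def synthesize(request, timeout=30.0):
    """Send the request and return the parsed JSON response as a dict."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For batch jobs such as newsletters or voice menus, the same helper can be called in a loop over the texts, passing each built request to `synthesize`.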

At a glance

Broad language coverage and multiple voices per language where supported
Adjustable speed for clarity and style
MP3 downloads from history plus API-friendly encoding
Credits scale with audio length — transparent rules in the pricing section

Trusted for production voice workloads

Product teams, agencies, and developers use WebVoice for TTS, STT, chat, and API-first integrations — from prototypes to customer-facing apps.

SaaS & product

Agencies

Education

Internal tools

50K+ hours of audio synthesized and transcribed monthly (illustrative range)

30+ neural voices and locales in the catalogue (varies by deployment)

REST: same credit wallet for the browser app and the documented HTTP API

Figures are indicative and depend on traffic and configuration.

“We shipped read-aloud and STT in one sprint — the API matched what we tested in the UI.”

Lead developer, B2B SaaS

“Credits per feature make finance happy — we can forecast TTS vs chat separately.”

Product operations, E-commerce

“Low-latency Groq routes for chat let us keep UX snappy without a separate vendor.”

CTO, Digital agency

Quotes represent typical feedback patterns; not attributed to specific customers.

Frequently asked questions

Which voices and languages are available?

TTS engines (Kokoro, Kokoro Fast, Google, Qwen, Inworld, Minimax, MiniMax API, …) are listed by GET /audio/api/tts-catalog/ when you are logged in (mobile app or web session). To list the voices for a chosen engine and language, call GET /audio/api/voices/?provider=…&language=…, or GET /api/v1/voices/ with your API key.

How do I list TTS providers via API?

From a session (app or browser), GET /audio/api/tts-catalog/ returns tts_providers, stt_providers, tts_languages and per-engine settings (speed, trim for Kokoro, MiniMax model). Clients using a REST API key pass provider on GET /api/v1/voices/ and POST /api/v1/tts/; see the API documentation for the full list of provider ids.

What audio format do I get?

Many flows output compact MP3 suitable for download and sharing; API responses can include Base64-encoded audio for programmatic use.

How are TTS credits calculated?

Billing is typically by audio duration, charged in blocks (see the pricing page), so longer synthesis consumes more credits in a predictable way.

Can I adjust speech speed?

Yes, within the supported range shown in the product settings, so announcements and tutorials sound natural.

Is the same TTS engine exposed via API?

Yes. Authenticated REST calls use the same engines as the browser, so what you hear in the UI matches what you automate.
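For API key clients, the voice listing mentioned in the answers above might be addressed with a small URL helper like the one below. It is only a sketch: the host is a placeholder, the provider id `kokoro` in the usage line is hypothetical, and accepting a `language` filter on GET /api/v1/voices/ is an assumption carried over from the session endpoint.

```python
from urllib.parse import urlencode

# Hypothetical host; replace with the one given in the API documentation.
BASE_URL = "https://example.invalid"


def voices_url(provider=None, language=None):
    """Return the GET /api/v1/voices/ URL, adding only the filters that were given."""
    candidates = (("provider", provider), ("language", language))
    params = {key: value for key, value in candidates if value}
    query = urlencode(params)
    return f"{BASE_URL}/api/v1/voices/" + (f"?{query}" if query else "")
```

For example, `voices_url("kokoro", "en")` yields a URL with both filters applied, while `voices_url()` returns the unfiltered listing URL.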

Text-to-speech: many languages, many voices, MP3 you can download

Ready to try WebVoice?
