AI image generation & models

Models in production

The primary text-to-image stack uses the MiniMax Image API. The default model family is image-01 (and compatible revisions offered by MiniMax), which is optimised for general prompts, marketing visuals, and concept art. You describe the scene in natural language; the service returns raster images (typically PNG or JPEG) with configurable aspect ratios such as 1:1, 16:9, or 9:16 to match slides, social posts, or vertical stories.
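If you start from pixel dimensions rather than a named ratio, the nearest of the ratios listed above can be picked client-side. The following is a minimal sketch only: the function name, the list of ratios (taken from the examples above), and the log-scale comparison are assumptions for illustration, and the server may apply a different rule.

```python
import math

# Ratios named in the text above; a given deployment may support others.
SUPPORTED_RATIOS = ("1:1", "16:9", "9:16")


def closest_aspect_ratio(width, height, supported=SUPPORTED_RATIOS):
    """Return the label in `supported` whose ratio is nearest to width/height.

    Ratios are compared on a log scale so that "twice as wide" and
    "twice as tall" count as equally far from square.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(label):
        w, h = (int(part) for part in label.split(":"))
        return abs(math.log(w / h) - target)

    return min(supported, key=distance)
```

For example, closest_aspect_ratio(1080, 1920) selects the vertical 9:16 ratio suited to stories, while a near-square size such as 1000 × 800 falls back to 1:1.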

Administrators can extend the deployment with additional backends (for example GPU-hosted diffusion on external workers) where the infrastructure allows. Those paths are optional and may require separate capacity planning; the user-facing API and credits model focus on the MiniMax route so integrations stay predictable.

API and credits

Authenticated clients can call POST /api/v1/image/ with a JSON body. The prompt field is required; width and height (mapped to the closest supported aspect ratio), seed, and a model identifier (when the catalogue exposes more than one image model) are optional. Responses include Base64-encoded image bytes plus credit usage, so you can meter spend the same way as for TTS or chat.
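A request body and response decoder for this endpoint might be sketched as follows. The keys prompt, width, height, and seed come from the description above; the "model" key and the response field name "image_base64" are hypothetical placeholders, so check the API reference for the real names. Authentication and the HTTP call itself are left to whichever client you already use.

```python
import base64
import json

API_PATH = "/api/v1/image/"  # endpoint path as given in the text


def build_image_request(prompt, width=None, height=None, seed=None, model=None):
    """Build the JSON-serialisable body; only `prompt` is required."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    body = {"prompt": prompt}
    # "model" is an assumed key name for the model identifier.
    optional = {"width": width, "height": height, "seed": seed, "model": model}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


def decode_image(response_body, image_field="image_base64"):
    """Decode Base64 image bytes from a parsed JSON response.

    `image_field` is a hypothetical field name, not a documented one.
    """
    return base64.b64decode(response_body[image_field])


payload = build_image_request("a red kite over a harbour", width=1024, height=1024)
request_json = json.dumps(payload)  # send this as the POST body
```

Omitting unset optional fields keeps the request minimal and lets the server apply its own defaults.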

Each successful generation consumes credits according to the server setting IMAGE_GENERATION_CREDITS (default 7 credits per image). Insufficient balance returns HTTP 402; if MiniMax is not configured on the server, the endpoint responds with HTTP 503 and a clear message.
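On the client side, the two documented error statuses and the per-image cost can be handled along these lines. This is a sketch under assumptions: the exception classes and function names are invented for illustration, and the cost constant mirrors the documented default of 7, which a server may override.

```python
IMAGE_GENERATION_CREDITS = 7  # documented default; the server setting may differ


class InsufficientCredits(Exception):
    """Raised for HTTP 402: the balance does not cover the generation."""


class ImageBackendUnavailable(Exception):
    """Raised for HTTP 503: MiniMax is not configured on the server."""


def check_image_response(status_code, message=""):
    """Raise a descriptive error for non-success status codes."""
    if status_code == 402:
        raise InsufficientCredits(message or "not enough credits")
    if status_code == 503:
        raise ImageBackendUnavailable(message or "image backend not configured")
    if not 200 <= status_code < 300:
        raise RuntimeError(f"unexpected status {status_code}: {message}")


def affordable_images(balance, cost_per_image=IMAGE_GENERATION_CREDITS):
    """Return how many images a credit balance covers at the given cost."""
    return max(balance, 0) // cost_per_image
```

With the default cost, a balance of 50 credits covers 7 images, leaving 1 credit unspent.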

Prompting tips

Be specific about style (watercolour, photorealistic, flat vector), lighting, and subject. Keep prompts under the provider limit (thousands of characters) to avoid truncation. For brand-safe content, follow the same acceptable-use rules as the rest of WebVoice and your organisation’s policies.
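One way to keep prompts specific and within bounds is to assemble them from named parts and check the length before sending. This is a rough sketch: the helper name is hypothetical, and MAX_PROMPT_CHARS is a placeholder value, since the exact provider limit is not stated here.

```python
MAX_PROMPT_CHARS = 2000  # placeholder only; confirm the provider's actual limit


def build_prompt(subject, style=None, lighting=None, max_chars=MAX_PROMPT_CHARS):
    """Join subject, style, and lighting into one prompt and enforce a length cap."""
    parts = [subject.strip()]
    if style:
        parts.append(f"{style.strip()} style")
    if lighting:
        parts.append(f"{lighting.strip()} lighting")
    prompt = ", ".join(parts)
    if len(prompt) > max_chars:
        # Fail loudly rather than let the provider truncate silently.
        raise ValueError(f"prompt is {len(prompt)} characters; limit is {max_chars}")
    return prompt
```

Raising an error instead of trimming the text keeps the caller in control of what gets cut.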

Summary

MiniMax image models (e.g. image-01) for standard generation
Aspect ratio via API or derived from width × height
REST endpoint documented alongside TTS and STT
Credits debited per image on success

Trusted for production voice workloads

Inside the product — app screenshots, workflow, and reserved logo slots on the main site.

Product teams, agencies, and developers use WebVoice for TTS, STT, chat, and API-first integrations — from prototypes to customer-facing apps.

SaaS & product

Agencies

Education

Internal tools

50K+ hours of audio synthesized and transcribed monthly (illustrative range)

30+ neural voices and locales in the catalogue (varies by deployment)

REST: same credit wallet for browser app and documented HTTP API

Figures are indicative and depend on traffic and configuration.

“We shipped read-aloud and STT in one sprint — the API matched what we tested in the UI.”

Lead developer, B2B SaaS

“Credits per feature make finance happy — we can forecast TTS vs chat separately.”

Product operations, E-commerce

“Low-latency Groq routes for chat let us keep UX snappy without a separate vendor.”

CTO, Digital agency

Quotes represent typical feedback patterns; not attributed to specific customers.

Frequently asked questions

What are WebVoice credits and how are they used?

Credits are a single balance for the web app and API. They pay for text-to-speech, speech-to-text, translation, chat turns, image generation (where enabled), and other metered features. Daily free credits renew at login; purchased credits do not expire.

Do I need a credit card to try the product?

You can register and use daily free credits without a subscription. Buying credits or a plan is optional when you need higher volume.

Can I use the same account for the website and the API?

Yes. API keys are tied to your account and draw from the same credit wallet as the browser app, so you can prototype in the UI and ship server-side calls with one pool of credits.

How do safeguard models differ from other chat models?

Safeguard-class models are tuned for stronger refusals and policy alignment. They are a good default for customer-facing or regulated workflows. Other models may trade cost or latency for different strengths; see the model list for credits per request.

Where are voices and data processed?

Processing depends on the feature: some workloads run on our infrastructure and third-party providers you configure (e.g. Groq, MiniMax). Read the privacy and AI policy pages for retention and provider details.

Ready to try WebVoice?