MiniMax image models, optional high-end pipelines, and a documented REST API.
Models in production
The primary text-to-image stack uses the MiniMax Image API. The default model family is image-01 (and compatible revisions offered by MiniMax), which is optimised for general prompts, marketing visuals, and concept art. You describe the scene in natural language; the service returns raster images (typically PNG or JPEG) with configurable aspect ratios such as 1:1, 16:9, or 9:16 to match slides, social posts, or vertical stories.
Administrators can extend the deployment with additional backends (for example GPU-hosted diffusion on external workers) where the infrastructure allows. Those paths are optional and may require separate capacity planning; the user-facing API and credits model focus on the MiniMax route so integrations stay predictable.
API and credits
Authenticated clients can call POST /api/v1/image/ with a JSON body containing prompt (required) and, optionally, width and height (mapped to the closest supported aspect ratio), seed, and a model identifier when the catalogue exposes more than one image model. Responses include the Base64-encoded image bytes plus credit usage, so you can meter spend the same way as for TTS or chat.
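A minimal Python sketch of a client for this endpoint, using only the standard library. Several details are assumptions not stated above and are labeled as hypothetical in the code: the base URL, the Bearer authorization header, the JSON key "model" for the model identifier, and the response field "image_base64". Check your deployment's API reference for the real names.

```python
import base64
import json
import urllib.request

# Hypothetical base URL: replace with your deployment's host.
BASE_URL = "https://example.com"


def build_image_request(api_key, prompt, width=None, height=None,
                        seed=None, model=None, base_url=BASE_URL):
    """Build (but do not send) a POST /api/v1/image/ request.

    Only `prompt` is required; optional fields are omitted when None.
    The Bearer scheme and the "model" key are assumptions.
    """
    if not prompt:
        raise ValueError("prompt is required")
    body = {"prompt": prompt}
    for key, value in (("width", width), ("height", height),
                       ("seed", seed), ("model", model)):
        if value is not None:
            body[key] = value
    return urllib.request.Request(
        url=f"{base_url}/api/v1/image/",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the parsed JSON body.

    Performs network I/O; urlopen raises urllib.error.HTTPError
    for non-2xx responses such as 402 or 503.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


def decode_image(response_json, field="image_base64"):
    """Decode the Base64 image bytes from a parsed JSON response.

    The field name "image_base64" is hypothetical.
    """
    return base64.b64decode(response_json[field])
```

Typical use would be `data = send(build_image_request(key, "a red bicycle", width=1920, height=1080))`, followed by writing `decode_image(data)` to a file. Building the request separately from sending it keeps the body construction easy to test offline.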
Each successful generation consumes the number of credits set by the server setting IMAGE_GENERATION_CREDITS (default: 7 credits per image). An insufficient balance returns HTTP 402; if MiniMax is not configured on the server, the endpoint responds with HTTP 503 and an explanatory message.
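The credit arithmetic and the two documented error statuses can be sketched as follows. This is a client-side illustration only: the server remains authoritative for the balance and for the actual IMAGE_GENERATION_CREDITS value, which may differ from the default of 7. The exception class names are hypothetical.

```python
# Default from IMAGE_GENERATION_CREDITS; the server setting may differ.
DEFAULT_IMAGE_CREDITS = 7


class InsufficientCredits(Exception):
    """Hypothetical exception for HTTP 402 (balance too low)."""


class ImageServiceUnavailable(Exception):
    """Hypothetical exception for HTTP 503 (MiniMax not configured)."""


def credits_needed(num_images, per_image=DEFAULT_IMAGE_CREDITS):
    """Credits consumed if all `num_images` generations succeed."""
    return num_images * per_image


def can_afford(balance, num_images, per_image=DEFAULT_IMAGE_CREDITS):
    """Optional pre-check before sending; the server still decides."""
    return balance >= credits_needed(num_images, per_image)


def raise_for_image_status(status, message=""):
    """Map the documented error statuses to specific exceptions."""
    if status == 402:
        raise InsufficientCredits(message or "insufficient credit balance")
    if status == 503:
        raise ImageServiceUnavailable(message or "image generation not configured")
    if not 200 <= status < 300:
        raise RuntimeError(f"unexpected HTTP status {status}: {message}")
```

Separating 402 from 503 lets a caller prompt the user to top up credits in the first case and fall back or alert an administrator in the second.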
Prompting tips
Be specific about style (watercolour, photorealistic, flat vector), lighting, and subject. Keep prompts under the provider's length limit (in the thousands of characters) to avoid truncation. For brand-safe content, follow the same acceptable-use rules as the rest of WebVoice and your organisation’s policies.
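One way to follow these tips consistently is to assemble prompts from explicit subject, style, and lighting parts. The sketch below is hypothetical: the comma-joined format is just one readable convention, and MAX_PROMPT_CHARS is a placeholder to be set to the limit the provider actually documents.

```python
# Hypothetical placeholder: set this to the provider's documented limit.
MAX_PROMPT_CHARS = 2000


def build_prompt(subject, style=None, lighting=None, extras=()):
    """Join subject, style, lighting and extra details into one prompt.

    Raises ValueError instead of letting the provider truncate silently.
    """
    parts = [subject.strip()]
    if style:
        parts.append(f"{style} style")
    if lighting:
        parts.append(f"{lighting} lighting")
    parts.extend(e.strip() for e in extras if e.strip())
    prompt = ", ".join(parts)
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"prompt is {len(prompt)} characters; limit is {MAX_PROMPT_CHARS}"
        )
    return prompt
```

For example, `build_prompt("a lighthouse on a cliff", style="watercolour", lighting="golden-hour")` yields `"a lighthouse on a cliff, watercolour style, golden-hour lighting"`. Failing early on long prompts makes over-length input visible instead of losing the tail of the description.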
Summary
MiniMax image models (e.g. image-01) for standard generation
Aspect ratio via API or derived from width × height
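The page does not document exactly how width × height is mapped to an aspect ratio. As a hypothetical sketch, the mapping could pick the supported ratio whose logarithm is closest to log(width / height), which treats landscape and portrait deviations symmetrically. The supported list here contains only the ratios named on this page; the provider may offer more.

```python
import math

# Ratios named on this page; the provider may support more (assumption).
SUPPORTED_RATIOS = ("1:1", "16:9", "9:16")


def closest_aspect_ratio(width, height, supported=SUPPORTED_RATIOS):
    """Return the supported "W:H" ratio nearest to width/height.

    Distance is measured in log space so that, e.g., 2:1 and 1:2
    are equally far from 1:1.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(ratio):
        w, h = (int(x) for x in ratio.split(":"))
        return abs(math.log(w / h) - target)

    return min(supported, key=distance)
```

With this rule, 1920 × 1080 maps to "16:9", 1080 × 1920 to "9:16", and 1200 × 1000 to "1:1".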
Product teams, agencies, and developers use WebVoice for TTS, STT, chat, and API-first integrations — from prototypes to customer-facing apps.
SaaS & product
Agencies
Education
Internal tools
50K+ hours of audio synthesized and transcribed monthly (illustrative range)
30+ neural voices and locales in the catalogue (varies by deployment)
REST: the same credit wallet for the browser app and the documented HTTP API
Figures are indicative and depend on traffic and configuration.
“We shipped read-aloud and STT in one sprint — the API matched what we tested in the UI.”
Lead developer, B2B SaaS
“Credits per feature make finance happy — we can forecast TTS vs chat separately.”
Product operations, e-commerce
“Low-latency Groq routes for chat let us keep UX snappy without a separate vendor.”
CTO, digital agency
Quotes represent typical feedback patterns and are not attributed to specific customers.
Frequently asked questions
Credits are a single balance for the web app and API. They pay for text-to-speech, speech-to-text, translation, chat turns, image generation (where enabled), and other metered features. Daily free credits renew at login; purchased credits do not expire.
You can register and use daily free credits without a subscription. Buying credits or a plan is optional when you need higher volume.
Yes. API keys are tied to your account and draw from the same credit wallet as the browser app, so you can prototype in the UI and ship server-side calls with one pool of credits.
Safeguard-class models are tuned for stronger refusals and policy alignment. They are a good default for customer-facing or regulated workflows. Other models may trade cost or latency for different strengths — see the model list for credits per request.
Processing depends on the feature: some workloads run on our own infrastructure, others on third-party providers you configure (e.g. Groq, MiniMax). See the privacy and AI policy pages for retention and provider details.