WebVoice API Documentation
Overview — API forward
WebVoice is an API forwarding bridge: your application calls WebVoice with your account API key; we authenticate you, select the managed model, call the upstream provider (Groq, DeepSeek, OpenRouter, Gemini, Moonshot, Z.AI, MiniMax, DeepInfra), and debit credits from your profile, the same wallet the web app uses.
For AI chat you can use the OpenAI-compatible surface: set base_url to the value below and pass your WebVoice key as api_key. Streaming (SSE) and non-streaming completions are supported.
https://webvoice.easytaskflow.app/api/v1/
Free chat models (0 credits)
These models are routed via OpenRouter with verified :free slugs. Via the API they cost 0 credits per request, so you can call them even with a zero balance. Model ids: openrouter:openrouter/free, openrouter:openai/gpt-oss-20b:free, openrouter:openai/gpt-oss-120b:free, openrouter:google/gemma-4-31b-it:free. Similar Groq-hosted models are billed normally.
Rate limits (OpenRouter-aligned, per account): 20 requests/minute; 50 requests/day without credit purchases, 1000 requests/day after at least one purchase. Exceeding a limit returns HTTP 429 with a Retry-After header.
| Model ID | Name | Cost |
|---|---|---|
| openrouter:openrouter/free | OpenRouter Free (auto) | Free |
| openrouter:openai/gpt-oss-120b:free | GPT OSS 120B (OpenRouter free) | Free |
| openrouter:openai/gpt-oss-20b:free | GPT OSS 20B (OpenRouter free) | Free |
| openrouter:google/gemma-4-31b-it:free | Google Gemma 4 31B (OpenRouter free) | Free |
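Since the free models are capped at 20 requests/minute and answer HTTP 429 with Retry-After once a limit is hit, a client will usually want to wait and retry. The following is a minimal sketch, not part of the WebVoice tooling: it assumes Retry-After carries a number of seconds, falls back to 3 seconds otherwise (one request every 3 seconds matches 20/minute), and takes a hypothetical `send` callable that performs the request and returns `(status, headers, body)`.

```python
import time


def retry_delay(headers, default=3.0):
    """Return the wait in seconds suggested by a Retry-After header.

    Assumption: the value is a number of seconds. A missing header or
    an HTTP-date value falls back to `default`.
    """
    value = headers.get("Retry-After")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return default


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call `send()` until it stops returning HTTP 429.

    `send` is a hypothetical zero-argument callable that performs the
    request and returns a (status, headers, body) tuple.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, body
        sleep(retry_delay(headers))
```

The daily caps (50 or 1000 requests) cannot be solved by retrying, so a caller may prefer to give up when the suggested wait is very long.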
All public chat models
Models available for integration (subject to your account tier after sign-in). Use the model id in POST chat/completions/.
| Model ID | Name | Provider | Credits / request |
|---|---|---|---|
| deepseek-v4-flash | DeepSeek V4 Flash | DeepSeek | 2.0 |
| openrouter:openrouter/free | OpenRouter Free (auto) | OpenRouter | Free |
| openrouter:allenai/olmo-3.1-32b-think | AllenAI: Olmo 3.1 32B Think | OpenRouter | 2.0 |
| deepseek-v4-pro | DeepSeek V4 Pro | DeepSeek | 3.0 |
| openrouter:openai/gpt-oss-120b:free | GPT OSS 120B (OpenRouter free) | OpenRouter | Free |
| deepseek-reasoner | DeepSeek Reasoner (V4 thinking) | DeepSeek | 3.0 |
| gemini | Google Gemini | Google Gemini | 2.0 |
| openrouter:google/gemini-2.5-flash | Google: Gemini 2.5 Flash | OpenRouter | 2.0 |
| openrouter:openai/gpt-oss-20b:free | GPT OSS 20B (OpenRouter free) | OpenRouter | Free |
| deepseek-websearch | DeepSeek WebSearch (V4 Flash) | DeepSeek | 2.0 |
| openrouter:google/gemini-2.5-flash-image | Google: Gemini 2.5 Flash Image (Nano Banana) | OpenRouter | 2.0 |
| qwen3_fast | Qwen3 Fast (Groq) | Groq | 2.0 |
| llama-3.1-8b-instant | Llama 3.1 8B Instant (Groq) | Groq | 2.0 |
| openrouter:google/gemma-4-31b-it:free | Google Gemma 4 31B (OpenRouter free) | OpenRouter | Free |
| openai/gpt-oss-safeguard-20b | GPT OSS Safeguard 20B (Groq) | Groq | 2.0 |
| moonshotai/kimi-k2-instruct-0905 | Kimi K2 (Groq) | Groq | 2.0 |
| openai/gpt-oss-20b | GPT OSS 20B (Groq) | Groq | 2.0 |
| llama-3.3-70b-versatile | Llama 3.3 70B Versatile (Groq) | Groq | 3.0 |
| openai/gpt-oss-120b | GPT OSS 120B (Groq) | Groq | 2.0 |
| moonshotai/Kimi-K2.6 | Kimi K2.6 (DeepInfra) | DeepInfra | 3.0 |
| nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning | Nemotron 3 Nano Omni 30B Reasoning (DeepInfra) | DeepInfra | 2.0 |
| Qwen/Qwen3-Max-Thinking | Qwen3 Max Thinking (DeepInfra) | DeepInfra | 5.0 |
| google/gemma-4-26B-A4B-it | Gemma 4 26B IT (DeepInfra) | DeepInfra | 1.0 |
| deepseek-chat | DeepSeek Chat (legacy alias) | DeepSeek | 2.0 |
| kimi-k2.5 | Moonshot Kimi K2.5 | Moonshot Kimi | 2.0 |
| glm-4.6 | Z.AI GLM 4.6 | Z.AI GLM | 2.0 |
| glm-4.7 | Z.AI GLM 4.7 | Z.AI GLM | 2.0 |
| glm-5 | Z.AI GLM 5 | Z.AI GLM | 2.0 |
| MiniMax-M2.7-highspeed | MiniMax M2.7 Highspeed (api.minimax.io) | MiniMax (api.minimax.io) | 2.0 |
Authentication
TTS, STT, Translation, AI Chat, Image generation and Voices endpoints require an API key. You can create API keys from the API Dashboard.
Login with email code (no API key)
For web/mobile login without a password: submit your email to receive a code, then verify the code. Requests are rate limited per IP (no CAPTCHA).
- POST https://webvoice.easytaskflow.app/api/v1/auth/send-code/ (JSON: email, accept_privacy, accept_terms). Response includes is_new_user.
- POST https://webvoice.easytaskflow.app/api/v1/auth/verify-code/ (JSON: email, code, optional create_api_key, api_key_name). Stateless (no session cookie required). Returns api_key and onboarding (Solana memo + wallet).
- GET https://webvoice.easytaskflow.app/api/v1/onboarding/ (requires API key). Returns credits, can_use_api, optional Solana memo, PayPal URLs.
- POST https://webvoice.easytaskflow.app/api/v1/keys/ (requires API key). JSON {"name": "my-agent"} → new wv_… key.
Limits: 5 send-code and 20 verify-code requests per IP per 15 minutes.
Autonomous agent flow (register → use → optional top-up)
New accounts receive welcome credits (~20). If onboarding.can_use_api is true, start calling chat/TTS/STT immediately — Solana or PayPal are only needed when credits run out.
- POST auth/send-code/: agent email + accept terms
- Read OTP from email (human or mailbox API)
- POST auth/verify-code/ with create_api_key: true → wv_… api_key + onboarding (credits, can_use_api, optional solana.memo_code)
- Configure MCP with WEBVOICE_API_KEY
- If credits > 0: call webvoice_chat / webvoice_tts / … immediately
- Optional top-up: Solana (onboarding.solana) or PayPal (onboarding.urls.buy_credits_paypal)
# 1: send code
curl -X POST "https://webvoice.easytaskflow.app/api/v1/auth/send-code/" \
  -H "Content-Type: application/json" \
  -d '{"email":"agent@example.com","accept_privacy":true,"accept_terms":true}'
# 2: verify + get API key (stateless; email in JSON, no session cookie)
curl -X POST "https://webvoice.easytaskflow.app/api/v1/auth/verify-code/" \
  -H "Content-Type: application/json" \
  -d '{"email":"agent@example.com","code":"123456","create_api_key":true,"api_key_name":"cursor-agent"}'
# Example onboarding fields in verify response:
# {
# "api_key": "wv_…",
# "onboarding": {
# "credits": 20.0,
# "can_use_api": true,
# "billing": { "topup_required": false, "note": "…" },
# "solana": { "available": true, "wallet": "…", "memo_code": "WV…" }
# }
# }
Same flow via MCP: webvoice_register_send_code → webvoice_register_verify → webvoice_onboarding. See MCP section.
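The register-then-verify flow can also be scripted. Below is a minimal sketch using only the Python standard library; the endpoint paths and JSON fields come from this section, while `post_json`, `register`, and the `read_code` callable (whatever obtains the OTP, such as a human prompt or a mailbox API) are hypothetical names introduced for illustration.

```python
import json
import urllib.request

BASE_URL = "https://webvoice.easytaskflow.app/api/v1/"


def post_json(path, payload, timeout=30):
    """POST a JSON payload to a WebVoice endpoint and decode the JSON reply."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def send_code_payload(email):
    """Body for auth/send-code/ (both acceptance flags set to true)."""
    return {"email": email, "accept_privacy": True, "accept_terms": True}


def verify_payload(email, code, key_name="my-agent"):
    """Body for auth/verify-code/ that also requests a new API key."""
    return {
        "email": email,
        "code": code,
        "create_api_key": True,
        "api_key_name": key_name,
    }


def register(email, read_code, post=post_json):
    """Run send-code then verify-code; return (api_key, onboarding).

    `read_code` is a hypothetical callable that returns the OTP.
    """
    post("auth/send-code/", send_code_payload(email))
    reply = post("auth/verify-code/", verify_payload(email, read_code()))
    return reply["api_key"], reply.get("onboarding", {})
```

A caller would then check `onboarding.get("can_use_api")` before issuing chat, TTS, or STT requests. The `post` parameter exists so the flow can be exercised with a stub instead of the network.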
Using API Key
Include your API key in one of the following ways:
Header (Recommended)
X-API-Key: wv_your_api_key_here
Or use Authorization: Bearer wv_your_api_key_here. API keys in query strings (?api_key=) are not accepted — they would appear in server logs and browser history.
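As a small illustration of the two accepted header styles, the sketch below builds an authenticated request without sending it. It uses only the standard library; `auth_headers` and `build_get` are hypothetical helper names, not part of any WebVoice SDK.

```python
import urllib.request

BASE_URL = "https://webvoice.easytaskflow.app/api/v1/"


def auth_headers(api_key, use_bearer=False):
    """Return the header that carries the key: X-API-Key or Authorization."""
    if use_bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"X-API-Key": api_key}


def build_get(path, api_key):
    """Build (but do not send) an authenticated GET request."""
    return urllib.request.Request(
        BASE_URL + path, headers=auth_headers(api_key), method="GET"
    )
```

For example, `urllib.request.urlopen(build_get("status/", key))` would call the status endpoint with the key in a header and never in the URL.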
API Endpoints
Base URL: https://webvoice.easytaskflow.app/api/v1/
Production API endpoint: webvoice.easytaskflow.app
| Endpoint | Method | Description |
|---|---|---|
| /auth/send-code/ | POST | Request login code by email (rate limited, no API key) |
| /auth/verify-code/ | POST | Verify code; optional create_api_key + onboarding (stateless JSON) |
| /onboarding/ | GET | Agent onboarding: credits, can_use_api, optional Solana memo (API key required) |
| /keys/ | POST | Create additional API key (JSON name) |
| /tts/ | POST | Generate speech from text |
| /stt/ | POST | Transcribe audio to text |
| /translation/ | POST | Translate text between languages |
| /chat/models/ | GET | List AI chat models (WebVoice extended fields) |
| /models/ | GET | OpenAI-compatible model list |
| /chat/completions/ | POST | OpenAI-compatible chat completions (optional SSE stream) |
| /image/ | POST | Text-to-image (MiniMax); server must have MINIMAX_API_KEY |
| /image-to-image/ | POST | Edit 1–4 input images (FLUX.2 Flash Edit) |
| /image-to-video/ | POST | Generate video from a reference image |
| /text-to-video/ | POST | Generate video from a text prompt |
| /voice-memo/text/ | POST | Create a voice memo from text only (no credits charged) |
| /voices/ | GET | Get available voices |
| /status/ | GET | Check account status and credits |
| /send-email/ | POST | Send an email to a given address (text only, no attachments). Requires API key. |
Send Email
Endpoint: POST /api/v1/send-email/
Sends an email from the site to the specified address. Text only, no attachments. Requires API key.
Rate limit: 10 emails per account per hour. Exceeding it returns HTTP 429 with a Retry-After header.
Request Body
{
"to": "destinatario@example.com",
"subject": "Subject (optional)",
"body": "Email body text"
}
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| to | string | Yes | Recipient email address |
| subject | string | No | Email subject (default: Message from WebVoice API) |
| body / text / content | string | Yes | Email body (plain text, max 50000 characters) |
Response
{
"success": true,
"message": "Email sent successfully",
"to": "destinatario@example.com",
"subject": "Subject"
}
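Client-side validation can catch the documented limits before a request is spent against the hourly quota. A minimal sketch follows; `build_email_payload` is a hypothetical helper, and the address check is deliberately crude (it only looks for an `@`).

```python
MAX_BODY_CHARS = 50000  # limit stated in the parameter table


def build_email_payload(to, body, subject=None):
    """Validate and build the JSON body for POST send-email/."""
    if "@" not in to:
        raise ValueError("'to' must be an email address")
    if not body:
        raise ValueError("'body' is required")
    if len(body) > MAX_BODY_CHARS:
        raise ValueError(f"'body' exceeds {MAX_BODY_CHARS} characters")
    payload = {"to": to, "body": body}
    if subject:  # an omitted subject lets the server apply its default
        payload["subject"] = subject
    return payload
```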
Text-to-Speech (TTS)
Endpoint: POST /api/v1/tts/
Request Body
{
"text": "Hello, world!",
"voice": "af_sarah",
"language": "en",
"speed": 0.90
}
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Text to convert to speech |
| voice | string | Yes | Voice ID (e.g., 'af_sarah', 'if_sara') |
| language | string | No | Language code (default: 'en') |
| speed | float | No | Speech speed (0.5-2.0, default: 0.90) |
Response
{
"success": true,
"audio": "base64_encoded_mp3_data",
"format": "mp3",
"duration": 2.5,
"credits_used": 0.5,
"credits_remaining": 99.5
}
Speech-to-Text (STT)
Endpoint: POST /api/v1/stt/
Request
Send as multipart/form-data:
- audio: Audio file (MP3, WAV, M4A, FLAC)
- language: Optional. Omit or use empty/auto for automatic language detection (Whisper). Or specify a code (it, en, es, fr, de, pt, ru, zh, ja, ko, etc.)
- provider: Optional. whisper_small (default from account), whisper_fast (DeepInfra Turbo), whisper_groq (alias), whisper_max_local (Modal). whisper_fast requires DEEPINFRA_API_KEY.
Response
The 'language' field returns the auto-detected or specified language code.
{
"success": true,
"text": "Transcribed text here",
"language": "it",
"duration": 10.5,
"credits_used": 0.3,
"credits_remaining": 99.2
}
Translation
Endpoint: POST /api/v1/translation/
Request Body
{
"text": "Hello, world!",
"source_language": "en",
"target_language": "it"
}
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Text to translate |
| source_language | string | No | Source language code (default: 'auto', which defaults to 'en') |
| target_language | string | Yes | Target language code (e.g., 'it', 'en', 'es', 'fr', 'de', 'pt', 'ru', 'zh', 'ja', 'ko', 'ar', 'nl', 'pl', 'tr', 'cs') |
Supported Languages
The following language codes are supported:
- en - English
- it - Italian
- es - Spanish
- fr - French
- de - German
- pt - Portuguese
- ru - Russian
- zh - Chinese
- ja - Japanese
- ko - Korean
- ar - Arabic
- nl - Dutch
- pl - Polish
- tr - Turkish
- cs - Czech
Response
{
"success": true,
"translated_text": "Ciao, mondo!",
"source_language": "en",
"target_language": "it",
"text_length": 13,
"credits_used": 0.1,
"credits_remaining": 99.9
}
AI Chat (OpenAI-compatible bridge)
Use WebVoice as a drop-in OpenAI API base URL: authenticate with your WebVoice API key (X-API-Key or Authorization: Bearer), choose a managed model, and credits are billed on your profile. Responses follow the OpenAI chat.completions schema; credit info is in the optional webvoice field.
OpenAI base URL
https://webvoice.easytaskflow.app/api/v1/
Example: POST chat/completions/, GET models/, compatible with OpenAI SDK when you set base_url and api_key.
List models
WebVoice extended: GET /api/v1/chat/models/ — includes credits_per_request and display_name.
OpenAI-compatible: GET /api/v1/models/
{
"object": "list",
"data": [
{
"id": "deepseek-v4-flash",
"object": "model",
"created": 1700000000,
"owned_by": "deepseek"
}
]
}
Chat completion
Endpoint: POST /api/v1/chat/completions/
Same request body as OpenAI: model, messages, max_tokens, temperature, stream. Set stream: true for Server-Sent Events (text/event-stream).
DeepSeek routes: deepseek-v4-flash (default, fast), deepseek-v4-pro (frontier), deepseek-reasoner (V4 thinking mode). Legacy deepseek-chat still works and maps to V4 Flash.
Request body (JSON)
{
"model": "deepseek-v4-flash",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Summarize WebVoice in one sentence."}
],
"max_tokens": 2000,
"temperature": 0.7,
"stream": false,
"web_search": false
}
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model id from GET /models/ or /chat/models/ |
| provider | string | No | Provider key (e.g. groq, openrouter, deepseek). Required when the same model id exists on multiple providers. |
| messages | array | Yes | OpenAI message list. Standard chat: roles system, user, assistant (last role user). With tools: also assistant (with tool_calls) and tool (last role user or tool). |
| tools | array | No | OpenAI function tools array. Supported on DeepSeek and OpenRouter models only. |
| tool_choice | string or object | No | none, auto (default when tools set), required, or {"type":"function","function":{"name":"…"}} |
| stream | boolean | No | If true, SSE stream of chat.completion.chunk events ending with data: [DONE] |
| max_tokens | integer | No | Max response tokens (1–8000, default 2000) |
| temperature | number | No | Sampling temperature 0–2 (default 0.7) |
| web_search | boolean | No | WebVoice extension: web search where supported (DeepSeek WebSearch, Moonshot, Z.AI) |
Response (non-streaming)
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1700000000,
"model": "deepseek-v4-flash",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "WebVoice is a voice and AI platform for TTS, STT, chat and memos."
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 42,
"completion_tokens": 18,
"total_tokens": 60
},
"webvoice": {
"credits_used": 2.0,
"credits_remaining": 98.0,
"provider": "deepseek"
}
}
Response (streaming SSE)
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"total_tokens":60},"webvoice":{"credits_used":2.0,"credits_remaining":98.0}}
data: [DONE]
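When the OpenAI SDK is not available, the stream can be consumed by hand. The sketch below assembles the text and picks up the `webvoice` credit block from the final chunk; it assumes the chunk layout shown in the sample above and takes any iterable of already-decoded lines. `collect_stream` is a hypothetical helper name.

```python
import json


def collect_stream(lines):
    """Assemble streamed text and the trailing `webvoice` credit info.

    `lines` is any iterable of decoded SSE lines, for example an HTTP
    response body read line by line.
    """
    text, webvoice = [], None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text.append(choice.get("delta", {}).get("content") or "")
        webvoice = chunk.get("webvoice", webvoice)
    return "".join(text), webvoice
```

The function returns a `(text, webvoice)` pair; `webvoice` stays `None` if no chunk carried credit information.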
Example (curl)
# Non-streaming
curl -X POST "https://webvoice.easytaskflow.app/api/v1/chat/completions/" \
-H "Authorization: Bearer wv_your_api_key_here" \
-H "Content-Type: application/json" \
-d '{"model":"deepseek-v4-flash","messages":[{"role":"user","content":"Hello"}]}'
# Streaming
curl -N -X POST "https://webvoice.easytaskflow.app/api/v1/chat/completions/" \
-H "X-API-Key: wv_your_api_key_here" \
-H "Content-Type: application/json" \
-d '{"model":"deepseek-v4-flash","stream":true,"messages":[{"role":"user","content":"Hello"}]}'
Example (OpenAI Python SDK)
from openai import OpenAI

client = OpenAI(
    api_key="wv_your_api_key_here",
    base_url="https://webvoice.easytaskflow.app/api/v1",
)

# Non-streaming
r = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello"}],
)
print(r.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for event in stream:
    # Skip chunks that carry no choices or an empty delta
    if event.choices and event.choices[0].delta.content:
        print(event.choices[0].delta.content, end="")
Agent integration — tools / function calling
Build agents that call your own functions: send tools and tool_choice in the chat completion request. When the model wants to run a function, the response contains tool_calls instead of (or in addition to) text. Your code executes the function locally, appends role tool messages, and calls the API again until finish_reason is stop.
Supported providers: DeepSeek (e.g. deepseek-v4-flash) and OpenRouter models. Not available on Groq/Gemini free-tier routes. web_search and tools cannot be combined in one request.
Step 1 — define tools
{
"model": "deepseek-v4-flash",
"tool_choice": "auto",
"tools": [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a city",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string", "description": "City name"}
},
"required": ["city"]
}
}
}],
"messages": [
{"role": "user", "content": "What is the weather in Rome?"}
]
}
Step 2 — model returns tool_calls
{
"choices": [{
"message": {
"role": "assistant",
"content": null,
"tool_calls": [{
"id": "call_abc123",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"city\":\"Rome\"}"
}
}]
},
"finish_reason": "tool_calls"
}],
"webvoice": {"credits_used": 2.0, "credits_remaining": 98.0}
}
Step 3 — execute locally and call again
Run get_weather("Rome") in your app, then POST again with the full message history including the assistant tool_calls message and your tool result:
{
"model": "deepseek-v4-flash",
"tools": [ ... same tools ... ],
"messages": [
{"role": "user", "content": "What is the weather in Rome?"},
{
"role": "assistant",
"content": null,
"tool_calls": [{
"id": "call_abc123",
"type": "function",
"function": {"name": "get_weather", "arguments": "{\"city\":\"Rome\"}"}
}]
},
{
"role": "tool",
"tool_call_id": "call_abc123",
"content": "{\"temp_c\": 22, \"condition\": \"sunny\"}"
}
]
}
Agent loop (pseudocode)
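import json

def run_agent(client, user_prompt):
    """Loop until the model stops requesting tools, then return its text."""
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        resp = client.chat.completions.create(
            model="deepseek-v4-flash",
            messages=messages,
            tools=TOOLS,  # tool definitions, as in Step 1
            tool_choice="auto",
        )
        msg = resp.choices[0].message
        if resp.choices[0].finish_reason == "tool_calls":
            # Keep the assistant tool_calls message in the history
            messages.append(msg.model_dump())
            for tc in msg.tool_calls:
                # run_function is your own dispatcher for local functions
                result = run_function(tc.function.name, json.loads(tc.function.arguments))
                messages.append({
                    "role": "tool",
                    "tool_call_id": tc.id,
                    "content": json.dumps(result),
                })
            continue
        return msg.content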
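The loop above refers to `TOOLS` and `run_function` without defining them. One possible shape is sketched here: `TOOLS` repeats the schema from Step 1, while `get_weather`, `REGISTRY`, and the error-reporting convention are hypothetical choices made for illustration, not something the API prescribes.

```python
# Same schema as the "Step 1" request.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]


def get_weather(city):
    """Hypothetical local implementation; returns canned data."""
    return {"city": city, "temp_c": 22, "condition": "sunny"}


# Maps tool names the model may request to local callables.
REGISTRY = {"get_weather": get_weather}


def run_function(name, arguments):
    """Dispatch a tool call; report unknown names or bad arguments as data."""
    func = REGISTRY.get(name)
    if func is None:
        return {"error": f"unknown function: {name}"}
    try:
        return func(**arguments)
    except TypeError as exc:  # missing or unexpected arguments
        return {"error": str(exc)}
```

Returning errors as a normal result, rather than raising, lets the model see what went wrong in the next turn and correct its call.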
Streaming with tools
Set stream: true. Chunks may include delta.tool_calls (partial). The final chunk has finish_reason tool_calls or stop. Credits are billed once per completed API request (same as non-streaming).
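Partial `delta.tool_calls` fragments have to be stitched together before the function can run. A minimal sketch, assuming the fragments follow the OpenAI streaming convention (each fragment has an `index`, the first one carries `id` and `function.name`, and `function.arguments` arrives as string pieces); `merge_tool_call_deltas` is a hypothetical helper operating on chunks already parsed into dicts.

```python
def merge_tool_call_deltas(chunks):
    """Combine partial delta.tool_calls fragments into complete calls.

    Assumption: fragments follow the OpenAI streaming convention
    described in the lead-in; this is not confirmed for every provider.
    """
    calls = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for frag in choice.get("delta", {}).get("tool_calls") or []:
                call = calls.setdefault(
                    frag["index"],
                    {"id": None, "type": "function",
                     "function": {"name": "", "arguments": ""}},
                )
                if frag.get("id"):
                    call["id"] = frag["id"]
                fn = frag.get("function") or {}
                call["function"]["name"] += fn.get("name") or ""
                call["function"]["arguments"] += fn.get("arguments") or ""
    # Return calls ordered by their stream index
    return [calls[i] for i in sorted(calls)]
```

Once the final chunk reports `finish_reason` `tool_calls`, each merged entry's `arguments` string should be complete JSON and can be parsed with `json.loads`.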
MCP server — use WebVoice in Cursor
Install the local stdio MCP server (webvoice-mcp) to expose WebVoice as tools in Cursor, Claude Desktop, or any MCP client. Calls are forwarded to this REST API.
Option A — Agent self-registration (no browser)
- Add MCP to Cursor (config below); WEBVOICE_API_KEY can be empty for the first step.
- webvoice_register_send_code: email + accept terms
- webvoice_register_verify: OTP → api_key (once) + onboarding
- Set WEBVOICE_API_KEY in mcp.json and restart Cursor.
- If onboarding.can_use_api is true, use chat/TTS/STT right away with welcome credits.
- Optional: top up via Solana (webvoice_onboarding → solana.memo_code) or PayPal URLs in onboarding.urls.
REST equivalent: Agent registration (auth/send-code, auth/verify-code).
Option B — Browser signup (human)
- Sign up / log in — Login with email code or Google.
- Create an API key: API dashboard → wv_…
- Configure MCP: WEBVOICE_API_KEY in Cursor (below).
Credits are shared across web, API, and MCP. When balance is zero, calls return HTTP 402 and we email a recharge link. Solana and PayPal top-ups are optional — not required if you already have credits.
Install
pip install webvoice-mcp
# or from source:
pip install -r requirements-mcp.txt
pip install -e .
PyPI package + MCP Registry: io.github.easytaskflow/webvoice-mcp — see docs/MCP_DISTRIBUTION.md in the repo.
Cursor configuration
Add to ~/.cursor/mcp.json (Settings → MCP):
{
"mcpServers": {
"webvoice": {
"command": "webvoice-mcp",
"env": {
"WEBVOICE_API_KEY": "wv_your_api_key_here"
}
}
}
}
Optional env: WEBVOICE_BASE_URL (default https://webvoice.easytaskflow.app/api/v1).
MCP tools
| Tool | Maps to / purpose |
|---|---|
| webvoice_register_send_code | POST auth/send-code/ (no API key) |
| webvoice_register_verify | POST auth/verify-code/ (api_key + onboarding) |
| webvoice_onboarding | GET onboarding/ (credits, can_use_api, Solana memo) |
| webvoice_status | GET status/ |
| webvoice_list_chat_models | GET chat/models/ |
| webvoice_list_voices | GET voices/ |
| webvoice_chat | POST chat/completions/ (DeepSeek, Groq, OpenRouter, DeepInfra) |
| webvoice_deepinfra_chat | POST chat/completions/ (Kimi K2.6, Nemotron, Qwen3 Max Thinking, Gemma 4 IT) |
| webvoice_tts | POST tts/ (set output_path to save MP3) |
| webvoice_stt | POST stt/ (local audio_path; optional provider) |
| webvoice_whisper_fast_stt | POST stt/ (DeepInfra whisper-large-v3-turbo) |
| webvoice_translate | POST translation/ |
| webvoice_image | POST image/ (MiniMax, WAN, Gen4, SDXL, qwen-image-max, …) |
| webvoice_qwen_image_max | POST image/ (Qwen-Image-Max via DeepInfra) |
| webvoice_image_to_image | POST image-to-image/ (FLUX.2 Flash Edit) |
| webvoice_image_to_video | POST image-to-video/ (Seedance / Gen4 Turbo) |
| webvoice_text_to_video | POST text-to-video/ (Seedance, Veo Lite, Veo Fast) |
| webvoice_veo_fast_text_to_video | POST text-to-video/ (Google Veo 3.1 Fast via DeepInfra) |
| webvoice_account_links | Dashboard / billing URLs |
See also webvoice_mcp/README.md in the repository.
Image generation (text-to-image)
Default provider is MiniMax (MINIMAX_API_KEY). WaveSpeed providers (WAVESPEED_API_KEY): wan, nucleus, gen4, gen4-turbo, sd3, sdxl, flux-klein-lora. DeepInfra (DEEPINFRA_API_KEY): qwen-image-max. Missing key → HTTP 503.
Endpoint: POST /api/v1/image/
Upstream docs: MiniMax · WAN 2.7 · Nucleus · Gen4 · SDXL · FLUX Klein LoRA · Qwen-Image-Max (DeepInfra)
Request body (JSON)
MiniMax example (default provider):
{
"prompt": "A red bicycle on a sunny street",
"provider": "minimax",
"width": 1024,
"height": 1024,
"seed": 42,
"model": "image-01",
"aspect_ratio": "16:9",
"response_format": "base64"
}
Nucleus example:
{
"provider": "nucleus",
"prompt": "A red bicycle on a sunny street",
"negative_prompt": "blurry, low quality",
"num_images": 1,
"num_inference_steps": 50,
"guidance_scale": 8,
"output_format": "png"
}
Gen4 example:
{
"provider": "gen4",
"prompt": "A red bicycle on a sunny street",
"resolution": "1080p",
"reference_images": ["https://example.com/ref1.jpg"],
"seed": 42
}
Runway Gen4 Image Turbo example (faster, ~$0.03 upstream):
{
"provider": "gen4-turbo",
"prompt": "Character wearing a leather jacket in a cyberpunk city",
"aspect_ratio": "9:16",
"resolution": "1080p",
"reference_images": ["https://example.com/character-ref.jpg"]
}
Stable Diffusion 3 example (~$0.03 upstream, optional img2img):
{
"provider": "sd3",
"prompt": "Marina Bay at sunset, vivid purple and orange afterglow, long exposure",
"aspect_ratio": "16:9",
"image_url": "https://example.com/style-ref.jpg",
"seed": -1
}
Stability AI SDXL example (~$0.0026 upstream):
{
"provider": "sdxl",
"prompt": "Close-up food photography of a gourmet cheeseburger, shallow depth of field",
"aspect_ratio": "1:1",
"size": "1024*1024",
"seed": -1
}
Qwen-Image-Max via DeepInfra example (~$0.075/image upstream, photorealistic):
{
"provider": "qwen-image-max",
"prompt": "Portrait of a woman in natural light, lifelike skin texture, shallow depth of field",
"aspect_ratio": "3:4",
"size": "1024x1536"
}
FLUX.2 Klein 9B LoRA example (~$0.02 upstream):
{
"provider": "flux-klein-lora",
"prompt": "Cinematic romantic drama still, rooftop at sunset, film photography",
"aspect_ratio": "16:9",
"size": "1024*1024",
"loras": [
{"path": "https://example.com/my-style-lora.safetensors", "scale": 1.0}
],
"seed": -1
}
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | string | No | minimax (default), wan, nucleus, gen4, gen4-turbo, sd3, sdxl, flux-klein-lora, or qwen-image-max |
| prompt | string | Yes | Text description of the image (up to 4000 characters, truncated server-side) |
| width / height | integer | No | Optional pixel size; each clamped between 512 and 2048. For MiniMax, used to pick the closest aspect_ratio. For WAN, sent as width*height (e.g. 1024*1024). |
| seed | integer | No | Optional; reproducible generation when supported by the provider. |
| model | string | No | MiniMax only: image model (e.g. image-01). Unknown values fall back to the server default. |
| aspect_ratio | string | No | e.g. 1:1, 16:9, 9:16. Supported by all providers when width/height are omitted. |
| thinking_mode | boolean | No | WAN only: enhanced reasoning for better quality (default: true). |
| negative_prompt | string | No | Nucleus only: elements to exclude from the image. |
| num_images | integer | No | Nucleus only: 1 (default) or 2 images per request. Billed per image. |
| num_inference_steps | integer | No | Nucleus only: 1–100, default 50. |
| guidance_scale | number | No | Nucleus only: classifier-free guidance, 0–20, default 8. |
| output_format | string | No | Nucleus only: png (default) or jpeg. |
| resolution | string | No | Gen4 / Gen4 Turbo: 720p or 1080p (default 1080p). |
| reference_images / reference_image_urls | array / string | No | Gen4 / Gen4 Turbo: up to 3 public HTTPS URLs to guide style or subject. |
| size | string | No | SDXL / WAN: canvas size, e.g. 1024*1024 (256–1536 px per side for SDXL). |
| loras / lora_url | array / string | No | FLUX Klein LoRA: list of {path, scale?} objects, or single lora_url + lora_scale. |
| image_url | string | No | SD3 / SDXL: optional reference image URL for image-to-image. |
| response_format | string | No | base64 (default) or url: how MiniMax returns image data; the API always responds with base64-encoded image bytes in JSON. |
Response
Binary images are returned as Base64 strings (typically PNG or JPEG depending on the provider).
{
"success": true,
"provider": "wan",
"image": "base64_encoded_image_bytes",
"images": ["base64_1", "base64_2"],
"count": 1,
"credits_used": 1.0,
"credits_remaining": 99.0
}
Credits per image: MiniMax 7, WAN 8, Nucleus 4 (× num_images), Gen4 2/3, Gen4 Turbo 1, SD3 1, SDXL 1, FLUX Klein LoRA 1.
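Decoding the image response is a two-case problem, because a reply may hold a single `image` or a list in `images`. A minimal sketch, assuming `images` (when present) lists every generated image and that failures carry a `message` field as in the error format; `extract_images` is a hypothetical helper.

```python
import base64


def extract_images(reply):
    """Return decoded image bytes from a POST image/ reply.

    Assumption: `images` (when present) lists every generated image and
    `image` holds a single one, as in the sample response.
    """
    if not reply.get("success"):
        raise RuntimeError(reply.get("message", "image generation failed"))
    encoded = reply.get("images") or [reply["image"]]
    return [base64.b64decode(item) for item in encoded]
```

Each returned element can be written straight to disk in binary mode; the file extension depends on the provider's output format.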
Image to video
Generates video from a reference image. Default provider is Seedance 2.0 Turbo (720p/1080p, optional audio); use provider gen4 for Runway Gen4 Turbo (WAVESPEED_API_KEY).
Endpoint: POST /api/v1/image-to-video/
Upstream docs: Seedance 2.0 I2V Turbo · Runway Gen4 Turbo
Request body (JSON or multipart)
{
"provider": "seedance",
"prompt": "Slow camera pan, golden hour light, gentle wind in the trees",
"image_url": "https://example.com/reference.jpg",
"duration": 5,
"resolution": "720p",
"aspect_ratio": "adaptive",
"generate_audio": true
}
Runway Gen4 Turbo example:
{
"provider": "gen4",
"prompt": "The subject turns slowly toward the camera, soft wind in their hair",
"image_url": "https://example.com/reference.jpg",
"duration": 5,
"aspect_ratio": "16:9"
}
Alternatively send multipart/form-data with fields prompt, duration, resolution, aspect_ratio, generate_audio and file field image.
Or pass reference image as base64 in JSON field image (instead of image_url).
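For a local file, the base64 variant avoids hosting the reference image. The sketch below builds the JSON body either way; it assumes the `image` field takes plain base64 text with no data-URI prefix, which the source does not spell out. `image_to_video_payload` is a hypothetical helper.

```python
import base64


def image_to_video_payload(prompt, image_bytes=None, image_url=None, **options):
    """Build the JSON body for POST image-to-video/.

    Exactly one of `image_bytes` (sent base64-encoded in `image`) or
    `image_url` must be given. Extra keyword options such as `duration`
    or `provider` are passed through unchanged.
    """
    if (image_bytes is None) == (image_url is None):
        raise ValueError("pass exactly one of image_bytes or image_url")
    payload = {"prompt": prompt, **options}
    if image_url is not None:
        payload["image_url"] = image_url
    else:
        payload["image"] = base64.b64encode(image_bytes).decode("ascii")
    return payload
```

Typical use would read the file with `open(path, "rb").read()` and pass the result as `image_bytes`.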
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | string | No | seedance (default) or gen4 / runway |
| prompt | string | Yes | Motion, camera, lighting, mood. |
| image_url / image | string | Yes | Public HTTPS URL or base64-encoded image bytes. |
| duration | integer | No | 4–15 seconds for Seedance (default 5); 2–10 for Gen4 (default 5). |
| resolution | string | No | 720p (default) or 1080p (Seedance only) |
| aspect_ratio | string | No | Seedance: 16:9, 9:16, 1:1, 4:3, 3:4, 21:9, adaptive (default). Gen4: 16:9, 9:16, 1:1, 4:3, 3:4 or omit to match source. |
| generate_audio | boolean | No | Native synchronized audio (Seedance only, default: true). |
| last_image_url | string | No | Optional last-frame image for continuation. |
Response
{
"success": true,
"provider": "seedance-2.0-turbo",
"video": "base64_encoded_mp4_bytes",
"duration": 5,
"resolution": "720p",
"credits_used": 15.0,
"credits_remaining": 85.0,
"video_size_bytes": 1234567
}
Credits: Seedance — 15/16 per 5s @ 720p/1080p (× duration). Gen4 Turbo — 2 per 5s (~$0.05 upstream).
Image to image
Edits 1–4 input images with FLUX.2 Flash Edit — restyle scenes, replace backgrounds, inpaint/outpaint, and retouch with plain-English prompts (WAVESPEED_API_KEY).
Endpoint: POST /api/v1/image-to-image/
Upstream docs: FLUX.2 Flash Edit
Request body (JSON or multipart)
{
"provider": "flux-flash-edit",
"prompt": "Replace the background with a modern office, keep the product centered",
"images": [
"https://example.com/product.jpg"
],
"aspect_ratio": "1:1",
"seed": 42
}
Multiple input images (up to 4):
{
"prompt": "Merge styles: subject from first image, lighting from second",
"image_urls": "https://example.com/a.jpg,https://example.com/b.jpg"
}
Alternatively send multipart/form-data with fields prompt, aspect_ratio, size, seed and file field images (1–4 files) or single image.
Or pass input images as base64 in JSON field image or array images.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | string | No | flux-flash-edit (default) |
| prompt | string | Yes | Edit instructions in plain English (HEX colors supported). |
| images / image_url / image | array / string | Yes | 1–4 public HTTPS URLs or base64-encoded images. |
| aspect_ratio | string | No | 1:1, 16:9, 9:16, 4:3, 3:4 (default 1024×1024). |
| size | string | No | Custom width*height (256–1536), e.g. 1024*768. |
| seed | integer | No | Optional reproducibility seed. |
Response
{
"success": true,
"provider": "flux-2-flash-edit",
"image": "base64_encoded_png_or_jpeg_bytes",
"input_count": 1,
"credits_used": 1.0,
"credits_remaining": 99.0,
"image_size_bytes": 456789
}
Credits: FLUX.2 Flash Edit — 1 credit per edit (~$0.013 upstream).
Text to video
Generates cinematic 720p/1080p video from a text prompt. Default provider is Seedance 2.0 Fast Turbo (WAVESPEED_API_KEY); use provider veo for Google Veo 3.1 Lite (WaveSpeed) or veo-fast for Google Veo 3.1 Fast (DEEPINFRA_API_KEY, $0.15/s upstream).
Endpoint: POST /api/v1/text-to-video/
Upstream docs: Seedance 2.0 Fast T2V Turbo · Google Veo 3.1 Lite · Google Veo 3.1 Fast (DeepInfra)
Request body (JSON)
{
"provider": "veo",
"prompt": "Tracking shot through a neon-lit alley at night, rain reflections",
"aspect_ratio": "16:9",
"resolution": "720p",
"duration": 6,
"generate_audio": true,
"negative_prompt": "blurry, low quality",
"seed": 42
}
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | string | No | seedance (default) or veo (Lite, WaveSpeed) or veo-fast (DeepInfra) |
| prompt | string | Yes | Cinematic scene description. |
| aspect_ratio | string | No | 16:9 (default), 9:16, 1:1, 4:3, 3:4, 21:9. |
| duration | integer | No | 4–15 seconds (default 5). |
| resolution | string | No | 720p (default) or 1080p |
| generate_audio | boolean | No | Native synchronized audio (default: true). |
| reference_images | array | No | Seedance only: HTTPS URLs (doubles credit cost). |
| negative_prompt | string | No | Veo only: elements to exclude. |
| seed | integer | No | Veo only: reproducible generation. |
| reference_videos | array | No | Seedance only: reference video URLs (doubles credit cost). |
Response
{
"success": true,
"provider": "seedance-2.0-fast-turbo",
"video": "base64_encoded_mp4_bytes",
"duration": 5,
"resolution": "720p",
"credits_used": 13.0,
"credits_remaining": 87.0,
"video_size_bytes": 1234567
}
Credits: Seedance — 13/14 per 5s @ 720p/1080p (×2 with references). Veo 3.1 Lite — 7/11 per 6s @ 720p/1080p. Veo 3.1 Fast — 3.5 credits/s ($0.15/s upstream).
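The rates above can be turned into a rough pre-flight estimate. This sketch assumes cost scales linearly with duration from the quoted per-5s and per-6s figures; the source does not say how partial blocks are rounded, so the `credits_used` field in the response remains the authoritative number. `estimate_credits` and the `RATES` table layout are hypothetical.

```python
# (credits, seconds) per quoted block, keyed by provider and resolution.
RATES = {
    "seedance": {"720p": (13, 5), "1080p": (14, 5)},
    "veo": {"720p": (7, 6), "1080p": (11, 6)},
}
VEO_FAST_PER_SECOND = 3.5


def estimate_credits(provider, duration, resolution="720p", with_references=False):
    """Estimate text-to-video credits.

    Assumption: cost scales linearly with duration; server-side
    rounding is not documented and is ignored here.
    """
    if provider == "veo-fast":
        return VEO_FAST_PER_SECOND * duration
    credits, seconds = RATES[provider][resolution]
    cost = credits * duration / seconds
    if provider == "seedance" and with_references:
        cost *= 2  # reference images or videos double the cost
    return cost
```

Comparing the estimate with the balance from GET status/ before submitting can avoid an HTTP 402 on a long render.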
Voice memo (text only)
Create a voice memo from text only (no audio). The text is stored as a voice memo in your account and appears in the Voice notes list. Free — no credits charged.
Endpoint: POST /api/v1/voice-memo/text/
Request Body
{
"text": "Your note text here",
"title": "Optional title (default: first 50 chars of text)"
}
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Text to save as voice memo (no audio generated; max 10000 characters) |
| title | string | No | Optional title; if omitted, first 50 characters of text are used |
Response
{
"success": true,
"voice_note_id": 123,
"title": "Your note text here",
"detail_url": "https://.../audio/voice-notes/123/",
"created_at": "2026-02-04T12:00:00Z"
}
Get Available Voices
Endpoint: GET /api/v1/voices/
Query Parameters
language: Filter voices by language code (optional)
Response
{
"success": true,
"voices": {
"en": [
{"id": "af_sarah", "name": "Sarah (Female, English)"},
{"id": "am_adam", "name": "Adam (Male, English)"}
],
"it": [
{"id": "if_sara", "name": "Sara (Female, Italian)"}
]
}
}
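Because TTS requires a valid `voice` id, it can help to flatten this response before choosing one. A minimal sketch based on the sample structure above (a `voices` object keyed by language code); `voice_ids` is a hypothetical helper.

```python
def voice_ids(reply, language=None):
    """Return voice ids from a GET voices/ reply, optionally for one language."""
    voices = reply.get("voices", {})
    groups = [voices.get(language, [])] if language else voices.values()
    return [voice["id"] for group in groups for voice in group]
```

With the sample response, `voice_ids(reply, "it")` yields the Italian ids only, and an unknown language code yields an empty list rather than an error.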
Check Status
Endpoint: GET /api/v1/status/
Response
{
"success": true,
"user": "username",
"credits": 100.0,
"api_key": {
"name": "My API Key",
"created_at": "2024-01-01T00:00:00Z",
"last_used": "2024-01-15T12:00:00Z",
"usage_count": 42
}
}
Code Examples
cURL - TTS
curl -X POST https://webvoice.easytaskflow.app/api/v1/tts/ \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"text": "Hello, world!",
"voice": "af_sarah",
"language": "en",
"speed": 0.90
}'
Python - TTS
import base64

import requests

api_key = "YOUR_API_KEY"
url = "https://webvoice.easytaskflow.app/api/v1/tts/"

response = requests.post(
    url,
    headers={"X-API-Key": api_key},
    json={
        "text": "Hello, world!",
        "voice": "af_sarah",
        "language": "en",
        "speed": 0.90,
    },
)
response.raise_for_status()  # surface 4xx/5xx errors early
data = response.json()

# The audio field holds base64-encoded MP3 data
audio_data = base64.b64decode(data["audio"])
with open("output.mp3", "wb") as f:
    f.write(audio_data)
Python - STT
import requests

api_key = "YOUR_API_KEY"
url = "https://webvoice.easytaskflow.app/api/v1/stt/"

# Send the audio as multipart/form-data; omit "language" for auto-detection
with open("audio.mp3", "rb") as f:
    response = requests.post(
        url,
        headers={"X-API-Key": api_key},
        files={"audio": f},
        data={"language": "it"},
    )
response.raise_for_status()  # surface 4xx/5xx errors early
data = response.json()
print(data["text"])
Python - Translation
import requests

api_key = "YOUR_API_KEY"
url = "https://webvoice.easytaskflow.app/api/v1/translation/"

response = requests.post(
    url,
    headers={
        "X-API-Key": api_key,
        "Content-Type": "application/json",
    },
    json={
        "text": "Hello, world!",
        "source_language": "en",
        "target_language": "it",
    },
)
data = response.json()
print(data["translated_text"])
cURL - Translation
curl -X POST https://webvoice.easytaskflow.app/api/v1/translation/ \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"text": "Hello, world!",
"source_language": "en",
"target_language": "it"
}'
cURL - Image generation
curl -X POST https://webvoice.easytaskflow.app/api/v1/image/ \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "A watercolor landscape with mountains",
"width": 1024,
"height": 1024
}'
Python - Image generation
import base64

import requests

api_key = "YOUR_API_KEY"
url = "https://webvoice.easytaskflow.app/api/v1/image/"

response = requests.post(
    url,
    headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    json={
        "prompt": "A red bicycle on a sunny street",
        "width": 1024,
        "height": 1024,
    },
)
response.raise_for_status()  # surface 4xx/5xx errors early
data = response.json()

# The image field holds base64-encoded image bytes
raw = base64.b64decode(data["image"])
with open("out.png", "wb") as f:
    f.write(raw)
Error Codes
| Status Code | Error | Description |
|---|---|---|
| 400 | Bad Request | Invalid request parameters |
| 401 | Unauthorized | Invalid or missing API key |
| 402 | Payment Required | Insufficient credits |
| 429 | Too Many Requests | Free chat model rate limit exceeded (OpenRouter-aligned RPM/RPD); Retry-After header set |
| 429 | Too Many Requests | Send-email rate limit exceeded; Retry-After header set |
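| 500 | Internal Server Error | Server error occurred |
| 503 | Service Unavailable | Required provider API key is missing on the server (e.g. image generation) |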
Error Response Format
{
"error": "Error type",
"message": "Detailed error message"
}
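One way to handle this error format uniformly is to convert non-2xx replies into a single exception type. The sketch below is an illustration only: `WebVoiceError` and `check_reply` are hypothetical names, and the function expects the status code, the parsed JSON body, and optionally the response headers from whatever HTTP client is in use.

```python
class WebVoiceError(Exception):
    """Raised for non-2xx replies; carries the status and parsed error body."""

    def __init__(self, status, error, message, retry_after=None):
        super().__init__(f"HTTP {status} {error}: {message}")
        self.status = status
        self.error = error
        self.message = message
        self.retry_after = retry_after


def check_reply(status, body, headers=None):
    """Return `body` for 2xx statuses, otherwise raise WebVoiceError."""
    if 200 <= status < 300:
        return body
    # Retry-After is only documented for 429 responses
    retry_after = (headers or {}).get("Retry-After") if status == 429 else None
    raise WebVoiceError(
        status,
        body.get("error", "Unknown error"),
        body.get("message", ""),
        retry_after,
    )
```

Callers can then branch on `exc.status`: 402 means credits must be topped up, while 429 can be retried after `exc.retry_after`.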
Credits
- TTS: 0.5 credits per minute
- STT: 0.3 credits per minute
- Translation: 0.1 credit per 1000 characters (minimum 0.1)
- Image generation: credits per image as configured on the server (IMAGE_GENERATION_CREDITS; default 7 credits per image)
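The per-unit prices above can be turned into a quick estimator. The rounding rule is an assumption: the sample responses earlier in this document (0.5 credits for 2.5 s of speech, 0.3 credits for 10.5 s of audio) are consistent with billing per started minute, but the source never states this, so treat the numbers as estimates and rely on `credits_used` for the real charge. All function names here are hypothetical.

```python
import math

TTS_PER_MINUTE = 0.5
STT_PER_MINUTE = 0.3
TRANSLATION_PER_1000_CHARS = 0.1
TRANSLATION_MINIMUM = 0.1


def billed_minutes(seconds):
    """Assumption: audio is billed per started minute, minimum one minute."""
    return max(1, math.ceil(seconds / 60))


def tts_credits(seconds):
    """Estimated credits for synthesizing `seconds` of speech."""
    return TTS_PER_MINUTE * billed_minutes(seconds)


def stt_credits(seconds):
    """Estimated credits for transcribing `seconds` of audio."""
    return STT_PER_MINUTE * billed_minutes(seconds)


def translation_credits(characters):
    """Estimated credits for translating `characters` of text."""
    return max(TRANSLATION_MINIMUM, TRANSLATION_PER_1000_CHARS * characters / 1000)
```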