Free chat models (0 credits)

These models are routed via OpenRouter with verified :free slugs. Via the API they cost 0 credits per request, so you can call them even with a zero balance. Model ids: openrouter:openrouter/free, openrouter:openai/gpt-oss-20b:free, openrouter:openai/gpt-oss-120b:free, openrouter:google/gemma-4-31b-it:free. Similar models hosted on Groq are billed normally.

Rate limits (OpenRouter-aligned, per account): 20 requests/minute; 50/day without credit purchases, 1000/day after at least one purchase. Requests over the limit receive HTTP 429 with a Retry-After header.
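
A client can respect these limits by waiting for the interval given in Retry-After before retrying. The following is a minimal sketch of that calculation. It assumes Retry-After carries a number of seconds (the text above does not say whether an HTTP date may be used), and the helper name and fallback backoff values are illustrative, not part of the API.

```python
def retry_delay_seconds(status, headers, attempt, base=1.0, cap=60.0):
    """Return how long to wait before retrying, or None if no retry is needed.

    Assumption: Retry-After holds a number of seconds. If it is missing or
    unparsable, fall back to capped exponential backoff.
    """
    if status != 429:
        return None
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    raw = lowered.get("retry-after")
    try:
        return max(0.0, float(raw))
    except (TypeError, ValueError):
        return min(cap, base * (2 ** attempt))


print(retry_delay_seconds(429, {"Retry-After": "7"}, attempt=0))
```

Call it with the status code and headers of a response, sleep for the returned number of seconds, and retry; a None result means the response was not rate limited.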

Model ID	Name	Cost
`openrouter:openrouter/free`	OpenRouter Free (auto)	Free
`openrouter:openai/gpt-oss-120b:free`	GPT OSS 120B (OpenRouter free)	Free
`openrouter:openai/gpt-oss-20b:free`	GPT OSS 20B (OpenRouter free)	Free
`openrouter:google/gemma-4-31b-it:free`	Google Gemma 4 31B (OpenRouter free)	Free

All public chat models

Models available for integration (subject to your account tier after sign-in). Use the model id in POST chat/completions/.

Model ID	Name	Provider	Credits / request
deepseek-v4-flash	DeepSeek V4 Flash	DeepSeek	2.0
openrouter:openrouter/free	OpenRouter Free (auto)	OpenRouter	Free
openrouter:allenai/olmo-3.1-32b-think	AllenAI: Olmo 3.1 32B Think	OpenRouter	2.0
deepseek-v4-pro	DeepSeek V4 Pro	DeepSeek	3.0
openrouter:openai/gpt-oss-120b:free	GPT OSS 120B (OpenRouter free)	OpenRouter	Free
deepseek-reasoner	DeepSeek Reasoner (V4 thinking)	DeepSeek	3.0
gemini	Google Gemini	Google Gemini	2.0
openrouter:google/gemini-2.5-flash	Google: Gemini 2.5 Flash	OpenRouter	2.0
openrouter:openai/gpt-oss-20b:free	GPT OSS 20B (OpenRouter free)	OpenRouter	Free
deepseek-websearch	DeepSeek WebSearch (V4 Flash)	DeepSeek	2.0
openrouter:google/gemini-2.5-flash-image	Google: Gemini 2.5 Flash Image (Nano Banana)	OpenRouter	2.0
qwen3_fast	Qwen3 Fast (Groq)	Groq	2.0
llama-3.1-8b-instant	Llama 3.1 8B Instant (Groq)	Groq	2.0
openrouter:google/gemma-4-31b-it:free	Google Gemma 4 31B (OpenRouter free)	OpenRouter	Free
openai/gpt-oss-safeguard-20b	GPT OSS Safeguard 20B (Groq)	Groq	2.0
moonshotai/kimi-k2-instruct-0905	Kimi K2 (Groq)	Groq	2.0
openai/gpt-oss-20b	GPT OSS 20B (Groq)	Groq	2.0
llama-3.3-70b-versatile	Llama 3.3 70B Versatile (Groq)	Groq	3.0
openai/gpt-oss-120b	GPT OSS 120B (Groq)	Groq	2.0
moonshotai/Kimi-K2.6	Kimi K2.6 (DeepInfra)	DeepInfra	3.0
nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning	Nemotron 3 Nano Omni 30B Reasoning (DeepInfra)	DeepInfra	2.0
Qwen/Qwen3-Max-Thinking	Qwen3 Max Thinking (DeepInfra)	DeepInfra	5.0
google/gemma-4-26B-A4B-it	Gemma 4 26B IT (DeepInfra)	DeepInfra	1.0
deepseek-chat	DeepSeek Chat (legacy alias)	DeepSeek	2.0
kimi-k2.5	Moonshot Kimi K2.5	Moonshot Kimi	2.0
glm-4.6	Z.AI GLM 4.6	Z.AI GLM	2.0
glm-4.7	Z.AI GLM 4.7	Z.AI GLM	2.0
glm-5	Z.AI GLM 5	Z.AI GLM	2.0
MiniMax-M2.7-highspeed	MiniMax M2.7 Highspeed (api.minimax.io)	MiniMax (api.minimax.io)	2.0

Authentication

TTS, STT, Translation, AI Chat, Image generation and Voices endpoints require an API key. You can create API keys from the API Dashboard.

Login with email code (no API key)

For passwordless web/mobile login, submit an email address to receive a code, then verify that code. Requests are rate limited per IP (no CAPTCHA).

POST https://webvoice.easytaskflow.app/api/v1/auth/send-code/ — JSON: email, accept_privacy, accept_terms. Response includes is_new_user.
POST https://webvoice.easytaskflow.app/api/v1/auth/verify-code/ — JSON: email, code, optional create_api_key, api_key_name. Stateless (no session cookie required). Returns api_key, onboarding (Solana memo + wallet).
GET https://webvoice.easytaskflow.app/api/v1/onboarding/ — Requires API key. Credits, can_use_api, optional Solana memo, PayPal URLs.
POST https://webvoice.easytaskflow.app/api/v1/keys/ — Requires API key. JSON {"name": "my-agent"} → new wv_… key.

Limits: 5 send-code and 20 verify-code requests per IP per 15 minutes.

Autonomous agent flow (register → use → optional top-up)

New accounts receive welcome credits (~20). If onboarding.can_use_api is true, start calling chat/TTS/STT immediately — Solana or PayPal are only needed when credits run out.

POST auth/send-code/ — agent email + accept terms
Read OTP from email (human or mailbox API)
POST auth/verify-code/ — create_api_key: true → wv_… api_key + onboarding (credits, can_use_api, optional solana.memo_code)
Configure MCP with WEBVOICE_API_KEY
If credits > 0: call webvoice_chat / webvoice_tts / … immediately
Optional top-up: Solana (onboarding.solana) or PayPal (onboarding.urls.buy_credits_paypal)

# 1 — send code
curl -X POST "https://webvoice.easytaskflow.app/api/v1/auth/send-code/" \
  -H "Content-Type: application/json" \
  -d '{"email":"agent@example.com","accept_privacy":true,"accept_terms":true}'

# 2 — verify + get API key (stateless; email in JSON, no session cookie)
curl -X POST "https://webvoice.easytaskflow.app/api/v1/auth/verify-code/" \
  -H "Content-Type: application/json" \
  -d '{"email":"agent@example.com","code":"123456","create_api_key":true,"api_key_name":"cursor-agent"}'

# Example onboarding fields in verify response:
# {
#   "api_key": "wv_…",
#   "onboarding": {
#     "credits": 20.0,
#     "can_use_api": true,
#     "billing": { "topup_required": false, "note": "…" },
#     "solana": { "available": true, "wallet": "…", "memo_code": "WV…" }
#   }
# }

Same flow via MCP: webvoice_register_send_code → webvoice_register_verify → webvoice_onboarding. See MCP section.
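
An agent usually has to branch on the verify response: use the API right away, or top up first. The sketch below shows one way to make that decision. Field names follow the example verify response above; the function name and the returned action labels ("call_api", "top_up") are hypothetical, and the sample values are placeholders.

```python
def plan_next_step(verify_response):
    """Decide what an agent should do after auth/verify-code/.

    The action labels returned here are illustrative, not API values.
    """
    api_key = verify_response.get("api_key")
    if not api_key:
        raise ValueError("no api_key in response; was create_api_key set to true?")
    onboarding = verify_response.get("onboarding", {})
    if onboarding.get("can_use_api"):
        return {"action": "call_api", "api_key": api_key,
                "credits": onboarding.get("credits", 0.0)}
    # No usable balance: surface the optional Solana memo for a top-up.
    solana = onboarding.get("solana", {})
    return {"action": "top_up", "api_key": api_key,
            "memo_code": solana.get("memo_code")}


sample = {
    "api_key": "wv_example",
    "onboarding": {
        "credits": 20.0,
        "can_use_api": True,
        "solana": {"available": True, "wallet": "wallet-placeholder",
                   "memo_code": "memo-placeholder"},
    },
}
print(plan_next_step(sample)["action"])
```

Store the returned api_key as soon as it arrives, since the MCP notes describe it as issued once.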

Using API Key

Include your API key in one of the following ways:

Header (Recommended)

X-API-Key: wv_your_api_key_here

Or use Authorization: Bearer wv_your_api_key_here. API keys in query strings (?api_key=) are not accepted — they would appear in server logs and browser history.

API Endpoints

Base URL: https://webvoice.easytaskflow.app/api/v1/

Production API host: webvoice.easytaskflow.app

Endpoint	Method	Description
`/auth/send-code/`	POST	Request login code by email (rate limited, no API key)
`/auth/verify-code/`	POST	Verify code; optional create_api_key + onboarding (stateless JSON)
`/onboarding/`	GET	Agent onboarding: credits, can_use_api, optional Solana memo (API key required)
`/keys/`	POST	Create additional API key (JSON name)
`/tts/`	POST	Generate speech from text
`/stt/`	POST	Transcribe audio to text
`/translation/`	POST	Translate text between languages
`/chat/models/`	GET	List AI chat models (WebVoice extended fields)
`/models/`	GET	OpenAI-compatible model list
`/chat/completions/`	POST	OpenAI-compatible chat completions (optional SSE stream)
`/image/`	POST	Text-to-image (MiniMax); server must have MINIMAX_API_KEY
`/voices/`	GET	Get voices for a TTS provider and language (optional query: provider, language)
`/tts-catalog/`	GET	TTS/STT providers, languages, engine settings (API key)
`/tts-stats/`	GET	Voice/language counts summary (API key; same as public /audio/api/tts-stats/)
`/image-to-image/`	POST	FLUX.2 Flash Edit (image edit)
`/image-to-video/`	POST	Image-to-video (Seedance / Gen4 Turbo)
`/text-to-video/`	POST	Text-to-video (Seedance, Veo Lite/Fast)
`/voice-memo/text/`	POST	Create text-only voice note (API key)
`/push-token/`	POST	Register FCM/APNs push token (mobile app, API key)
`/accept-ai-policy/`	POST	Accept AI policy (mobile app, API key)
Session login (mobile app & browser UI) — cookie auth, not API key; base path `/audio/`
`/audio/api/tts-catalog/`	GET	TTS/STT provider list, languages, per-engine settings (speed, trim, MiniMax model)
`/audio/api/voices/`	GET	Voices for one provider + language (query: provider, language)
`/audio/profile-defaults/`	GET/POST	Default chat/TTS/STT models; includes same catalog fields as tts-catalog
`/audio/api/tts-stats/`	GET	Public voice/language counts (no provider list)
`/status/`	GET	Check account status and credits
`/send-email/`	POST	Send an email to a given address (text only, no attachments). Requires API key.

Send Email

Endpoint: POST /api/v1/send-email/

Sends an email from the site to the specified address. Text only, no attachments. Requires API key.

Rate limit: 10 emails per account per hour. Requests over the limit receive HTTP 429 with a Retry-After header.

Request Body

{
  "to": "recipient@example.com",
  "subject": "Subject (optional)",
  "body": "Email body text"
}

Parameters

Parameter	Type	Required	Description
`to`	string	Yes	Recipient email address
`subject`	string	No	Email subject (default: Message from WebVoice API)
`body` / `text` / `content`	string	Yes	Email body (plain text, max 50000 characters)

Response

{
  "success": true,
  "message": "Email sent successfully",
  "to": "recipient@example.com",
  "subject": "Subject"
}

Text-to-Speech (TTS)

Endpoint: POST /api/v1/tts/

Request Body

{
  "text": "Hello, world!",
  "voice": "af_sarah",
  "language": "en",
  "speed": 0.90
}

Parameters

Parameter	Type	Required	Description
`text`	string	Yes	Text to convert to speech
`voice`	string	Yes	Voice ID (e.g., 'af_sarah', 'if_sara')
`language`	string	No	Language code (default: 'en')
`speed`	float	No	Speech speed (0.5-2.0, default: 0.90)
`provider`	string	No	TTS engine: kokoro, kokoro_fast, google, qweb_local, inworld, inworld_mini, minimax, minimax_official (default: account / Kokoro)
`trim`	boolean	No	Trim silence (default true; Kokoro recommended)
`minimax_tts_model`	string	No	Only for minimax_official (e.g. speech-2.8-hd). See TTS catalog for choices.

List engines with GET /api/v1/tts-catalog/ or GET /audio/api/tts-catalog/ before calling voices or TTS.

Response

{
  "success": true,
  "audio": "base64_encoded_mp3_data",
  "format": "mp3",
  "duration": 2.5,
  "credits_used": 0.5,
  "credits_remaining": 99.5
}
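
Because the audio arrives as a base64 string inside JSON, a client has to decode it before playback or storage. The sketch below shows that step on a stand-in response; the function name, the output path, and the fake audio bytes are assumptions for illustration.

```python
import base64
import pathlib
import tempfile


def save_tts_audio(response, out_path):
    """Decode the base64 "audio" field, write it to out_path, return bytes written."""
    if not response.get("success"):
        raise RuntimeError(f"TTS request failed: {response}")
    audio_bytes = base64.b64decode(response["audio"])
    pathlib.Path(out_path).write_bytes(audio_bytes)
    return len(audio_bytes)


# Stand-in response: a real one carries MP3 data in "audio".
fake = {"success": True, "format": "mp3",
        "audio": base64.b64encode(b"ID3demo").decode("ascii")}
target = pathlib.Path(tempfile.gettempdir()) / "webvoice_demo.mp3"
print(save_tts_audio(fake, target))
```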

Speech-to-Text (STT)

Endpoint: POST /api/v1/stt/

Request

Send as multipart/form-data:

audio: Audio file (MP3, WAV, M4A, FLAC)
language: Optional. Omit or use empty/auto for automatic language detection (Whisper). Or specify code (it, en, es, fr, de, pt, ru, zh, ja, ko, etc.)
provider: Optional. whisper_small (default from account), whisper_fast (DeepInfra Turbo), whisper_groq (alias), whisper_max_local (Modal). Requires DEEPINFRA_API_KEY for whisper_fast.

Response

The 'language' field returns the auto-detected or specified language code.

{
  "success": true,
  "text": "Transcribed text here",
  "language": "it",
  "duration": 10.5,
  "credits_used": 0.3,
  "credits_remaining": 99.2
}

Translation

Endpoint: POST /api/v1/translation/

Request Body

{
  "text": "Hello, world!",
  "source_language": "en",
  "target_language": "it"
}

Parameters

Parameter	Type	Required	Description
`text`	string	Yes	Text to translate
`source_language`	string	No	Source language code (default: 'auto', which falls back to 'en')
`target_language`	string	Yes	Target language code (e.g., 'it', 'en', 'es', 'fr', 'de', 'pt', 'ru', 'zh', 'ja', 'ko', 'ar', 'nl', 'pl', 'tr', 'cs')

Supported Languages

The following language codes are supported:

en - English
it - Italian
es - Spanish
fr - French
de - German
pt - Portuguese
ru - Russian
zh - Chinese
ja - Japanese
ko - Korean
ar - Arabic
nl - Dutch
pl - Polish
tr - Turkish
cs - Czech

Response

{
  "success": true,
  "translated_text": "Ciao, mondo!",
  "source_language": "en",
  "target_language": "it",
  "text_length": 13,
  "credits_used": 0.1,
  "credits_remaining": 99.9
}
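
Checking language codes locally avoids spending a request on an unsupported pair. The sketch below builds the request body from the parameters and the supported-language list above; the function name is hypothetical, and the server may accept codes beyond this list.

```python
SUPPORTED = {"en", "it", "es", "fr", "de", "pt", "ru", "zh", "ja", "ko",
             "ar", "nl", "pl", "tr", "cs"}


def build_translation_payload(text, target_language, source_language="auto"):
    """Validate language codes locally and return the JSON body for /translation/."""
    if not text:
        raise ValueError("text is required")
    if target_language not in SUPPORTED:
        raise ValueError(f"unsupported target_language: {target_language}")
    if source_language != "auto" and source_language not in SUPPORTED:
        raise ValueError(f"unsupported source_language: {source_language}")
    return {"text": text,
            "source_language": source_language,
            "target_language": target_language}


print(build_translation_payload("Hello, world!", "it", "en"))
```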

AI Chat (OpenAI-compatible bridge)

Use WebVoice as a drop-in OpenAI API base URL: authenticate with your WebVoice API key (X-API-Key or Authorization: Bearer), choose a managed model, and credits are billed on your profile. Responses follow the OpenAI chat.completions schema; credit info is in the optional webvoice field.

OpenAI base URL

https://webvoice.easytaskflow.app/api/v1/

Example: POST chat/completions/, GET models/, compatible with OpenAI SDK when you set base_url and api_key.

List models

WebVoice extended: GET /api/v1/chat/models/ — includes credits_per_request and display_name.

OpenAI-compatible: GET /api/v1/models/

{
  "object": "list",
  "data": [
    {
      "id": "deepseek-v4-flash",
      "object": "model",
      "created": 1700000000,
      "owned_by": "deepseek"
    }
  ]
}

Chat completion

Endpoint: POST /api/v1/chat/completions/

Same request body as OpenAI: model, messages, max_tokens, temperature, stream. Set stream: true for Server-Sent Events (text/event-stream).

DeepSeek routes: deepseek-v4-flash (default, fast), deepseek-v4-pro (frontier), deepseek-reasoner (V4 thinking mode). Legacy deepseek-chat still works and maps to V4 Flash.

Request body (JSON)

{
  "model": "deepseek-v4-flash",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize WebVoice in one sentence."}
  ],
  "max_tokens": 2000,
  "temperature": 0.7,
  "stream": false,
  "web_search": false
}

Parameters

Parameter	Type	Required	Description
`model`	string	Yes	Model id from GET /models/ or /chat/models/
`provider`	string	No	Provider key (e.g. groq, openrouter, deepseek). Required when the same model id exists on multiple providers.
`messages`	array	Yes	OpenAI message list. Standard chat: system | user | assistant (last role user). With tools: also assistant (with tool_calls) and tool (last role user or tool).
`tools`	array	No	OpenAI function tools array. Supported on DeepSeek and OpenRouter models only.
`tool_choice`	string | object	No	none, auto (default when tools set), required, or {"type":"function","function":{"name":"…"}}
`stream`	boolean	No	If true, SSE stream of chat.completion.chunk events ending with data: [DONE]
`max_tokens`	integer	No	Max response tokens (1–8000, default 2000)
`temperature`	number	No	Sampling temperature 0–2 (default 0.7)
`web_search`	boolean	No	WebVoice extension: web search where supported (DeepSeek WebSearch, Moonshot, Z.AI)
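
The documented ranges can be checked on the client before a request is sent. The sketch below does that for the parameters in the table, plus the rule stated in the tools section that web_search and tools cannot be combined. The function name is hypothetical and the server remains the authority on validation.

```python
def build_chat_payload(model, messages, max_tokens=2000, temperature=0.7,
                       stream=False, web_search=False, tools=None):
    """Check documented ranges locally and return the request body as a dict."""
    if not 1 <= max_tokens <= 8000:
        raise ValueError("max_tokens must be between 1 and 8000")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if web_search and tools:
        raise ValueError("web_search and tools cannot be combined in one request")
    # Without tools the last message must come from the user; with tools it
    # may also be a tool result.
    allowed_last = ("user", "tool") if tools else ("user",)
    if not messages or messages[-1]["role"] not in allowed_last:
        raise ValueError(f"last message role must be one of {allowed_last}")
    payload = {"model": model, "messages": messages, "max_tokens": max_tokens,
               "temperature": temperature, "stream": stream,
               "web_search": web_search}
    if tools:
        payload["tools"] = tools
    return payload


print(build_chat_payload("deepseek-v4-flash",
                         [{"role": "user", "content": "Hello"}]))
```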

Response (non-streaming)

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "deepseek-v4-flash",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "WebVoice is a voice and AI platform for TTS, STT, chat and memos."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 18,
    "total_tokens": 60
  },
  "webvoice": {
    "credits_used": 2.0,
    "credits_remaining": 98.0,
    "provider": "deepseek"
  }
}

Response (streaming SSE)

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"total_tokens":60},"webvoice":{"credits_used":2.0,"credits_remaining":98.0}}

data: [DONE]
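
Clients that do not use an OpenAI SDK need to parse this stream themselves. The sketch below reads SSE lines shaped like the example above, joins the delta.content pieces, and keeps the last webvoice object it sees. The function name is an assumption, and the sample lines are shortened copies of the chunks shown here.

```python
import json


def collect_stream(lines):
    """Concatenate delta.content from SSE lines; return (text, webvoice_info)."""
    parts, info = [], None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
        info = chunk.get("webvoice", info)
    return "".join(parts), info


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    '',
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    '',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],'
    '"webvoice":{"credits_used":2.0,"credits_remaining":98.0}}',
    '',
    'data: [DONE]',
]
text, info = collect_stream(sample)
print(text, info)
```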

Example (curl)

# Non-streaming
curl -X POST "https://webvoice.easytaskflow.app/api/v1/chat/completions/" \
  -H "Authorization: Bearer wv_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-v4-flash","messages":[{"role":"user","content":"Hello"}]}'

# Streaming
curl -N -X POST "https://webvoice.easytaskflow.app/api/v1/chat/completions/" \
  -H "X-API-Key: wv_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-v4-flash","stream":true,"messages":[{"role":"user","content":"Hello"}]}'

Example (OpenAI Python SDK)

from openai import OpenAI

client = OpenAI(
    api_key="wv_your_api_key_here",
    base_url="https://webvoice.easytaskflow.app/api/v1",
)

# Non-streaming
r = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello"}],
)
print(r.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for event in stream:
    # Some chunks may carry no choices or no text delta, so guard both.
    if event.choices and event.choices[0].delta.content:
        print(event.choices[0].delta.content, end="")

Agent integration — tools / function calling

Build agents that call your own functions: send tools and tool_choice in the chat completion request. When the model wants to run a function, the response contains tool_calls instead of (or in addition to) text. Your code executes the function locally, appends role tool messages, and calls the API again until finish_reason is stop.

Supported providers: DeepSeek (e.g. deepseek-v4-flash) and OpenRouter models. Not available on Groq/Gemini free-tier routes. web_search and tools cannot be combined in one request.

Step 1 — define tools

{
  "model": "deepseek-v4-flash",
  "tool_choice": "auto",
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {
          "city": {"type": "string", "description": "City name"}
        },
        "required": ["city"]
      }
    }
  }],
  "messages": [
    {"role": "user", "content": "What is the weather in Rome?"}
  ]
}

Step 2 — model returns tool_calls

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"city\":\"Rome\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }],
  "webvoice": {"credits_used": 2.0, "credits_remaining": 98.0}
}

Step 3 — execute locally and call again

Run get_weather("Rome") in your app, then POST again with the full message history including the assistant tool_calls message and your tool result:

{
  "model": "deepseek-v4-flash",
  "tools": [ ... same tools ... ],
  "messages": [
    {"role": "user", "content": "What is the weather in Rome?"},
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\":\"Rome\"}"}
      }]
    },
    {
      "role": "tool",
      "tool_call_id": "call_abc123",
      "content": "{\"temp_c\": 22, \"condition\": \"sunny\"}"
    }
  ]
}

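Agent loop (Python sketch)

import json
from openai import OpenAI

client = OpenAI(
    api_key="wv_your_api_key_here",
    base_url="https://webvoice.easytaskflow.app/api/v1",
)

def run_agent(user_prompt, tools, run_function, max_turns=10):
    """Call the model until it stops requesting tools.

    run_function(name, arguments) is your own dispatcher for local functions.
    """
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        resp = client.chat.completions.create(
            model="deepseek-v4-flash",
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )
        choice = resp.choices[0]
        msg = choice.message
        if choice.finish_reason != "tool_calls":
            return msg.content
        # Keep the assistant tool_calls message in history, then answer each call.
        messages.append(msg.model_dump())
        for tc in msg.tool_calls:
            result = run_function(tc.function.name, json.loads(tc.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": json.dumps(result),
            })
    raise RuntimeError("agent did not finish within max_turns")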
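
The loop above leaves run_function to the application. One possible shape for it is a small registry that maps tool names to local callables, sketched below. The get_weather body is a placeholder that returns fixed data, and returning an error object instead of raising is a design choice, not something the API requires.

```python
def get_weather(city):
    """Placeholder tool: a real app would query a weather service here."""
    return {"city": city, "temp_c": 22, "condition": "sunny"}


# Map tool names (as declared in the tools array) to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_function(name, arguments):
    """Dispatch a tool call by name; return an error payload instead of raising."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return func(**arguments)
    except TypeError as exc:  # e.g. missing or unexpected arguments from the model
        return {"error": str(exc)}


print(run_function("get_weather", {"city": "Rome"}))
```

Returning the error as the tool result lets the model see what went wrong and try a corrected call on the next turn.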

Streaming with tools

Set stream: true. Chunks may include delta.tool_calls (partial). The final chunk has finish_reason tool_calls or stop. Credits are billed once per completed API request (same as non-streaming).
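
Partial tool calls have to be reassembled before they can be executed. The sketch below merges them, assuming the chunks follow the usual OpenAI streaming shape: each delta.tool_calls entry carries an index, the id and function name arrive once, and function.arguments arrives in fragments. That shape is an assumption here, since the text above only says the deltas are partial.

```python
def merge_tool_call_deltas(chunks):
    """Rebuild complete tool calls from streamed chunk dicts."""
    calls = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for part in choice.get("delta", {}).get("tool_calls") or []:
                slot = calls.setdefault(part.get("index", 0), {
                    "id": None, "type": "function",
                    "function": {"name": "", "arguments": ""},
                })
                if part.get("id"):
                    slot["id"] = part["id"]
                fn = part.get("function") or {}
                # Name and arguments are appended because they may be split.
                slot["function"]["name"] += fn.get("name") or ""
                slot["function"]["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


chunks = [
    {"choices": [{"index": 0, "finish_reason": None, "delta": {"tool_calls": [
        {"index": 0, "id": "call_abc123", "type": "function",
         "function": {"name": "get_weather", "arguments": ""}}]}}]},
    {"choices": [{"index": 0, "finish_reason": None, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '{"city":'}}]}}]},
    {"choices": [{"index": 0, "finish_reason": None, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '"Rome"}'}}]}}]},
    {"choices": [{"index": 0, "finish_reason": "tool_calls", "delta": {}}]},
]
print(merge_tool_call_deltas(chunks))
```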

MCP server — use WebVoice in Cursor

Install the local stdio MCP server (webvoice-mcp) to expose WebVoice as tools in Cursor, Claude Desktop, or any MCP client. Calls are forwarded to this REST API.

Option A — Agent self-registration (no browser)

Add MCP to Cursor (config below); WEBVOICE_API_KEY can be empty for the first step.
webvoice_register_send_code — email + accept terms
webvoice_register_verify — OTP → api_key (once) + onboarding
Set WEBVOICE_API_KEY in mcp.json and restart Cursor.
If onboarding.can_use_api is true, use chat/TTS/STT right away with welcome credits.
Optional: top up via Solana (webvoice_onboarding → solana.memo_code) or PayPal URLs in onboarding.urls.

REST equivalent: Agent registration (auth/send-code, auth/verify-code).

Option B — Browser signup (human)

Sign up / log in — Login with email code or Google.
Create an API key — API dashboard → wv_…
Configure MCP — WEBVOICE_API_KEY in Cursor (below).

Credits are shared across web, API, and MCP. When balance is zero, calls return HTTP 402 and we email a recharge link. Solana and PayPal top-ups are optional — not required if you already have credits.

Install

pip install webvoice-mcp
# or from source:
pip install -r requirements-mcp.txt
pip install -e .

PyPI package + MCP Registry: io.github.easytaskflow/webvoice-mcp — see docs/MCP_DISTRIBUTION.md in the repo.

Cursor configuration

Add to ~/.cursor/mcp.json (Settings → MCP):

{
  "mcpServers": {
    "webvoice": {
      "command": "webvoice-mcp",
      "env": {
        "WEBVOICE_API_KEY": "wv_your_api_key_here"
      }
    }
  }
}

Optional env: WEBVOICE_BASE_URL (default https://webvoice.easytaskflow.app/api/v1).

MCP tools

Tool	Maps to / purpose
`webvoice_register_send_code`	`POST auth/send-code/` — no API key
`webvoice_register_verify`	`POST auth/verify-code/` — api_key + onboarding
`webvoice_onboarding`	`GET onboarding/` — credits, can_use_api, Solana memo
`webvoice_status`	`GET status/`
`webvoice_list_chat_models`	`GET chat/models/`
`webvoice_list_tts_catalog`	`GET tts-catalog/` — TTS/STT engines + settings
`webvoice_list_voices`	`GET voices/` — optional provider + language (use after catalog)
`webvoice_chat`	`POST chat/completions/` — DeepSeek, Groq, OpenRouter, DeepInfra
`webvoice_deepinfra_chat`	`POST chat/completions/` — Kimi K2.6, Nemotron, Qwen3 Max Thinking, Gemma 4 IT
`webvoice_tts`	`POST tts/` — set output_path to save MP3
`webvoice_stt`	`POST stt/` — local audio_path; optional provider
`webvoice_whisper_fast_stt`	`POST stt/` — DeepInfra whisper-large-v3-turbo
`webvoice_translate`	`POST translation/`
`webvoice_image`	`POST image/` — MiniMax, WAN, Gen4, SDXL, qwen-image-max, …
`webvoice_qwen_image_max`	`POST image/` — Qwen-Image-Max via DeepInfra
`webvoice_image_to_image`	`POST image-to-image/` — FLUX.2 Flash Edit
`webvoice_image_to_video`	`POST image-to-video/` — Seedance / Gen4 Turbo
`webvoice_text_to_video`	`POST text-to-video/` — Seedance, Veo Lite, Veo Fast
`webvoice_veo_fast_text_to_video`	`POST text-to-video/` — Google Veo 3.1 Fast (DeepInfra)
`webvoice_account_links`	Dashboard / billing URLs

See also webvoice_mcp/README.md in the repository.

Image generation (text-to-image)

Default provider is MiniMax (MINIMAX_API_KEY). WaveSpeed providers (WAVESPEED_API_KEY): wan, nucleus, gen4, gen4-turbo, sd3, sdxl, flux-klein-lora. DeepInfra (DEEPINFRA_API_KEY): qwen-image-max. Missing key → HTTP 503.

Endpoint: POST /api/v1/image/

Upstream docs: MiniMax · WAN 2.7 · Nucleus · Gen4 · SDXL · FLUX Klein LoRA · Qwen-Image-Max (DeepInfra)

Request body (JSON)

{
  "prompt": "A red bicycle on a sunny street",
  "provider": "minimax",
  "width": 1024,
  "height": 1024,
  "seed": 42,
  "model": "image-01",
  "aspect_ratio": "16:9",
  "response_format": "base64"
}

Nucleus example (provider-specific fields):

{
  "prompt": "A red bicycle on a sunny street",
  "provider": "nucleus",
  "negative_prompt": "blurry, low quality",
  "num_images": 1,
  "num_inference_steps": 50,
  "guidance_scale": 8,
  "output_format": "png"
}

Gen4 example:

{
  "prompt": "A red bicycle on a sunny street",
  "provider": "gen4",
  "resolution": "1080p",
  "reference_images": ["https://example.com/ref1.jpg"],
  "seed": 42
}

Runway Gen4 Image Turbo example (faster, ~$0.03 upstream):

{
  "provider": "gen4-turbo",
  "prompt": "Character wearing a leather jacket in a cyberpunk city",
  "aspect_ratio": "9:16",
  "resolution": "1080p",
  "reference_images": ["https://example.com/character-ref.jpg"]
}

Stable Diffusion 3 example (~$0.03 upstream, optional img2img):

{
  "provider": "sd3",
  "prompt": "Marina Bay at sunset, vivid purple and orange afterglow, long exposure",
  "aspect_ratio": "16:9",
  "image_url": "https://example.com/style-ref.jpg",
  "seed": -1
}

Stability AI SDXL example (~$0.0026 upstream):

{
  "provider": "sdxl",
  "prompt": "Close-up food photography of a gourmet cheeseburger, shallow depth of field",
  "aspect_ratio": "1:1",
  "size": "1024*1024",
  "seed": -1
}

Qwen-Image-Max via DeepInfra example (~$0.075/image upstream, photorealistic):

{
  "provider": "qwen-image-max",
  "prompt": "Portrait of a woman in natural light, lifelike skin texture, shallow depth of field",
  "aspect_ratio": "3:4",
  "size": "1024x1536"
}

FLUX.2 Klein 9B LoRA example (~$0.02 upstream):

{
  "provider": "flux-klein-lora",
  "prompt": "Cinematic romantic drama still, rooftop at sunset, film photography",
  "aspect_ratio": "16:9",
  "size": "1024*1024",
  "loras": [
    {"path": "https://example.com/my-style-lora.safetensors", "scale": 1.0}
  ],
  "seed": -1
}

Parameters

Parameter	Type	Required	Description
`provider`	string	No	`minimax` (default), `wan`, `nucleus`, `gen4`, `gen4-turbo`, `sd3`, `sdxl`, `flux-klein-lora`, or `qwen-image-max`
`prompt`	string	Yes	Text description of the image (up to 4000 characters, truncated server-side)
`width` / `height`	integer	No	Optional pixel size; each clamped between 512 and 2048. For MiniMax, used to pick the closest aspect_ratio. For WAN, sent as width×height (e.g. 1024*1024).
`seed`	integer	No	Optional; reproducible generation when supported by the provider.
`model`	string	No	MiniMax only — image model (e.g. image-01). Unknown values fall back to the server default.
`aspect_ratio`	string	No	e.g. 1:1, 16:9, 9:16 — supported by all providers when width/height are omitted.
`thinking_mode`	boolean	No	WAN only — enhanced reasoning for better quality (default: true).
`negative_prompt`	string	No	Nucleus only — elements to exclude from the image.
`num_images`	integer	No	Nucleus only — 1 (default) or 2 images per request. Billed per image.
`num_inference_steps`	integer	No	Nucleus only — 1–100, default 50.
`guidance_scale`	number	No	Nucleus only — classifier-free guidance, 0–20, default 8.
`output_format`	string	No	Nucleus only — png (default) or jpeg.
`resolution`	string	No	Gen4 / Gen4 Turbo — 720p or 1080p (default 1080p).
`reference_images` / `reference_image_urls`	array / string	No	Gen4 / Gen4 Turbo — up to 3 public HTTPS URLs to guide style or subject.
`size`	string	No	SDXL / WAN — canvas size e.g. 1024*1024 (256–1536 px per side for SDXL).
`loras` / `lora_url`	array / string	No	FLUX Klein LoRA — list of {path, scale?} objects, or single lora_url + lora_scale.
`image_url`	string	No	SD3 / SDXL — optional reference image URL for image-to-image.
`response_format`	string	No	`base64` (default) or `url` — how MiniMax returns image data; the API always responds with base64-encoded image bytes in JSON.

Response

Binary images are returned as Base64 strings (typically PNG or JPEG depending on the provider).

{
  "success": true,
  "provider": "wan",
  "image": "base64_encoded_image_bytes",
  "images": ["base64_1", "base64_2"],
  "count": 1,
  "credits_used": 1.0,
  "credits_remaining": 99.0
}

Credits per image: MiniMax 7, WAN 8, Nucleus 4 (× num_images), Gen4 2/3, Gen4 Turbo 1, SD3 1, SDXL 1, FLUX Klein LoRA 1.
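
The response can carry a single image field and an images array, so client code benefits from normalizing both into one list. The sketch below does that under one assumption that the text above does not confirm: when images is present, it already includes the picture in image. The function name is hypothetical.

```python
import base64


def extract_images(response):
    """Return decoded image bytes, preferring the "images" array when present."""
    if not response.get("success"):
        raise RuntimeError(f"image request failed: {response}")
    encoded = response.get("images") or []
    if not encoded and response.get("image"):
        encoded = [response["image"]]
    return [base64.b64decode(item) for item in encoded]


one = base64.b64encode(b"first").decode("ascii")
two = base64.b64encode(b"second").decode("ascii")
print(len(extract_images({"success": True, "image": one, "images": [one, two]})))
```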

Image to video

Generates video from a reference image. Default provider is Seedance 2.0 Turbo (720p/1080p, optional audio); use provider gen4 for Runway Gen4 Turbo (WAVESPEED_API_KEY).

Endpoint: POST /api/v1/image-to-video/

Upstream docs: Seedance 2.0 I2V Turbo · Runway Gen4 Turbo

Request body (JSON or multipart)

{
  "provider": "seedance",
  "prompt": "Slow camera pan, golden hour light, gentle wind in the trees",
  "image_url": "https://example.com/reference.jpg",
  "duration": 5,
  "resolution": "720p",
  "aspect_ratio": "adaptive",
  "generate_audio": true
}

Runway Gen4 Turbo example:

{
  "provider": "gen4",
  "prompt": "The subject turns slowly toward the camera, soft wind in their hair",
  "image_url": "https://example.com/reference.jpg",
  "duration": 5,
  "aspect_ratio": "16:9"
}

Alternatively send multipart/form-data with fields prompt, duration, resolution, aspect_ratio, generate_audio and file field image.

Or pass reference image as base64 in JSON field image (instead of image_url).
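
The two JSON input styles can be hidden behind one helper that picks image_url for HTTPS URLs and base64 in image for local files. The sketch below is one way to do that; the function name is hypothetical, and it assumes the image field takes plain base64 without a data-URI prefix.

```python
import base64
import pathlib
import tempfile


def build_image_to_video_payload(prompt, image_source, **options):
    """Return a JSON body using image_url for HTTPS URLs, else base64 in image."""
    payload = {"prompt": prompt, **options}
    if str(image_source).startswith("https://"):
        payload["image_url"] = str(image_source)
    else:
        raw = pathlib.Path(image_source).read_bytes()
        payload["image"] = base64.b64encode(raw).decode("ascii")
    return payload


# A URL goes into image_url unchanged.
by_url = build_image_to_video_payload(
    "Slow camera pan", "https://example.com/reference.jpg", duration=5)

# A local file is read and sent as base64 (fake bytes stand in for a real image).
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
    tmp.write(b"fake-image-bytes")
by_file = build_image_to_video_payload("Slow camera pan", tmp.name, provider="gen4")
print(sorted(by_url), sorted(by_file))
```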

Parameters

Parameter	Type	Required	Description
`provider`	string	No	`seedance` (default) or `gen4` / `runway`
`prompt`	string	Yes	Motion, camera, lighting, mood.
`image_url` / `image`	string	Yes	Public HTTPS URL or base64-encoded image bytes.
`duration`	integer	No	4–15 seconds for Seedance (default 5); 2–10 for Gen4 (default 5).
`resolution`	string	No	`720p` (default) or `1080p` (Seedance only)
`aspect_ratio`	string	No	Seedance: 16:9, 9:16, 1:1, 4:3, 3:4, 21:9, adaptive (default). Gen4: 16:9, 9:16, 1:1, 4:3, 3:4 or omit to match source.
`generate_audio`	boolean	No	Native synchronized audio (Seedance only, default: true).
`last_image_url`	string	No	Optional last-frame image for continuation.

Response

{
  "success": true,
  "provider": "seedance-2.0-turbo",
  "video": "base64_encoded_mp4_bytes",
  "duration": 5,
  "resolution": "720p",
  "credits_used": 15.0,
  "credits_remaining": 85.0,
  "video_size_bytes": 1234567
}

Credits: Seedance — 15/16 per 5s @ 720p/1080p (× duration). Gen4 Turbo — 2 per 5s (~$0.05 upstream).

Image to image

Edits 1–4 input images with FLUX.2 Flash Edit — restyle scenes, replace backgrounds, inpaint/outpaint, and retouch with plain-English prompts (WAVESPEED_API_KEY).

Endpoint: POST /api/v1/image-to-image/

Upstream docs: FLUX.2 Flash Edit

Request body (JSON or multipart)

{
  "provider": "flux-flash-edit",
  "prompt": "Replace the background with a modern office, keep the product centered",
  "images": [
    "https://example.com/product.jpg"
  ],
  "aspect_ratio": "1:1",
  "seed": 42
}

Multiple input images (up to 4):

{
  "prompt": "Merge styles: subject from first image, lighting from second",
  "image_urls": "https://example.com/a.jpg,https://example.com/b.jpg"
}

Alternatively send multipart/form-data with fields prompt, aspect_ratio, size, seed and file field images (1–4 files) or single image.

Or pass input images as base64 in JSON field image or array images.

Parameters

Parameter	Type	Required	Description
`provider`	string	No	`flux-flash-edit` (default)
`prompt`	string	Yes	Edit instructions in plain English (HEX colors supported).
`images` / `image_url` / `image`	array / string	Yes	1–4 public HTTPS URLs or base64-encoded images.
`aspect_ratio`	string	No	1:1, 16:9, 9:16, 4:3, 3:4 (default 1024×1024).
`size`	string	No	Custom width*height (256–1536), e.g. 1024*768.
`seed`	integer	No	Optional reproducibility seed.

Response

{
  "success": true,
  "provider": "flux-2-flash-edit",
  "image": "base64_encoded_png_or_jpeg_bytes",
  "input_count": 1,
  "credits_used": 1.0,
  "credits_remaining": 99.0,
  "image_size_bytes": 456789
}

Credits: FLUX.2 Flash Edit — 1 credit per edit (~$0.013 upstream).

Text to video

Generates cinematic 720p/1080p video from a text prompt. Default provider is Seedance 2.0 Fast Turbo (WAVESPEED_API_KEY); use provider veo for Google Veo 3.1 Lite (WaveSpeed) or veo-fast for Google Veo 3.1 Fast (DEEPINFRA_API_KEY, $0.15/s upstream).

Endpoint: POST /api/v1/text-to-video/

Upstream docs: Seedance 2.0 Fast T2V Turbo · Google Veo 3.1 Lite · Google Veo 3.1 Fast (DeepInfra)

Request body (JSON)

{
  "provider": "veo",
  "prompt": "Tracking shot through a neon-lit alley at night, rain reflections",
  "aspect_ratio": "16:9",
  "resolution": "720p",
  "duration": 6,
  "generate_audio": true,
  "negative_prompt": "blurry, low quality",
  "seed": 42
}

Parameters

Parameter	Type	Required	Description
`provider`	string	No	`seedance` (default) or `veo` (Lite, WaveSpeed) or `veo-fast` (DeepInfra)
`prompt`	string	Yes	Cinematic scene description.
`aspect_ratio`	string	No	16:9 (default), 9:16, 1:1, 4:3, 3:4, 21:9.
`duration`	integer	No	4–15 seconds (default 5).
`resolution`	string	No	`720p` (default) or `1080p`
`generate_audio`	boolean	No	Native synchronized audio (default: true).
`reference_images`	array	No	Seedance only — HTTPS URLs (doubles credit cost).
`negative_prompt`	string	No	Veo only — elements to exclude.
`seed`	integer	No	Veo only — reproducible generation.
`reference_videos`	array	No	Seedance only — reference video URLs (doubles credit cost).

Response

{
  "success": true,
  "provider": "seedance-2.0-fast-turbo",
  "video": "base64_encoded_mp4_bytes",
  "duration": 5,
  "resolution": "720p",
  "credits_used": 13.0,
  "credits_remaining": 87.0,
  "video_size_bytes": 1234567
}

Credits: Seedance — 13/14 per 5s @ 720p/1080p (×2 with references). Veo 3.1 Lite — 7/11 per 6s @ 720p/1080p. Veo 3.1 Fast — 3.5 credits/s ($0.15/s upstream).
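
The parameter table above mixes shared and provider-specific fields, so it can help to validate a request before spending credits. The following is a minimal sketch, assuming the documented ranges apply to every provider and that "Veo only" covers both `veo` and `veo-fast`; the function name is hypothetical, and the server may ignore a misplaced field rather than reject it.

```python
from typing import Optional

PROVIDERS = ("seedance", "veo", "veo-fast")
ASPECT_RATIOS = ("16:9", "9:16", "1:1", "4:3", "3:4", "21:9")
RESOLUTIONS = ("720p", "1080p")


def build_text_to_video_payload(
    prompt: str,
    provider: str = "seedance",
    aspect_ratio: str = "16:9",
    resolution: str = "720p",
    duration: int = 5,
    generate_audio: bool = True,
    negative_prompt: Optional[str] = None,
    seed: Optional[int] = None,
    reference_images: Optional[list] = None,
) -> dict:
    """Check arguments against the parameter table and return a JSON-ready dict."""
    if not prompt:
        raise ValueError("prompt is required")
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if not 4 <= duration <= 15:
        raise ValueError("duration must be between 4 and 15 seconds")

    # Assumption: "Veo only" in the table applies to both veo and veo-fast.
    is_veo = provider.startswith("veo")
    if not is_veo and (negative_prompt is not None or seed is not None):
        raise ValueError("negative_prompt and seed are Veo only")
    if is_veo and reference_images:
        raise ValueError("reference_images is Seedance only")

    payload = {
        "provider": provider,
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "duration": duration,
        "generate_audio": generate_audio,
    }
    # Optional fields are sent only when the caller set them.
    if negative_prompt is not None:
        payload["negative_prompt"] = negative_prompt
    if seed is not None:
        payload["seed"] = seed
    if reference_images:
        payload["reference_images"] = list(reference_images)
    return payload
```

The returned dict can be passed as the JSON body of POST /api/v1/text-to-video/.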

Voice memo (text only)

Create a voice memo from text only (no audio). The text is stored as a voice memo in your account and appears in the Voice notes list. Free — no credits charged.

Endpoint: POST /api/v1/voice-memo/text/

Request Body

{
  "text": "Your note text here",
  "title": "Optional title (default: first 50 chars of text)"
}

Parameters

Parameter	Type	Required	Description
`text`	string	Yes	Text to save as voice memo (no audio generated; max 10000 characters)
`title`	string	No	Optional title; if omitted, first 50 characters of text are used

Response

{
  "success": true,
  "voice_note_id": 123,
  "title": "Your note text here",
  "detail_url": "https://.../audio/voice-notes/123/",
  "created_at": "2026-02-04T12:00:00Z"
}
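
The request rules above (required `text`, 10000-character limit, title defaulting to the first 50 characters) can be checked client-side. The following is a minimal sketch, assuming the base URL and `X-API-Key` header used in the code examples elsewhere in this document; the helper names are illustrative.

```python
from typing import Optional

MAX_TEXT_CHARS = 10_000      # documented limit for `text`
DEFAULT_TITLE_CHARS = 50     # documented default title length


def build_voice_memo_payload(text: str, title: Optional[str] = None) -> dict:
    """Validate the memo text and return the JSON body for the endpoint."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    payload = {"text": text}
    if title:
        payload["title"] = title
    return payload


def expected_title(text: str, title: Optional[str] = None) -> str:
    """Title the server is documented to use when none is supplied."""
    return title if title else text[:DEFAULT_TITLE_CHARS]


def create_voice_memo(api_key: str, text: str, title: Optional[str] = None) -> dict:
    """Send the memo; needs the third-party `requests` package."""
    import requests  # imported lazily so the helpers above work without it

    response = requests.post(
        "https://webvoice.easytaskflow.app/api/v1/voice-memo/text/",
        headers={"X-API-Key": api_key},
        json=build_voice_memo_payload(text, title),
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```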

Get Available Voices

Endpoint: GET /api/v1/voices/

Returns voice IDs for synthesis. To discover which TTS engines exist, use the catalog endpoints in the next sections — do not infer providers from a single voice list.

Query Parameters

language: Language code (optional). When set with provider, returns a flat list for that locale.
provider: TTS engine (optional): kokoro, kokoro_fast, google, qweb_local, inworld, inworld_mini, minimax, minimax_official

Response (with language=it&provider=kokoro)

{
  "success": true,
  "voices": [
    {"id": "if_sara", "name": "Sara (Female, Italian)"},
    {"id": "im_nicola", "name": "Nicola (Male, Italian)"}
  ]
}

Response (no filters — Kokoro voices grouped by language)

{
  "success": true,
  "voices": {
    "en": [
      {"id": "af_sarah", "name": "Sarah (Female, English)"}
    ],
    "it": [
      {"id": "if_sara", "name": "Sara (Female, Italian)"}
    ]
  }
}
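
Because `voices` is a flat list when filters are set and a language-keyed object otherwise, client code has to handle both shapes. The following is a minimal sketch, assuming only the two response shapes shown above; `flatten_voices` is an illustrative helper name.

```python
def flatten_voices(response_json: dict) -> list:
    """Return one flat list of voice dicts for either response shape.

    For the grouped shape, each voice gains a 'language' key taken from
    the group it was listed under. For the flat shape, entries are
    copied unchanged because the response does not repeat the language.
    """
    voices = response_json.get("voices", [])
    if isinstance(voices, list):
        return [dict(voice) for voice in voices]
    flat = []
    for language, entries in voices.items():
        for voice in entries:
            flat.append({**voice, "language": language})
    return flat
```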

TTS & STT catalog

REST (API key): GET /api/v1/tts-catalog/

Session (mobile / browser): GET /audio/api/tts-catalog/

Lists all TTS and STT engines, supported TTS languages, recommended defaults, and per-engine settings (e.g. speed range, trim for Kokoro, MiniMax model). Both endpoints return the same JSON payload; the REST endpoint requires the X-API-Key header, and the session endpoint requires a login cookie.

The mobile app loads engines and languages from this endpoint (or from GET /audio/profile-defaults/, which embeds the same catalog).

TTS providers returned

kokoro — local Kokoro (1 credit / 10 min)
kokoro_fast — Kokoro via DeepInfra
google, qweb_local, inworld, inworld_mini, minimax, minimax_official

Example response (abbreviated)

{
  "tts_providers": [
    {
      "id": "kokoro",
      "name": "Kokoro (locale)",
      "cost": "1 credito / 10 min",
      "description": "...",
      "settings": {
        "fields": [
          {"key": "speed", "type": "range", "min": 0.5, "max": 2.0, "default": 0.90},
          {"key": "trim", "type": "boolean", "default": true}
        ],
        "language_notes": {
          "it": "Per l'italiano il server passa lang=it (pronuncia migliorata). Velocità consigliata 0.90–0.95."
        },
        "recommended_by_language": {"it": {"speed": 0.90, "trim": true}}
      }
    }
  ],
  "stt_providers": [
    {"id": "whisper_small", "name": "Whisper Small (locale)"}
  ],
  "tts_languages": [
    {"id": "it", "name": "Italian", "supports_lang_parameter": true, "recommended_speed": 0.90}
  ],
  "generation_defaults": {"speed": 0.90, "trim": true}
}

After choosing a provider from tts_providers, fetch voices with GET /api/v1/voices/?provider=<id>&language=<code> (API key) or GET /audio/api/voices/?provider=<id>&language=<code> (session).
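
The catalog-then-voices flow can be sketched as two small helpers: one that merges `generation_defaults` with a provider's `recommended_by_language` entry, and one that builds the voices URL. This is a minimal sketch, assuming the catalog has the shape of the abbreviated example and that both the REST and session routes live under the base URL used in the code examples; the function names are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://webvoice.easytaskflow.app"  # assumption: taken from the code examples


def recommended_settings(catalog: dict, provider_id: str, language: str) -> dict:
    """Start from generation_defaults, then apply the provider's per-language overrides."""
    settings = dict(catalog.get("generation_defaults", {}))
    for provider in catalog.get("tts_providers", []):
        if provider.get("id") == provider_id:
            by_language = provider.get("settings", {}).get("recommended_by_language", {})
            settings.update(by_language.get(language, {}))
            return settings
    raise KeyError(f"provider not in catalog: {provider_id!r}")


def voices_url(provider_id: str, language: str, session: bool = False) -> str:
    """Build the voices URL for API-key (default) or session access."""
    path = "/audio/api/voices/" if session else "/api/v1/voices/"
    query = urlencode({"provider": provider_id, "language": language})
    return f"{BASE_URL}{path}?{query}"
```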

Mobile & web session API

These routes use cookie session auth (email or Google login), not an API key. They are used by the smartphone app and the logged-in web UI. Fetch a CSRF token with GET /api/csrf-token/ before submitting form POSTs.

Auth & utilities

Endpoint	Method	Purpose
`/api/v1/auth/send-code/`	POST	Login OTP (JSON, no CSRF)
`/api/v1/auth/verify-code/`	POST	Verify OTP → session cookie
`/accounts/google/token/`	POST	Google Sign-In token → session
`/api/csrf-token/`	GET	Fresh CSRF token for form POSTs

TTS / STT catalog & synthesis

Endpoint	Method	Purpose
`/audio/api/tts-catalog/`	GET	TTS/STT providers, languages, engine settings
`/audio/api/voices/`	GET	Voices for provider + language
`/audio/profile-defaults/`	GET/POST	User defaults + embedded catalog
`/audio/api/tts-stats/`	GET	Public voice/language counts (no provider list)
`/audio/tts/`	POST	Generate TTS (form: text, provider, voice, speed, trim)
`/audio/tts/history/api/`	GET	TTS generation history (JSON)
`/audio/tts/play/<id>/`	GET	Play saved TTS MP3
`/audio/tts/test/`	POST	Free voice test phrase (no credits)
`/audio/tts/test/play/`	GET	Pre-generated test MP3 for provider/voice/lang
`/audio/tts/job/submit/`	POST	Async TTS job (long text)
`/audio/tts/job/<uuid>/`	GET	Async TTS job status
`/audio/stt/`	POST	Transcribe audio upload
`/audio/stt/history/api/`	GET	STT history (JSON)

Voice notes

Endpoint	Method	Purpose
`/audio/voice-notes/api/`	GET	List voice notes
`/audio/voice-notes/save/`	POST	Upload / save voice memo
`/audio/voice-notes/<id>/api/`	GET	Voice note detail
`/audio/voice-notes/settings/`	GET/POST	Default memo language
`/audio/voice-notes/categories/api/`	GET	Memo categories
`/audio/voice-notes/categories/create/`	POST	Create category
`/audio/voice-notes/<id>/language/`	POST	Update note language
`/audio/voice-notes/<id>/play/`	GET	Play memo audio

Chat, billing, tones

Endpoint	Method	Purpose
`/admin-panel/chat/api/`	GET	Chat list
`/admin-panel/chat/<id>/api/`	GET/POST	Messages + send (SSE stream on POST)
`/admin-panel/chat/<id>/settings/`	GET	Chat settings + available_tts_providers
`/admin-panel/chat/<id>/rename/`	POST	Rename chat
`/admin-panel/chat/audio/<msg_id>/`	GET	Play message TTS audio
`/admin-panel/chat/genera-memo/<msg_id>/`	POST	Create voice note from chat message
`/chat-undo/<source>/<msg_id>/info/`	GET	Undo eligibility + credits
`/chat-undo/<source>/<msg_id>/undo/`	POST	Undo last chat operation
`/chat-undo/<source>/<msg_id>/refund/`	POST	Request credit refund
`/billing/api/balance/`	GET	Credit balance
`/billing/api/daily-credits/`	GET	Claim daily free credits
`/billing/credits/api/`	GET	Transaction history
`/billing/api/verify-android-purchase/`	POST	Google Play IAP verification
`/toni-agenti/api/`	GET	Conversation tones / agents list
`/admin-panel/errors/api/`	POST	Client error report (mobile)

REST equivalents with API key: see sections above for TTS, STT, chat/completions, voices, etc. Session routes above are what the Android/iOS app calls today.

Check Status

Endpoint: GET /api/v1/status/

Response

{
  "success": true,
  "user": "username",
  "credits": 100.0,
  "api_key": {
    "name": "My API Key",
    "created_at": "2024-01-01T00:00:00Z",
    "last_used": "2024-01-15T12:00:00Z",
    "usage_count": 42
  }
}
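
One use of the status response is checking the balance before an expensive call such as video generation. The following is a minimal sketch, assuming the endpoint accepts the same `X-API-Key` header as the other REST routes; `can_afford` and `fetch_status` are illustrative names.

```python
def can_afford(status_json: dict, credits_needed: float) -> bool:
    """True if the status payload reports at least `credits_needed` credits."""
    if not status_json.get("success"):
        return False
    return float(status_json.get("credits", 0.0)) >= credits_needed


def fetch_status(api_key: str) -> dict:
    """Call the status endpoint; needs the third-party `requests` package."""
    import requests  # imported lazily so can_afford works without it

    response = requests.get(
        "https://webvoice.easytaskflow.app/api/v1/status/",
        headers={"X-API-Key": api_key},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```

For example, `can_afford(fetch_status(api_key), 13.0)` would check for the 13 credits listed for a 5 s Seedance clip at 720p.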

Code Examples

cURL - TTS

curl -X POST https://webvoice.easytaskflow.app/api/v1/tts/ \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, world!",
    "voice": "af_sarah",
    "language": "en",
    "speed": 0.90
  }'

Python - TTS

import requests
import base64

api_key = "YOUR_API_KEY"
url = "https://webvoice.easytaskflow.app/api/v1/tts/"

response = requests.post(
    url,
    headers={"X-API-Key": api_key},
    json={
        "text": "Hello, world!",
        "voice": "af_sarah",
        "language": "en",
        "speed": 0.90
    }
)

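# Raise on HTTP errors (401, 402, 429, ...) before reading the body
response.raise_for_status()
data = response.json()
audio_data = base64.b64decode(data["audio"])  # audio is returned base64-encoded

with open("output.mp3", "wb") as f:
    f.write(audio_data)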

Python - STT

import requests

api_key = "YOUR_API_KEY"
url = "https://webvoice.easytaskflow.app/api/v1/stt/"

with open("audio.mp3", "rb") as f:
    response = requests.post(
        url,
        headers={"X-API-Key": api_key},
        files={"audio": f},
        data={"language": "it"}
    )

data = response.json()
print(data["text"])

Python - Translation

import requests

api_key = "YOUR_API_KEY"
url = "https://webvoice.easytaskflow.app/api/v1/translation/"

response = requests.post(
    url,
    headers={
        "X-API-Key": api_key,
        "Content-Type": "application/json"
    },
    json={
        "text": "Hello, world!",
        "source_language": "en",
        "target_language": "it"
    }
)

data = response.json()
print(data["translated_text"])

cURL - Translation

curl -X POST https://webvoice.easytaskflow.app/api/v1/translation/ \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, world!",
    "source_language": "en",
    "target_language": "it"
  }'

cURL - Image generation

curl -X POST https://webvoice.easytaskflow.app/api/v1/image/ \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A watercolor landscape with mountains",
    "width": 1024,
    "height": 1024
  }'

Python - Image generation

import requests
import base64

api_key = "YOUR_API_KEY"
url = "https://webvoice.easytaskflow.app/api/v1/image/"

response = requests.post(
    url,
    headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    json={
        "prompt": "A red bicycle on a sunny street",
        "width": 1024,
        "height": 1024,
    },
)
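# Raise on HTTP errors (401, 402, 429, ...) before reading the body
response.raise_for_status()
data = response.json()
raw = base64.b64decode(data["image"])  # image is returned base64-encoded
with open("out.png", "wb") as f:
    f.write(raw)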

Error Codes

Status Code	Error	Description
400	Bad Request	Invalid request parameters
401	Unauthorized	Invalid or missing API key
402	Payment Required	Insufficient credits
429	Too Many Requests	Free chat model rate limit exceeded (OpenRouter-aligned RPM/RPD); Retry-After header set
429	Too Many Requests	Send-email rate limit exceeded; Retry-After header set
500	Internal Server Error	Server error occurred

Error Response Format

{
  "error": "Error type",
  "message": "Detailed error message"
}
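
The error table and response format above suggest a simple client policy: wait and retry on 429, and surface `error` and `message` for everything else. The following is a minimal sketch, assuming `Retry-After` carries a number of seconds (the HTTP-date form falls back to a default delay); the helper names and the default of 5 seconds are choices of this example, not documented behavior.

```python
import time


def retry_delay_seconds(status_code: int, headers, default: float = 5.0):
    """Return seconds to wait before retrying, or None if the error is not retryable."""
    if status_code != 429:
        return None
    try:
        return max(0.0, float(headers.get("Retry-After")))
    except (TypeError, ValueError):
        # Header missing or not a plain number of seconds.
        return default


def describe_error(status_code: int, body: dict) -> str:
    """Format an error body that follows the documented error/message shape."""
    return f"{status_code} {body.get('error', 'Unknown error')}: {body.get('message', '')}"


def post_with_retry(url: str, api_key: str, payload: dict, max_attempts: int = 3) -> dict:
    """POST JSON, sleeping and retrying on 429; needs the `requests` package."""
    import requests  # imported lazily so the helpers above work without it

    for attempt in range(max_attempts):
        response = requests.post(
            url, headers={"X-API-Key": api_key}, json=payload, timeout=60
        )
        if response.ok:
            return response.json()
        delay = retry_delay_seconds(response.status_code, response.headers)
        if delay is None or attempt == max_attempts - 1:
            try:
                body = response.json()
            except ValueError:
                body = {}
            raise RuntimeError(describe_error(response.status_code, body))
        time.sleep(delay)
```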

Credits

TTS: 0.5 credits per minute
STT: 0.3 credits per minute
Translation: 0.1 credit per 1000 characters (minimum 0.1)
Image generation: credits per image as configured on the server (IMAGE_GENERATION_CREDITS; default 7 credits per image)
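
The rates above can be turned into a rough cost estimator. The following is a minimal sketch, assuming charges scale linearly with duration and character count and that no rounding is applied; the server's actual rounding and billing granularity are not stated here, and per-engine costs reported by the TTS catalog may differ from the flat TTS rate.

```python
IMAGE_CREDITS_DEFAULT = 7.0  # documented default; configurable on the server


def tts_credits(minutes: float) -> float:
    """0.5 credits per minute of synthesized audio."""
    return 0.5 * minutes


def stt_credits(minutes: float) -> float:
    """0.3 credits per minute of transcribed audio."""
    return 0.3 * minutes


def translation_credits(char_count: int) -> float:
    """0.1 credits per 1000 characters, with a 0.1 minimum per request."""
    return max(0.1, 0.1 * char_count / 1000)


def estimate_total(
    tts_minutes: float = 0.0,
    stt_minutes: float = 0.0,
    translation_chars: int = 0,
    images: int = 0,
    image_credits: float = IMAGE_CREDITS_DEFAULT,
) -> float:
    """Sum the estimated credits for a mixed workload."""
    total = tts_credits(tts_minutes) + stt_credits(stt_minutes)
    if translation_chars > 0:
        # The minimum applies only when a translation request is actually made.
        total += translation_credits(translation_chars)
    total += images * image_credits
    return total
```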
