WebVoice

WebVoice API Documentation

Overview — API forward

WebVoice is an API forward (bridge): your application talks to WebVoice with your account API key; we authenticate you, pick the managed model, call the upstream provider (Groq, DeepSeek, OpenRouter, Gemini, Moonshot, Z.AI, MiniMax, DeepInfra), and debit credits on your profile — same wallet as the web app.

For AI chat you can use the OpenAI-compatible surface: set base_url to the value below and pass your WebVoice key as api_key. Streaming (SSE) and non-streaming completions are supported.

https://webvoice.easytaskflow.app/api/v1/


Free chat models (0 credits)

These models are routed via OpenRouter with verified :free slugs. Through the API they cost 0 credits per request, so you can call them even with a zero balance. Model ids: openrouter:openrouter/free, openrouter:openai/gpt-oss-20b:free, openrouter:openai/gpt-oss-120b:free, openrouter:google/gemma-4-31b-it:free. Similar Groq-hosted models are billed normally.

Rate limits (OpenRouter-aligned, per account): 20 requests/minute; 50/day without credit purchases, 1000/day after at least one purchase. HTTP 429 with Retry-After when exceeded.

Model ID Name Cost
openrouter:openrouter/free OpenRouter Free (auto) Free
openrouter:openai/gpt-oss-120b:free GPT OSS 120B (OpenRouter free) Free
openrouter:openai/gpt-oss-20b:free GPT OSS 20B (OpenRouter free) Free
openrouter:google/gemma-4-31b-it:free Google Gemma 4 31B (OpenRouter free) Free
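
A client can honor the HTTP 429 responses described above by reading Retry-After before it retries. The sketch below is a minimal example, not part of the WebVoice API: the helper name retry_after_seconds and the 60-second fallback are assumptions, and it accepts both header forms HTTP allows (delta-seconds or an HTTP-date) because the source does not say which one WebVoice sends.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(headers, default=60.0, now=None):
    """Return how long to wait after an HTTP 429, based on Retry-After.

    `headers` is any mapping of response headers. `default` is a
    hypothetical fallback used when the header is missing or unparseable.
    """
    value = None
    for key, val in headers.items():
        if key.lower() == "retry-after":
            value = val.strip()
            break
    if not value:
        return default
    if value.isdigit():  # delta-seconds form, e.g. "30"
        return float(value)
    try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A caller would typically sleep for the returned number of seconds and then repeat the request once.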

All public chat models

Models available for integration (subject to your account tier after sign-in). Use the model id in POST chat/completions/.

Model ID Name Provider Credits / request
deepseek-v4-flash DeepSeek V4 Flash DeepSeek 2.0
openrouter:openrouter/free OpenRouter Free (auto) OpenRouter Free
openrouter:allenai/olmo-3.1-32b-think AllenAI: Olmo 3.1 32B Think OpenRouter 2.0
deepseek-v4-pro DeepSeek V4 Pro DeepSeek 3.0
openrouter:openai/gpt-oss-120b:free GPT OSS 120B (OpenRouter free) OpenRouter Free
deepseek-reasoner DeepSeek Reasoner (V4 thinking) DeepSeek 3.0
gemini Google Gemini Google Gemini 2.0
openrouter:google/gemini-2.5-flash Google: Gemini 2.5 Flash OpenRouter 2.0
openrouter:openai/gpt-oss-20b:free GPT OSS 20B (OpenRouter free) OpenRouter Free
deepseek-websearch DeepSeek WebSearch (V4 Flash) DeepSeek 2.0
openrouter:google/gemini-2.5-flash-image Google: Gemini 2.5 Flash Image (Nano Banana) OpenRouter 2.0
qwen3_fast Qwen3 Fast (Groq) Groq 2.0
llama-3.1-8b-instant Llama 3.1 8B Instant (Groq) Groq 2.0
openrouter:google/gemma-4-31b-it:free Google Gemma 4 31B (OpenRouter free) OpenRouter Free
openai/gpt-oss-safeguard-20b GPT OSS Safeguard 20B (Groq) Groq 2.0
moonshotai/kimi-k2-instruct-0905 Kimi K2 (Groq) Groq 2.0
openai/gpt-oss-20b GPT OSS 20B (Groq) Groq 2.0
llama-3.3-70b-versatile Llama 3.3 70B Versatile (Groq) Groq 3.0
openai/gpt-oss-120b GPT OSS 120B (Groq) Groq 2.0
moonshotai/Kimi-K2.6 Kimi K2.6 (DeepInfra) DeepInfra 3.0
nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning Nemotron 3 Nano Omni 30B Reasoning (DeepInfra) DeepInfra 2.0
Qwen/Qwen3-Max-Thinking Qwen3 Max Thinking (DeepInfra) DeepInfra 5.0
google/gemma-4-26B-A4B-it Gemma 4 26B IT (DeepInfra) DeepInfra 1.0
deepseek-chat DeepSeek Chat (legacy alias) DeepSeek 2.0
kimi-k2.5 Moonshot Kimi K2.5 Moonshot Kimi 2.0
glm-4.6 Z.AI GLM 4.6 Z.AI GLM 2.0
glm-4.7 Z.AI GLM 4.7 Z.AI GLM 2.0
glm-5 Z.AI GLM 5 Z.AI GLM 2.0
MiniMax-M2.7-highspeed MiniMax M2.7 Highspeed (api.minimax.io) MiniMax (api.minimax.io) 2.0

Authentication

TTS, STT, Translation, AI Chat, Image generation and Voices endpoints require an API key. You can create API keys from the API Dashboard.

Login with email code (no API key)

For web/mobile login without a password: submit your email address to receive a code, then verify the code. Requests are rate limited per IP (no CAPTCHA).

  • POST https://webvoice.easytaskflow.app/api/v1/auth/send-code/ — JSON: email, accept_privacy, accept_terms. Response includes is_new_user.
  • POST https://webvoice.easytaskflow.app/api/v1/auth/verify-code/ — JSON: email, code, optional create_api_key, api_key_name. Stateless (no session cookie required). Returns api_key, onboarding (Solana memo + wallet).
  • GET https://webvoice.easytaskflow.app/api/v1/onboarding/ — Requires API key. Credits, can_use_api, optional Solana memo, PayPal URLs.
  • POST https://webvoice.easytaskflow.app/api/v1/keys/ — Requires API key. JSON {"name": "my-agent"} → new wv_… key.

Limits: 5 send-code and 20 verify-code requests per IP per 15 minutes.

Autonomous agent flow (register → use → optional top-up)

New accounts receive welcome credits (~20). If onboarding.can_use_api is true, start calling chat/TTS/STT immediately — Solana or PayPal are only needed when credits run out.

  1. POST auth/send-code/ — agent email + accept terms
  2. Read OTP from email (human or mailbox API)
  3. POST auth/verify-code/ with create_api_key: true → wv_… api_key + onboarding (credits, can_use_api, optional solana.memo_code)
  4. Configure MCP with WEBVOICE_API_KEY
  5. If credits > 0: call webvoice_chat / webvoice_tts / … immediately
  6. Optional top-up: Solana (onboarding.solana) or PayPal (onboarding.urls.buy_credits_paypal)
# 1: send code
curl -X POST "https://webvoice.easytaskflow.app/api/v1/auth/send-code/" \
  -H "Content-Type: application/json" \
  -d '{"email":"agent@example.com","accept_privacy":true,"accept_terms":true}'

# 2: verify + get API key (stateless; email in JSON, no session cookie)
curl -X POST "https://webvoice.easytaskflow.app/api/v1/auth/verify-code/" \
  -H "Content-Type: application/json" \
  -d '{"email":"agent@example.com","code":"123456","create_api_key":true,"api_key_name":"cursor-agent"}'

# Example onboarding fields in verify response:
# {
#   "api_key": "wv_…",
#   "onboarding": {
#     "credits": 20.0,
#     "can_use_api": true,
#     "billing": { "topup_required": false, "note": "…" },
#     "solana": { "available": true, "wallet": "…", "memo_code": "WV…" }
#   }
# }

Same flow via MCP: webvoice_register_send_code → webvoice_register_verify → webvoice_onboarding. See MCP section.
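
The registration steps above can also be sketched in Python using only the standard library. This is an illustrative outline under stated assumptions: the helper names (json_post, send_code_request, verify_code_request, can_call_api) are made up for this example, and the functions only build requests so that nothing is sent until you pass one to urllib.request.urlopen.

```python
import json
import urllib.request

BASE_URL = "https://webvoice.easytaskflow.app/api/v1/"


def json_post(path, payload):
    """Build (but do not send) a JSON POST request for a WebVoice path."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_code_request(email):
    # Step 1: ask WebVoice to email a one-time code.
    return json_post("auth/send-code/", {
        "email": email, "accept_privacy": True, "accept_terms": True,
    })


def verify_code_request(email, code, key_name="my-agent"):
    # Step 3: exchange the code for an API key plus onboarding data.
    return json_post("auth/verify-code/", {
        "email": email, "code": code,
        "create_api_key": True, "api_key_name": key_name,
    })


def can_call_api(verify_response):
    """True when the verify response carries a key that is usable right away."""
    onboarding = verify_response.get("onboarding") or {}
    return bool(verify_response.get("api_key")) and bool(onboarding.get("can_use_api"))
```

To run the flow, send each request with urllib.request.urlopen, decode the JSON body of the verify response, and check can_call_api before making chat, TTS, or STT calls.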

Using API Key

Include your API key in one of the following ways:

Header (Recommended)
X-API-Key: wv_your_api_key_here

Or use Authorization: Bearer wv_your_api_key_here. API keys in query strings (?api_key=) are not accepted — they would appear in server logs and browser history.
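
As a small illustration of the two accepted header styles, the helper below returns the header to attach to a request. The function name auth_headers is hypothetical, and the wv_ prefix check is an assumption based on the key examples in this document rather than a documented guarantee.

```python
def auth_headers(api_key, style="x-api-key"):
    """Return the HTTP header carrying a WebVoice API key.

    style: "x-api-key" (recommended by the docs) or "bearer".
    """
    if not api_key or not api_key.startswith("wv_"):
        # Assumption: keys start with "wv_", as in the examples above.
        raise ValueError("expected a WebVoice key starting with 'wv_'")
    if style == "x-api-key":
        return {"X-API-Key": api_key}
    if style == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    raise ValueError(f"unknown style: {style!r}")
```

The key never goes into the URL, which matches the rule above that query-string keys are rejected.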

API Endpoints

Base URL: https://webvoice.easytaskflow.app/api/v1/

Production API endpoint: webvoice.easytaskflow.app

Endpoint Method Description
/auth/send-code/ POST Request login code by email (rate limited, no API key)
/auth/verify-code/ POST Verify code; optional create_api_key + onboarding (stateless JSON)
/onboarding/ GET Agent onboarding: credits, can_use_api, optional Solana memo (API key required)
/keys/ POST Create additional API key (JSON name)
/tts/ POST Generate speech from text
/stt/ POST Transcribe audio to text
/translation/ POST Translate text between languages
/chat/models/ GET List AI chat models (WebVoice extended fields)
/models/ GET OpenAI-compatible model list
/chat/completions/ POST OpenAI-compatible chat completions (optional SSE stream)
/image/ POST Text-to-image (MiniMax); server must have MINIMAX_API_KEY
/voices/ GET Get available voices
/status/ GET Check account status and credits
/send-email/ POST Send an email to a given address (text only, no attachments). Requires API key.

Send Email

Endpoint: POST /api/v1/send-email/

Sends an email from the site to the specified address. Text only, no attachments. Requires API key.

Rate limit: 10 emails per account per hour. HTTP 429 with Retry-After when exceeded.

Request Body

{
  "to": "recipient@example.com",
  "subject": "Subject (optional)",
  "body": "Email body text"
}

Parameters

Parameter Type Required Description
to string Yes Recipient email address
subject string No Email subject (default: Message from WebVoice API)
body / text / content string Yes Email body (plain text, max 50000 characters)

Response

{
  "success": true,
  "message": "Email sent successfully",
  "to": "recipient@example.com",
  "subject": "Subject"
}
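
Checking the documented limits before sending avoids spending one of the 10 hourly requests on a call that would fail. The following is a minimal client-side sketch; build_email_payload is a hypothetical helper, and the address check is deliberately loose because the server's own validation rules are not described here.

```python
MAX_BODY_CHARS = 50000  # limit stated in the parameter table


def build_email_payload(to, body, subject=None):
    """Validate inputs client-side and return the JSON body for send-email."""
    if "@" not in to:
        raise ValueError("'to' must be an email address")
    if not body:
        raise ValueError("'body' is required")
    if len(body) > MAX_BODY_CHARS:
        raise ValueError(f"'body' exceeds {MAX_BODY_CHARS} characters")
    payload = {"to": to, "body": body}
    if subject:  # an omitted subject falls back to the server default
        payload["subject"] = subject
    return payload
```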

Text-to-Speech (TTS)

Endpoint: POST /api/v1/tts/

Request Body

{
  "text": "Hello, world!",
  "voice": "af_sarah",
  "language": "en",
  "speed": 0.90
}

Parameters

Parameter Type Required Description
text string Yes Text to convert to speech
voice string Yes Voice ID (e.g., 'af_sarah', 'if_sara')
language string No Language code (default: 'en')
speed float No Speech speed (0.5-2.0, default: 0.90)

Response

{
  "success": true,
  "audio": "base64_encoded_mp3_data",
  "format": "mp3",
  "duration": 2.5,
  "credits_used": 0.5,
  "credits_remaining": 99.5
}

Speech-to-Text (STT)

Endpoint: POST /api/v1/stt/

Request

Send as multipart/form-data:

  • audio: Audio file (MP3, WAV, M4A, FLAC)
  • language: Optional. Omit or use empty/auto for automatic language detection (Whisper). Or specify code (it, en, es, fr, de, pt, ru, zh, ja, ko, etc.)
  • provider: Optional. whisper_small (default from account), whisper_fast (DeepInfra Turbo), whisper_groq (alias), whisper_max_local (Modal). Requires DEEPINFRA_API_KEY for whisper_fast.

Response

The 'language' field returns the auto-detected or specified language code.

{
  "success": true,
  "text": "Transcribed text here",
  "language": "it",
  "duration": 10.5,
  "credits_used": 0.3,
  "credits_remaining": 99.2
}

Translation

Endpoint: POST /api/v1/translation/

Request Body

{
  "text": "Hello, world!",
  "source_language": "en",
  "target_language": "it"
}

Parameters

Parameter Type Required Description
text string Yes Text to translate
source_language string No Source language code (default: 'auto'; when 'auto' is used the source is treated as 'en')
target_language string Yes Target language code (e.g., 'it', 'en', 'es', 'fr', 'de', 'pt', 'ru', 'zh', 'ja', 'ko', 'ar', 'nl', 'pl', 'tr', 'cs')

Supported Languages

The following language codes are supported:

  • en - English
  • it - Italian
  • es - Spanish
  • fr - French
  • de - German
  • pt - Portuguese
  • ru - Russian
  • zh - Chinese
  • ja - Japanese
  • ko - Korean
  • ar - Arabic
  • nl - Dutch
  • pl - Polish
  • tr - Turkish
  • cs - Czech

Response

{
  "success": true,
  "translated_text": "Ciao, mondo!",
  "source_language": "en",
  "target_language": "it",
  "text_length": 13,
  "credits_used": 0.1,
  "credits_remaining": 99.9
}

AI Chat (OpenAI-compatible bridge)

Use WebVoice as a drop-in OpenAI API base URL: authenticate with your WebVoice API key (X-API-Key or Authorization: Bearer), choose a managed model, and credits are billed on your profile. Responses follow the OpenAI chat.completions schema; credit info is in the optional webvoice field.

OpenAI base URL

https://webvoice.easytaskflow.app/api/v1/

Example paths: POST chat/completions/ and GET models/. Both are compatible with the OpenAI SDK once you set base_url and api_key.

List models

WebVoice extended: GET /api/v1/chat/models/ — includes credits_per_request and display_name.

OpenAI-compatible: GET /api/v1/models/

{
  "object": "list",
  "data": [
    {
      "id": "deepseek-v4-flash",
      "object": "model",
      "created": 1700000000,
      "owned_by": "deepseek"
    }
  ]
}

Chat completion

Endpoint: POST /api/v1/chat/completions/

Same request body as OpenAI: model, messages, max_tokens, temperature, stream. Set stream: true for Server-Sent Events (text/event-stream).

DeepSeek routes: deepseek-v4-flash (default, fast), deepseek-v4-pro (frontier), deepseek-reasoner (V4 thinking mode). Legacy deepseek-chat still works and maps to V4 Flash.

Request body (JSON)

{
  "model": "deepseek-v4-flash",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize WebVoice in one sentence."}
  ],
  "max_tokens": 2000,
  "temperature": 0.7,
  "stream": false,
  "web_search": false
}

Parameters

Parameter Type Required Description
model string Yes Model id from GET /models/ or /chat/models/
provider string No Provider key (e.g. groq, openrouter, deepseek). Required when the same model id exists on multiple providers.
messages array Yes OpenAI message list. Standard chat: system|user|assistant (last role user). With tools: also assistant (with tool_calls) and tool (last role user or tool).
tools array No OpenAI function tools array. Supported on DeepSeek and OpenRouter models only.
tool_choice string|object No none, auto (default when tools set), required, or {"type":"function","function":{"name":"…"}}
stream boolean No If true, SSE stream of chat.completion.chunk events ending with data: [DONE]
max_tokens integer No Max response tokens (1–8000, default 2000)
temperature number No Sampling temperature 0–2 (default 0.7)
web_search boolean No WebVoice extension: web search where supported (DeepSeek WebSearch, Moonshot, Z.AI)

Response (non-streaming)

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "deepseek-v4-flash",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "WebVoice is a voice and AI platform for TTS, STT, chat and memos."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 18,
    "total_tokens": 60
  },
  "webvoice": {
    "credits_used": 2.0,
    "credits_remaining": 98.0,
    "provider": "deepseek"
  }
}

Response (streaming SSE)

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"total_tokens":60},"webvoice":{"credits_used":2.0,"credits_remaining":98.0}}

data: [DONE]
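
When the OpenAI SDK is not available, the stream shown above can be consumed by hand. The following sketch assumes the lines have already been decoded to text (for example from an HTTP client's line iterator); parse_sse_stream is a hypothetical helper that joins the delta.content pieces and keeps the webvoice credit object from whichever chunk carries it.

```python
import json


def parse_sse_stream(lines):
    """Collect text and WebVoice credit info from SSE lines.

    `lines` is any iterable of decoded lines (str). Returns (text, webvoice),
    where webvoice is the dict from the stream or None if absent.
    """
    parts = []
    webvoice = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
        webvoice = chunk.get("webvoice", webvoice)
    return "".join(parts), webvoice
```

Fed the three sample events above, the function returns the text "Hello" together with the credits_used and credits_remaining values from the final chunk.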

Example (curl)

# Non-streaming
curl -X POST "https://webvoice.easytaskflow.app/api/v1/chat/completions/" \
  -H "Authorization: Bearer wv_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-v4-flash","messages":[{"role":"user","content":"Hello"}]}'

# Streaming
curl -N -X POST "https://webvoice.easytaskflow.app/api/v1/chat/completions/" \
  -H "X-API-Key: wv_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-v4-flash","stream":true,"messages":[{"role":"user","content":"Hello"}]}'

Example (OpenAI Python SDK)

from openai import OpenAI

client = OpenAI(
    api_key="wv_your_api_key_here",
    base_url="https://webvoice.easytaskflow.app/api/v1",
)

# Non-streaming
r = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello"}],
)
print(r.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for event in stream:
    # Some chunks may carry no choices or an empty delta; skip them.
    if event.choices and event.choices[0].delta.content:
        print(event.choices[0].delta.content, end="")

Agent integration — tools / function calling

Build agents that call your own functions: send tools and tool_choice in the chat completion request. When the model wants to run a function, the response contains tool_calls instead of (or in addition to) text. Your code executes the function locally, appends role tool messages, and calls the API again until finish_reason is stop.

Supported providers: DeepSeek (e.g. deepseek-v4-flash) and OpenRouter models. Not available on Groq/Gemini free-tier routes. web_search and tools cannot be combined in one request.

Step 1 — define tools

{
  "model": "deepseek-v4-flash",
  "tool_choice": "auto",
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {
          "city": {"type": "string", "description": "City name"}
        },
        "required": ["city"]
      }
    }
  }],
  "messages": [
    {"role": "user", "content": "What is the weather in Rome?"}
  ]
}

Step 2 — model returns tool_calls

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"city\":\"Rome\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }],
  "webvoice": {"credits_used": 2.0, "credits_remaining": 98.0}
}

Step 3 — execute locally and call again

Run get_weather("Rome") in your app, then POST again with the full message history including the assistant tool_calls message and your tool result:

{
  "model": "deepseek-v4-flash",
  "tools": [ ... same tools ... ],
  "messages": [
    {"role": "user", "content": "What is the weather in Rome?"},
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\":\"Rome\"}"}
      }]
    },
    {
      "role": "tool",
      "tool_call_id": "call_abc123",
      "content": "{\"temp_c\": 22, \"condition\": \"sunny\"}"
    }
  ]
}

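Agent loop (Python)

import json


def run_agent(client, user_prompt, tools, run_function, max_rounds=8):
    """Call the model until it stops requesting tools, then return its text.

    client: OpenAI SDK client configured for WebVoice (see the SDK example).
    run_function(name, args): your own dispatcher for local tool functions.
    """
    messages = [{"role": "user", "content": user_prompt}]
    # Cap the rounds: every request is billed, so never loop without a limit.
    for _ in range(max_rounds):
        resp = client.chat.completions.create(
            model="deepseek-v4-flash",
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )
        msg = resp.choices[0].message
        if resp.choices[0].finish_reason != "tool_calls":
            return msg.content
        # Keep the assistant turn that requested the tools in the history.
        messages.append(msg.model_dump())
        for tc in msg.tool_calls:
            result = run_function(tc.function.name, json.loads(tc.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": json.dumps(result),
            })
    raise RuntimeError("tool loop did not finish within max_rounds")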
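
The loop above calls run_function without defining it. One possible shape, offered as a sketch only, is a registry that maps tool names to local Python callables. TOOL_REGISTRY, tool_message, and the canned get_weather implementation are all hypothetical; errors are returned as data so the model can read them.

```python
import json


def get_weather(city):
    # Hypothetical local tool; a real one would query a weather service.
    return {"city": city, "temp_c": 22, "condition": "sunny"}


TOOL_REGISTRY = {"get_weather": get_weather}


def run_function(name, arguments):
    """Dispatch a tool call to a local Python function.

    Errors are returned as data so the model can see what went wrong
    instead of the agent loop crashing.
    """
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return func(**arguments)
    except TypeError as exc:  # missing or unexpected arguments
        return {"error": str(exc)}


def tool_message(tool_call_id, result):
    """Build the role=tool message that answers one tool call."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": json.dumps(result)}
```

Only functions listed in the registry can be invoked, so a model-supplied name cannot reach arbitrary code.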

Streaming with tools

Set stream: true. Chunks may include delta.tool_calls (partial). The final chunk has finish_reason tool_calls or stop. Credits are billed once per completed API request (same as non-streaming).
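
Because streamed tool calls arrive in pieces, a client has to stitch them together before executing anything. The sketch below assumes the fragments follow OpenAI's streaming shape (an index per call, the id and name in the first fragment, and arguments split across later ones); this document does not spell that shape out, so treat it as an assumption. merge_tool_call_deltas is a hypothetical helper name.

```python
def merge_tool_call_deltas(deltas):
    """Combine streamed delta.tool_calls fragments into complete calls.

    `deltas` is an iterable of lists, one per chunk, each holding dicts
    shaped like OpenAI's streamed tool-call fragments (assumed format).
    Chunks without tool calls may be passed as None or an empty list.
    """
    calls = {}
    for fragments in deltas:
        for frag in fragments or []:
            call = calls.setdefault(
                frag.get("index", 0),
                {"id": None, "type": "function",
                 "function": {"name": "", "arguments": ""}},
            )
            if frag.get("id"):
                call["id"] = frag["id"]
            func = frag.get("function") or {}
            call["function"]["name"] += func.get("name") or ""
            call["function"]["arguments"] += func.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]
```

Parse the arguments string as JSON only after the final chunk reports finish_reason tool_calls, since earlier fragments are not valid JSON on their own.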

MCP server — use WebVoice in Cursor

Install the local stdio MCP server (webvoice-mcp) to expose WebVoice as tools in Cursor, Claude Desktop, or any MCP client. Calls are forwarded to this REST API.

Option A — Agent self-registration (no browser)

  1. Add MCP to Cursor (config below); WEBVOICE_API_KEY can be empty for the first step.
  2. webvoice_register_send_code — email + accept terms
  3. webvoice_register_verify — OTP → api_key (once) + onboarding
  4. Set WEBVOICE_API_KEY in mcp.json and restart Cursor.
  5. If onboarding.can_use_api is true, use chat/TTS/STT right away with welcome credits.
  6. Optional: top up via Solana (webvoice_onboarding → solana.memo_code) or PayPal URLs in onboarding.urls.

REST equivalent: Agent registration (auth/send-code, auth/verify-code).

Option B — Browser signup (human)

  1. Sign up / log in: Login with email code or Google.
  2. Create an API key: API dashboard → wv_…
  3. Configure MCP: WEBVOICE_API_KEY in Cursor (below).

Credits are shared across web, API, and MCP. When balance is zero, calls return HTTP 402 and we email a recharge link. Solana and PayPal top-ups are optional — not required if you already have credits.

Install

pip install webvoice-mcp
# or from source:
pip install -r requirements-mcp.txt
pip install -e .

PyPI package + MCP Registry: io.github.easytaskflow/webvoice-mcp — see docs/MCP_DISTRIBUTION.md in the repo.

Cursor configuration

Add to ~/.cursor/mcp.json (Settings → MCP):

{
  "mcpServers": {
    "webvoice": {
      "command": "webvoice-mcp",
      "env": {
        "WEBVOICE_API_KEY": "wv_your_api_key_here"
      }
    }
  }
}

Optional env: WEBVOICE_BASE_URL (default https://webvoice.easytaskflow.app/api/v1).
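
If mcp.json already lists other servers, the webvoice entry can be merged in rather than pasted over the file. The helper below is a hedged sketch: add_webvoice_server is a hypothetical name, and it works on an already loaded dict so that reading and writing the file stay under your control.

```python
def add_webvoice_server(config, api_key, base_url=None):
    """Return a copy of an mcp.json dict with the webvoice server added.

    Existing servers are preserved; an existing "webvoice" entry is replaced.
    """
    env = {"WEBVOICE_API_KEY": api_key}
    if base_url:  # optional override; the server has a built-in default
        env["WEBVOICE_BASE_URL"] = base_url
    servers = dict(config.get("mcpServers", {}))
    servers["webvoice"] = {"command": "webvoice-mcp", "env": env}
    return {**config, "mcpServers": servers}
```

Load the file with json.load, pass the result through this function, write it back with json.dump, and restart Cursor so the new server is picked up.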

MCP tools

Tool Maps to / purpose
webvoice_register_send_code POST auth/send-code/ — no API key
webvoice_register_verify POST auth/verify-code/ — api_key + onboarding
webvoice_onboarding GET onboarding/ — credits, can_use_api, Solana memo
webvoice_status GET status/
webvoice_list_chat_models GET chat/models/
webvoice_list_voices GET voices/
webvoice_chat POST chat/completions/ — DeepSeek, Groq, OpenRouter, DeepInfra
webvoice_deepinfra_chat POST chat/completions/ — Kimi K2.6, Nemotron, Qwen3 Max Thinking, Gemma 4 IT
webvoice_tts POST tts/ — set output_path to save MP3
webvoice_stt POST stt/ — local audio_path; optional provider
webvoice_whisper_fast_stt POST stt/ — DeepInfra whisper-large-v3-turbo
webvoice_translate POST translation/
webvoice_image POST image/ — MiniMax, WAN, Gen4, SDXL, qwen-image-max, …
webvoice_qwen_image_max POST image/ — Qwen-Image-Max via DeepInfra
webvoice_image_to_image POST image-to-image/ — FLUX.2 Flash Edit
webvoice_image_to_video POST image-to-video/ — Seedance / Gen4 Turbo
webvoice_text_to_video POST text-to-video/ — Seedance, Veo Lite, Veo Fast
webvoice_veo_fast_text_to_video POST text-to-video/ — Google Veo 3.1 Fast (DeepInfra)
webvoice_account_links Dashboard / billing URLs

See also webvoice_mcp/README.md in the repository.

Image generation (text-to-image)

Default provider is MiniMax (MINIMAX_API_KEY). WaveSpeed providers (WAVESPEED_API_KEY): wan, nucleus, gen4, gen4-turbo, sd3, sdxl, flux-klein-lora. DeepInfra (DEEPINFRA_API_KEY): qwen-image-max. Missing key → HTTP 503.

Endpoint: POST /api/v1/image/

Upstream docs: MiniMax · WAN 2.7 · Nucleus · Gen4 · SDXL · FLUX Klein LoRA · Qwen-Image-Max (DeepInfra)

Request body (JSON)

{
  "prompt": "A red bicycle on a sunny street",
  "provider": "minimax",
  "width": 1024,
  "height": 1024,
  "seed": 42,
  "model": "image-01",
  "aspect_ratio": "16:9",
  "response_format": "base64"
}

Nucleus example:

{
  "provider": "nucleus",
  "prompt": "A red bicycle on a sunny street",
  "negative_prompt": "blurry, low quality",
  "num_images": 1,
  "num_inference_steps": 50,
  "guidance_scale": 8,
  "output_format": "png"
}

Runway Gen4 Image example:

{
  "provider": "gen4",
  "prompt": "A red bicycle on a sunny street",
  "resolution": "1080p",
  "reference_images": ["https://example.com/ref1.jpg"],
  "seed": 42
}

Runway Gen4 Image Turbo example (faster, ~$0.03 upstream):

{
  "provider": "gen4-turbo",
  "prompt": "Character wearing a leather jacket in a cyberpunk city",
  "aspect_ratio": "9:16",
  "resolution": "1080p",
  "reference_images": ["https://example.com/character-ref.jpg"]
}

Stable Diffusion 3 example (~$0.03 upstream, optional img2img):

{
  "provider": "sd3",
  "prompt": "Marina Bay at sunset, vivid purple and orange afterglow, long exposure",
  "aspect_ratio": "16:9",
  "image_url": "https://example.com/style-ref.jpg",
  "seed": -1
}

Stability AI SDXL example (~$0.0026 upstream):

{
  "provider": "sdxl",
  "prompt": "Close-up food photography of a gourmet cheeseburger, shallow depth of field",
  "aspect_ratio": "1:1",
  "size": "1024*1024",
  "seed": -1
}

Qwen-Image-Max via DeepInfra example (~$0.075/image upstream, photorealistic):

{
  "provider": "qwen-image-max",
  "prompt": "Portrait of a woman in natural light, lifelike skin texture, shallow depth of field",
  "aspect_ratio": "3:4",
  "size": "1024x1536"
}

FLUX.2 Klein 9B LoRA example (~$0.02 upstream):

{
  "provider": "flux-klein-lora",
  "prompt": "Cinematic romantic drama still, rooftop at sunset, film photography",
  "aspect_ratio": "16:9",
  "size": "1024*1024",
  "loras": [
    {"path": "https://example.com/my-style-lora.safetensors", "scale": 1.0}
  ],
  "seed": -1
}

Parameters

Parameter Type Required Description
provider string No minimax (default), wan, nucleus, gen4, gen4-turbo, sd3, sdxl, flux-klein-lora, or qwen-image-max
prompt string Yes Text description of the image (up to 4000 characters, truncated server-side)
width / height integer No Optional pixel size; each clamped between 512 and 2048. For MiniMax, used to pick the closest aspect_ratio. For WAN, sent as width×height (e.g. 1024*1024).
seed integer No Optional; reproducible generation when supported by the provider.
model string No MiniMax only — image model (e.g. image-01). Unknown values fall back to the server default.
aspect_ratio string No e.g. 1:1, 16:9, 9:16 — supported by all providers when width/height are omitted.
thinking_mode boolean No WAN only — enhanced reasoning for better quality (default: true).
negative_prompt string No Nucleus only — elements to exclude from the image.
num_images integer No Nucleus only — 1 (default) or 2 images per request. Billed per image.
num_inference_steps integer No Nucleus only — 1–100, default 50.
guidance_scale number No Nucleus only — classifier-free guidance, 0–20, default 8.
output_format string No Nucleus only — png (default) or jpeg.
resolution string No Gen4 / Gen4 Turbo — 720p or 1080p (default 1080p).
reference_images / reference_image_urls array / string No Gen4 / Gen4 Turbo — up to 3 public HTTPS URLs to guide style or subject.
size string No SDXL / WAN — canvas size e.g. 1024*1024 (256–1536 px per side for SDXL).
loras / lora_url array / string No FLUX Klein LoRA — list of {path, scale?} objects, or single lora_url + lora_scale.
image_url string No SD3 / SDXL — optional reference image URL for image-to-image.
response_format string No base64 (default) or url — how MiniMax returns image data; the API always responds with base64-encoded image bytes in JSON.

Response

Binary images are returned as Base64 strings (typically PNG or JPEG depending on the provider).

{
  "success": true,
  "provider": "wan",
  "image": "base64_encoded_image_bytes",
  "images": ["base64_1", "base64_2"],
  "count": 1,
  "credits_used": 1.0,
  "credits_remaining": 99.0
}

Credits per image: MiniMax 7, WAN 8, Nucleus 4 (× num_images), Gen4 2/3, Gen4 Turbo 1, SD3 1, SDXL 1, FLUX Klein LoRA 1.
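
The response carries both an image field and an images array, so client code needs one rule for reading them. The sketch below is one reasonable reading, not documented behavior: it assumes images lists every generated image when present and that image holds the single result otherwise. extract_images is a hypothetical helper.

```python
import base64


def extract_images(response_json):
    """Return the generated images as a list of raw bytes.

    Assumption: when "images" is present it lists every image, and
    "image" holds a single image otherwise.
    """
    if not response_json.get("success"):
        raise ValueError("image request did not succeed")
    encoded = response_json.get("images") or []
    if not encoded and response_json.get("image"):
        encoded = [response_json["image"]]
    return [base64.b64decode(item) for item in encoded]
```

Each returned item can be written to disk in binary mode; the file extension depends on the provider, since the bytes may be PNG or JPEG.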

Image to video

Generates video from a reference image. Default provider is Seedance 2.0 Turbo (720p/1080p, optional audio); use provider gen4 for Runway Gen4 Turbo (WAVESPEED_API_KEY).

Endpoint: POST /api/v1/image-to-video/

Upstream docs: Seedance 2.0 I2V Turbo · Runway Gen4 Turbo

Request body (JSON or multipart)

{
  "provider": "seedance",
  "prompt": "Slow camera pan, golden hour light, gentle wind in the trees",
  "image_url": "https://example.com/reference.jpg",
  "duration": 5,
  "resolution": "720p",
  "aspect_ratio": "adaptive",
  "generate_audio": true
}

Runway Gen4 Turbo example:

{
  "provider": "gen4",
  "prompt": "The subject turns slowly toward the camera, soft wind in their hair",
  "image_url": "https://example.com/reference.jpg",
  "duration": 5,
  "aspect_ratio": "16:9"
}

Alternatively send multipart/form-data with fields prompt, duration, resolution, aspect_ratio, generate_audio and file field image.

Or pass reference image as base64 in JSON field image (instead of image_url).

Parameters

Parameter Type Required Description
provider string No seedance (default) or gen4 / runway
prompt string Yes Motion, camera, lighting, mood.
image_url / image string Yes Public HTTPS URL or base64-encoded image bytes.
duration integer No 4–15 seconds for Seedance (default 5); 2–10 for Gen4 (default 5).
resolution string No 720p (default) or 1080p (Seedance only)
aspect_ratio string No Seedance: 16:9, 9:16, 1:1, 4:3, 3:4, 21:9, adaptive (default). Gen4: 16:9, 9:16, 1:1, 4:3, 3:4 or omit to match source.
generate_audio boolean No Native synchronized audio (Seedance only, default: true).
last_image_url string No Optional last-frame image for continuation.
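
Video generation is comparatively expensive, so it can be worth rejecting out-of-range values before they reach the API. The following sketch encodes the duration and resolution rules from the table; build_image_to_video_payload is a hypothetical helper, and treating runway as an alias for gen4 is an interpretation of the "gen4 / runway" entry above.

```python
DURATION_RANGES = {"seedance": (4, 15), "gen4": (2, 10)}  # from the table


def build_image_to_video_payload(prompt, image_url, provider="seedance",
                                 duration=5, resolution=None):
    """Validate inputs and return a JSON body for image-to-video."""
    key = "gen4" if provider == "runway" else provider  # assumed alias
    if key not in DURATION_RANGES:
        raise ValueError(f"unknown provider: {provider!r}")
    low, high = DURATION_RANGES[key]
    if not low <= duration <= high:
        raise ValueError(f"duration must be {low}-{high} seconds for {key}")
    payload = {"provider": provider, "prompt": prompt,
               "image_url": image_url, "duration": duration}
    if resolution is not None:
        if key != "seedance":
            raise ValueError("resolution applies to Seedance only")
        if resolution not in ("720p", "1080p"):
            raise ValueError("resolution must be 720p or 1080p")
        payload["resolution"] = resolution
    return payload
```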

Response

{
  "success": true,
  "provider": "seedance-2.0-turbo",
  "video": "base64_encoded_mp4_bytes",
  "duration": 5,
  "resolution": "720p",
  "credits_used": 15.0,
  "credits_remaining": 85.0,
  "video_size_bytes": 1234567
}

Credits: Seedance — 15/16 per 5s @ 720p/1080p (× duration). Gen4 Turbo — 2 per 5s (~$0.05 upstream).

Image to image

Edits 1–4 input images with FLUX.2 Flash Edit — restyle scenes, replace backgrounds, inpaint/outpaint, and retouch with plain-English prompts (WAVESPEED_API_KEY).

Endpoint: POST /api/v1/image-to-image/

Upstream docs: FLUX.2 Flash Edit

Request body (JSON or multipart)

{
  "provider": "flux-flash-edit",
  "prompt": "Replace the background with a modern office, keep the product centered",
  "images": [
    "https://example.com/product.jpg"
  ],
  "aspect_ratio": "1:1",
  "seed": 42
}

Multiple input images (up to 4):

{
  "prompt": "Merge styles: subject from first image, lighting from second",
  "image_urls": "https://example.com/a.jpg,https://example.com/b.jpg"
}

Alternatively send multipart/form-data with fields prompt, aspect_ratio, size, seed and file field images (1–4 files) or single image.

Or pass input images as base64 in JSON field image or array images.

Parameters

Parameter Type Required Description
provider string No flux-flash-edit (default)
prompt string Yes Edit instructions in plain English (HEX colors supported).
images / image_url / image array / string Yes 1–4 public HTTPS URLs or base64-encoded images.
aspect_ratio string No 1:1, 16:9, 9:16, 4:3, 3:4 (default 1024×1024).
size string No Custom width*height (256–1536), e.g. 1024*768.
seed integer No Optional reproducibility seed.
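
Input images can be given as an array or as a comma-separated string, as in the image_urls example earlier in this section. A small normalizer keeps the 1–4 limit in one place. This is a sketch under that reading of the examples; normalize_image_inputs is a hypothetical helper, and it assumes plain base64 strings, which contain no commas.

```python
def normalize_image_inputs(value):
    """Return 1-4 image references as a list.

    Accepts a list of URLs or base64 strings, a single string, or a
    comma-separated string of URLs as in the image_urls example.
    """
    if isinstance(value, str):
        items = [part.strip() for part in value.split(",") if part.strip()]
    else:
        items = list(value)
    if not 1 <= len(items) <= 4:
        raise ValueError("provide between 1 and 4 input images")
    return items
```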

Response

{
  "success": true,
  "provider": "flux-2-flash-edit",
  "image": "base64_encoded_png_or_jpeg_bytes",
  "input_count": 1,
  "credits_used": 1.0,
  "credits_remaining": 99.0,
  "image_size_bytes": 456789
}

Credits: FLUX.2 Flash Edit — 1 credit per edit (~$0.013 upstream).

Text to video

Generates cinematic 720p/1080p video from a text prompt. Default provider is Seedance 2.0 Fast Turbo (WAVESPEED_API_KEY); use provider veo for Google Veo 3.1 Lite (WaveSpeed) or veo-fast for Google Veo 3.1 Fast (DEEPINFRA_API_KEY, $0.15/s upstream).

Endpoint: POST /api/v1/text-to-video/

Upstream docs: Seedance 2.0 Fast T2V Turbo · Google Veo 3.1 Lite · Google Veo 3.1 Fast (DeepInfra)

Request body (JSON)

{
  "provider": "veo",
  "prompt": "Tracking shot through a neon-lit alley at night, rain reflections",
  "aspect_ratio": "16:9",
  "resolution": "720p",
  "duration": 6,
  "generate_audio": true,
  "negative_prompt": "blurry, low quality",
  "seed": 42
}

Parameters

Parameter Type Required Description
provider string No seedance (default) or veo (Lite, WaveSpeed) or veo-fast (DeepInfra)
prompt string Yes Cinematic scene description.
aspect_ratio string No 16:9 (default), 9:16, 1:1, 4:3, 3:4, 21:9.
duration integer No 4–15 seconds (default 5).
resolution string No 720p (default) or 1080p
generate_audio boolean No Native synchronized audio (default: true).
reference_images array No Seedance only — HTTPS URLs (doubles credit cost).
negative_prompt string No Veo only — elements to exclude.
seed integer No Veo only — reproducible generation.
reference_videos array No Seedance only — reference video URLs (doubles credit cost).

Response

{
  "success": true,
  "provider": "seedance-2.0-fast-turbo",
  "video": "base64_encoded_mp4_bytes",
  "duration": 5,
  "resolution": "720p",
  "credits_used": 13.0,
  "credits_remaining": 87.0,
  "video_size_bytes": 1234567
}

Credits: Seedance — 13/14 per 5s @ 720p/1080p (×2 with references). Veo 3.1 Lite — 7/11 per 6s @ 720p/1080p. Veo 3.1 Fast — 3.5 credits/s ($0.15/s upstream).
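
The credit rates above can be turned into a rough pre-flight estimate. The sketch below assumes cost scales linearly with duration from the quoted 5-second and 6-second prices, which the source does not confirm; only the Veo 3.1 Fast per-second rate is stated outright. estimate_text_to_video_credits and the RATES table are names invented for this example.

```python
# Credits per billing unit, copied from the credits line above:
# provider -> (unit length in seconds, {resolution: credits per unit})
RATES = {
    "seedance": (5, {"720p": 13.0, "1080p": 14.0}),
    "veo": (6, {"720p": 7.0, "1080p": 11.0}),
}
VEO_FAST_PER_SECOND = 3.5


def estimate_text_to_video_credits(provider, duration, resolution="720p",
                                   with_references=False):
    """Estimate credits, assuming cost scales linearly with duration."""
    if provider == "veo-fast":
        return round(VEO_FAST_PER_SECOND * duration, 2)
    unit_seconds, per_unit = RATES[provider]
    credits = per_unit[resolution] * duration / unit_seconds
    if with_references and provider == "seedance":
        credits *= 2  # references double the Seedance cost
    return round(credits, 2)
```

For a 5-second 720p Seedance clip the estimate is 13.0, which agrees with credits_used in the sample response. Treat other values as approximations and rely on credits_used in the actual response for billing.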

Voice memo (text only)

Create a voice memo from text only (no audio). The text is stored as a voice memo in your account and appears in the Voice notes list. Free — no credits charged.

Endpoint: POST /api/v1/voice-memo/text/

Request Body

{
  "text": "Your note text here",
  "title": "Optional title (default: first 50 chars of text)"
}

Parameters

Parameter Type Required Description
text string Yes Text to save as voice memo (no audio generated; max 10000 characters)
title string No Optional title; if omitted, first 50 characters of text are used

Response

{
  "success": true,
  "voice_note_id": 123,
  "title": "Your note text here",
  "detail_url": "https://.../audio/voice-notes/123/",
  "created_at": "2026-02-04T12:00:00Z"
}

Get Available Voices

Endpoint: GET /api/v1/voices/

Query Parameters

  • language: Filter voices by language code (optional)

Response

{
  "success": true,
  "voices": {
    "en": [
      {"id": "af_sarah", "name": "Sarah (Female, English)"},
      {"id": "am_adam", "name": "Adam (Male, English)"}
    ],
    "it": [
      {"id": "if_sara", "name": "Sara (Female, Italian)"}
    ]
  }
}
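
Since the TTS endpoint requires a voice id, a client usually picks one from this response first. The helper below is a minimal sketch; pick_voice and its fallback to English are assumptions made for illustration.

```python
def pick_voice(voices_response, language, fallback_language="en"):
    """Return the first voice id for `language`, else for the fallback."""
    voices = voices_response.get("voices", {})
    for code in (language, fallback_language):
        candidates = voices.get(code) or []
        if candidates:
            return candidates[0]["id"]
    return None
```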

Check Status

Endpoint: GET /api/v1/status/

Response

{
  "success": true,
  "user": "username",
  "credits": 100.0,
  "api_key": {
    "name": "My API Key",
    "created_at": "2024-01-01T00:00:00Z",
    "last_used": "2024-01-15T12:00:00Z",
    "usage_count": 42
  }
}

Code Examples

cURL - TTS

curl -X POST https://webvoice.easytaskflow.app/api/v1/tts/ \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, world!",
    "voice": "af_sarah",
    "language": "en",
    "speed": 0.90
  }'

Python - TTS

import base64

import requests

api_key = "YOUR_API_KEY"
url = "https://webvoice.easytaskflow.app/api/v1/tts/"

response = requests.post(
    url,
    headers={"X-API-Key": api_key},
    json={
        "text": "Hello, world!",
        "voice": "af_sarah",
        "language": "en",
        "speed": 0.90
    },
    timeout=60,
)
# Fail early on 401/402/429 instead of hitting a KeyError below.
response.raise_for_status()

data = response.json()
audio_data = base64.b64decode(data["audio"])  # base64-encoded MP3

with open("output.mp3", "wb") as f:
    f.write(audio_data)

Python - STT

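import requests

api_key = "YOUR_API_KEY"
url = "https://webvoice.easytaskflow.app/api/v1/stt/"

# Send the audio as multipart/form-data; omit "language" to auto-detect.
with open("audio.mp3", "rb") as f:
    response = requests.post(
        url,
        headers={"X-API-Key": api_key},
        files={"audio": f},
        data={"language": "it"},
        timeout=120,
    )
# Fail early on 401/402/429 instead of hitting a KeyError below.
response.raise_for_status()

data = response.json()
print(data["text"])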

Python - Translation

import requests

api_key = "YOUR_API_KEY"
url = "https://webvoice.easytaskflow.app/api/v1/translation/"

# json= sets the Content-Type: application/json header automatically.
response = requests.post(
    url,
    headers={"X-API-Key": api_key},
    json={
        "text": "Hello, world!",
        "source_language": "en",
        "target_language": "it"
    },
    timeout=60,
)
# Fail early on 401/402/429 instead of hitting a KeyError below.
response.raise_for_status()

data = response.json()
print(data["translated_text"])

cURL - Translation

curl -X POST https://webvoice.easytaskflow.app/api/v1/translation/ \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, world!",
    "source_language": "en",
    "target_language": "it"
  }'

cURL - Image generation

curl -X POST https://webvoice.easytaskflow.app/api/v1/image/ \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A watercolor landscape with mountains",
    "width": 1024,
    "height": 1024
  }'

Python - Image generation

import base64

import requests

api_key = "YOUR_API_KEY"
url = "https://webvoice.easytaskflow.app/api/v1/image/"

response = requests.post(
    url,
    headers={"X-API-Key": api_key},
    json={
        "prompt": "A red bicycle on a sunny street",
        "width": 1024,
        "height": 1024,
    },
    timeout=120,
)
# Fail early on 401/402/429/503 instead of hitting a KeyError below.
response.raise_for_status()

data = response.json()
raw = base64.b64decode(data["image"])  # PNG or JPEG bytes, per provider
with open("out.png", "wb") as f:
    f.write(raw)

Error Codes

Status Code Error Description
400 Bad Request Invalid request parameters
401 Unauthorized Invalid or missing API key
402 Payment Required Insufficient credits
429 Too Many Requests Free chat model rate limit exceeded (OpenRouter-aligned RPM/RPD); Retry-After header set
429 Too Many Requests Send-email rate limit exceeded; Retry-After header set
500 Internal Server Error Server error occurred

Error Response Format

{
  "error": "Error type",
  "message": "Detailed error message"
}
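
The status codes and the error body can be mapped to typed exceptions so that calling code can react to each case separately, for example by topping up on 402 or waiting on 429. The classes and the check_response helper below are hypothetical and only illustrate one way to structure this; they rely solely on the error and message fields shown above.

```python
class WebVoiceError(Exception):
    """Base error carrying the fields of a WebVoice error response."""

    def __init__(self, status, error, message, retry_after=None):
        super().__init__(f"HTTP {status} {error}: {message}")
        self.status = status
        self.error = error
        self.message = message
        self.retry_after = retry_after  # raw Retry-After value, if any


class AuthError(WebVoiceError):
    pass


class InsufficientCredits(WebVoiceError):
    pass


class RateLimited(WebVoiceError):
    pass


ERROR_CLASSES = {401: AuthError, 402: InsufficientCredits, 429: RateLimited}


def check_response(status, body, headers=None):
    """Return `body` for successful calls, raise a typed error otherwise."""
    if status < 400:
        return body
    body = body if isinstance(body, dict) else {}
    headers = {k.lower(): v for k, v in (headers or {}).items()}
    cls = ERROR_CLASSES.get(status, WebVoiceError)
    raise cls(status, body.get("error", "Error"), body.get("message", ""),
              retry_after=headers.get("retry-after"))
```

With the requests library, a call would look like check_response(response.status_code, response.json(), response.headers).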

Credits

  • TTS: 0.5 credits per minute
  • STT: 0.3 credits per minute
  • Translation: 0.1 credit per 1000 characters (minimum 0.1)
  • Image generation: credits per image as configured on the server (IMAGE_GENERATION_CREDITS; default 7 credits per image)
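
As a worked example of the rates above, the sketch below estimates the cost of TTS, STT, and translation calls. It assumes audio is billed per started minute, which matches the sample responses (2.5 s of TTS costing 0.5 and 10.5 s of STT costing 0.3) but is not stated in this document, and it assumes translation scales linearly above the 0.1 minimum. All function names are hypothetical.

```python
import math

TTS_PER_MINUTE = 0.5
STT_PER_MINUTE = 0.3
TRANSLATION_PER_1000_CHARS = 0.1
TRANSLATION_MINIMUM = 0.1


def _started_minutes(duration_seconds):
    # Assumption: any started minute is billed in full, with at least one.
    return max(1, math.ceil(duration_seconds / 60))


def estimate_tts_credits(duration_seconds):
    return round(TTS_PER_MINUTE * _started_minutes(duration_seconds), 4)


def estimate_stt_credits(duration_seconds):
    return round(STT_PER_MINUTE * _started_minutes(duration_seconds), 4)


def estimate_translation_credits(char_count):
    scaled = char_count / 1000 * TRANSLATION_PER_1000_CHARS
    return max(TRANSLATION_MINIMUM, round(scaled, 4))
```

The authoritative figure is always credits_used in the response; these estimates are only useful for checking a balance before a batch of calls.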