Safe models, controlled retention, and Groq-class speed

How WebVoice selects chat backends, why “safeguard” stacks matter, and how Groq inference compares to typical APIs on raw throughput.

This article explains the security posture of our AI chat layer as architecture rather than marketing: which classes of models we expose, what “no training on your chats” means in practice, and how the Groq-powered routes generate tokens several times faster (up to roughly ten times, on the indicative figures below) than many general-purpose cloud APIs.

Retention: WebVoice does not use your conversations to fine-tune proprietary models. Third-party inference hosts apply their own short technical retention for abuse monitoring; we route only through providers whose terms fit our compliance requirements, and we document the rest in our Privacy Policy and AI Policy.

Internally curated models, not a random model zoo

Every chat model visible in WebVoice is registered in our control plane (display name, provider, credit cost). Administrators enable or retire endpoints deliberately. That means you are not exposed to arbitrary third-party models that were never reviewed: the catalogue is a closed list, versioned with migrations, aligned with billing and rate limits.
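The closed-catalogue idea can be sketched as a small registry. This is a minimal illustration under stated assumptions, not WebVoice's actual control plane: the fields mirror the attributes named above (display name, provider, credit cost), while the class names, the `enabled` flag, the model IDs, and the credit values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """One reviewed chat model (hypothetical schema)."""
    model_id: str
    display_name: str
    provider: str
    credit_cost: float
    enabled: bool = True  # assumed flag: administrators enable or retire entries


class ModelCatalogue:
    """Closed list: only registered and enabled models can be resolved."""

    def __init__(self, entries):
        self._entries = {entry.model_id: entry for entry in entries}

    def resolve(self, model_id: str) -> ModelEntry:
        entry = self._entries.get(model_id)
        if entry is None:
            # Unknown models are rejected instead of being passed to a provider.
            raise KeyError(f"model {model_id!r} is not in the catalogue")
        if not entry.enabled:
            raise PermissionError(f"model {model_id!r} has been retired")
        return entry

    def visible(self):
        """Models a user would see in the model picker."""
        return [entry for entry in self._entries.values() if entry.enabled]


# Illustrative entries only; IDs and credit costs are placeholders.
catalogue = ModelCatalogue([
    ModelEntry("gpt-oss-safeguard-20b", "GPT-OSS Safeguard 20B", "groq", 0.5),
    ModelEntry("legacy-model", "Legacy Model", "other", 1.0, enabled=False),
])
```

Resolving every request through a lookup like `catalogue.resolve(...)` is one way to guarantee that an unreviewed or retired model ID fails loudly before any provider call is made.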

“Safeguard”-oriented weights (for example, OpenAI GPT-OSS variants configured for stronger alignment) can be offered side by side with general assistants. You choose the risk profile per thread, for example customer-facing answers versus internal brainstorming, without accidentally mixing policies.

Retention: what we do not do

We do not operate a consumer-style product that silently trains a single global model on all user content. Your prompts are processed to return a reply and to enforce quotas; they are not sold as training fodder.

Inference providers may keep ephemeral logs for a limited window to detect misuse; that is different from long-term retention for product analytics. Always read the latest provider terms—Groq publishes its data handling alongside performance numbers.

Why Groq is part of the story

Groq hosts large language models on its LPU™ inference hardware. For workloads that fit the context window, published throughput reaches hundreds to a thousand output tokens per second on several checkpoints—far above what many conventional GPU clouds achieve for the same class of open-weights models.

In WebVoice we map those endpoints into the same credit system as other chat providers: Groq-backed chats often cost fewer credits per message (see your live dashboard) while returning answers faster—ideal for interactive assistants and high-volume triage.

The Groq numbers below come from our internal reference table, compiled from public materials at integration time. The bars for ChatGPT/OpenAI, Google Gemini, and generic APIs are illustrative: rough comparisons drawn from typical public benchmark ranges, not measured by WebVoice on your tenant. All values are throughput indicators, not latency SLAs for your specific prompt size.

Output tokens per second (higher is faster generation)

GPT-OSS 120B on Groq LPU™: 500 tok/s
GPT-OSS Safeguard 20B on Groq LPU™: 1000 tok/s
GPT-OSS 20B on Groq LPU™: 1000 tok/s
Llama 3.1 8B Instant on Groq: 560 tok/s
Llama 3.3 70B Versatile on Groq: 280 tok/s
Qwen3 Fast on Groq: 662 tok/s
Kimi K2 instruct (Groq route, published TPS in stack): 200 tok/s
ChatGPT / OpenAI API (GPT-4o class, indicative output tok/s): 78 tok/s
Google Gemini API (Flash-tier, indicative output tok/s): 185 tok/s
Typical third-party chat API (indicative throughput): 95 tok/s
High-latency multi-hop cloud stack (indicative): 42 tok/s

Legend: green = Groq (WebVoice integration table); orange = ChatGPT / OpenAI API (indicative); blue = Gemini API (indicative); grey = other indicative baselines.

ChatGPT and Gemini values are rounded illustrative output tok/s from common public benchmark bands (model tier, region, and load change real figures). They are shown for orientation only.

How to read the chart

  • Token throughput (tok/s) measures how quickly the model can stream output tokens under reference conditions; real chats vary with prompt length, batching, and network.
  • Green (Groq) bars use the same model IDs we store for pricing and docs (e.g. Safeguard 20B, Llama 3.1 8B Instant, Qwen3 Fast).
  • Orange (ChatGPT/OpenAI) and blue (Gemini) bars use indicative tok/s in the ballpark of GPT-4o-class and Flash-tier streaming reports—not live measurements against your OpenAI or Google account.
  • Grey bars illustrate unnamed generic or high-latency stacks for contrast.
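To make the throughput figures concrete, the sketch below converts tok/s into approximate streaming time for a reply. It is a back-of-the-envelope calculation only: the tok/s values are copied from the chart above, the 400-token reply length is a hypothetical example, and the function names are ours. Prompt processing, queueing, and network time are deliberately ignored.

```python
# Indicative output throughput (tokens per second), copied from the chart above.
THROUGHPUT_TOK_S = {
    "GPT-OSS Safeguard 20B on Groq": 1000,
    "Llama 3.3 70B Versatile on Groq": 280,
    "Gemini API (Flash-tier, indicative)": 185,
    "ChatGPT / OpenAI API (GPT-4o class, indicative)": 78,
}

REPLY_TOKENS = 400  # hypothetical reply length, chosen only for illustration


def generation_seconds(output_tokens: int, tok_per_s: float) -> float:
    """Time to stream `output_tokens`, ignoring prompt processing and network."""
    if tok_per_s <= 0:
        raise ValueError("throughput must be positive")
    return output_tokens / tok_per_s


def speedup(fast_tok_per_s: float, slow_tok_per_s: float) -> float:
    """Ratio between two throughputs (how many times faster the first is)."""
    return fast_tok_per_s / slow_tok_per_s


if __name__ == "__main__":
    for name, tps in THROUGHPUT_TOK_S.items():
        seconds = generation_seconds(REPLY_TOKENS, tps)
        print(f"{name}: {seconds:.2f} s for {REPLY_TOKENS} tokens")
```

Under these assumptions a 400-token reply streams in about 0.4 s at 1000 tok/s and about 5.1 s at 78 tok/s, a ratio of roughly 13. Comparing 280 tok/s with 185 tok/s gives only about 1.5, which is why the gap depends heavily on which models are compared.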

Credits: faster does not mean “free”

Even with Groq-level speed, each completion still consumes credits. Typical defaults in WebVoice: Groq-class routes often use 0.5 credits per message and other providers often use 1; the exact numbers appear in your dashboard.
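As a rough budgeting aid, the per-message defaults quoted above can be turned into a small estimate. This is a sketch, not the billing implementation: the route names `groq` and `other` and the function name are hypothetical, and the rates are the typical defaults mentioned in the text, which may differ from the values in your dashboard.

```python
from decimal import Decimal

# Typical defaults quoted in the text; real values come from the dashboard.
CREDITS_PER_MESSAGE = {
    "groq": Decimal("0.5"),   # Groq-class routes (often)
    "other": Decimal("1"),    # other providers (often)
}


def credits_for(messages: int, route: str) -> Decimal:
    """Estimated credits consumed by `messages` completions on one route."""
    if messages < 0:
        raise ValueError("messages must be non-negative")
    if route not in CREDITS_PER_MESSAGE:
        raise KeyError(f"unknown route {route!r}")
    return CREDITS_PER_MESSAGE[route] * messages


if __name__ == "__main__":
    for route in CREDITS_PER_MESSAGE:
        print(f"200 messages on {route}: {credits_for(200, route)} credits")
```

With these default rates, 200 messages would consume 100 credits on a Groq-class route and 200 credits on another provider. `Decimal` is used here so that fractional credit values add up exactly.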

Frequently asked questions

Safeguard-oriented weights are configured for stronger refusals and alignment with policy — useful for customer-facing or sensitive workflows.

We route some models through Groq’s inference stack for very low latency. Throughput (tok/s) is an indicative comparison; your workload may differ.

We do not use your conversations to fine-tune a global consumer model. See the privacy and AI policy pages for provider retention and your controls.

Green highlights Groq-hosted models from our table; orange and blue bars are rounded illustrative baselines for other APIs; grey is generic contrast.

Start with the privacy policy and the technical security article linked from this section of the site.
