Audio Models

The Otoroshi LLM Extension supports audio models for Text-to-Speech (TTS), Speech-to-Text (STT), and Audio Translation, all exposed through a unified OpenAI-compatible API. Which capabilities are available depends on the provider, as listed below.

Supported providers

Provider	TTS	STT	Translation
OpenAI	Yes	Yes	Yes
Azure OpenAI	Yes	Yes	Yes
Cloud Temple 🇫🇷 🇪🇺	Yes	Yes	Yes
Groq	Yes	Yes	Yes
ElevenLabs	Yes	Yes	No
Mistral 🇫🇷 🇪🇺	No	Yes	No
AlphaEdge 🇫🇷 🇪🇺	No	Yes	No
OpenAI Compatible	Yes	Yes	Yes
OVH AI Endpoints 🇫🇷 🇪🇺	No	Yes	No

OpenAI Compatible is a generic provider for any service that speaks the OpenAI audio API. You point it at your own endpoint and give it a display name; the actual TTS / STT / Translation support depends on what the target endpoint offers.

Features

Unified API: Every provider is exposed through the same OpenAI-compatible endpoints, so client code does not change when the underlying provider does
Multiple providers: Use different providers for TTS and STT on the same audio model entity
Model routing: Route to a specific provider using the `provider_id###model_name` or `provider_id/model_name` syntax in the `model` field
Vault integration: API tokens support Otoroshi vault references (e.g. `${vault://local/my-token}`)
Model constraints: Restrict which models can be used via allow/block lists, enforceable per API key or per user
Auditing: STT and Translation calls are fully audited with usage tracking, eco-impact, and cost reporting
Workflow integration: TTS and STT are available as workflow functions for use in agentic pipelines
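
To make the two model routing syntaxes concrete, here is a minimal Python sketch that splits a `model` value into a provider id and a model name. The helper `split_model_field` is hypothetical and is not the extension's actual parsing code. It assumes `###` is tried before `/` and that the split happens on the first separator found.

```python
from typing import Optional, Tuple


def split_model_field(model: str) -> Tuple[Optional[str], str]:
    """Split a `model` value into (provider_id, model_name).

    Hypothetical helper for illustration; the gateway's real parsing
    rules may differ.
    """
    # Assumption: try the '###' separator first, then fall back to '/'.
    for separator in ("###", "/"):
        provider_id, found, model_name = model.partition(separator)
        if found and provider_id and model_name:
            return provider_id, model_name
    # No separator present: no explicit provider was requested.
    return None, model


print(split_model_field("provider_1###whisper-1"))
print(split_model_field("provider_1/whisper-1"))
print(split_model_field("whisper-1"))
```

When no separator is present, the sketch returns `None` for the provider. How the gateway then selects a provider is not covered here.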

API endpoints

Three Otoroshi plugins expose audio capabilities as API routes:

Plugin	Endpoint	Description
Cloud APIM - Text to speech backend	`POST /v1/audio/speech`	Converts text to audio
Cloud APIM - Speech to text backend	`POST /v1/audio/transcriptions`	Transcribes audio to text
Cloud APIM - Audio translation backend	`POST /v1/audio/translations`	Translates audio to English text
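
As an illustration of calling the text-to-speech route, the sketch below builds a `POST /v1/audio/speech` request with Python's standard library. The base URL and API key are placeholders, and the `Authorization: Bearer` header is an assumption: how a route authenticates callers depends on how it is configured in Otoroshi. The body uses the OpenAI speech request fields (`model`, `input`, `voice`, `response_format`).

```python
import json
import urllib.request

# Hypothetical values: replace with your own route domain and credentials.
BASE_URL = "https://audio.example.com"
API_KEY = "my-api-key"


def build_speech_request(
    text: str, model: str = "gpt-4o-mini-tts", voice: str = "alloy"
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": "mp3",
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the route accepts a bearer token.
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


request = build_speech_request("Hello from Otoroshi")

# Sending requires a reachable route, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     with open("speech.mp3", "wb") as audio_file:
#         audio_file.write(response.read())
```

The transcription and translation endpoints follow the OpenAI convention of a multipart form upload with a `file` part, so they need a multipart-capable client rather than a JSON body.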

Audio model entity

An Audio Model entity groups TTS, STT, and Translation configurations under a single provider:

{
  "id": "audio-gen-model_xxxxxxxxx",
  "name": "My Audio Model",
  "description": "Audio model with TTS and STT",
  "provider": "openai",
  "config": {
    "connection": {
      "token": "${vault://local/OPENAI_API_TOKEN}",
      "timeout": 30000
    },
    "options": {
      "tts": {
        "enabled": true,
        "model": "gpt-4o-mini-tts",
        "voice": "alloy",
        "response_format": "mp3",
        "speed": 1
      },
      "stt": {
        "enabled": true,
        "model": "whisper-1"
      },
      "translation": {
        "enabled": true,
        "model": "whisper-1"
      }
    }
  }
}

Each capability (TTS, STT, Translation) can be individually enabled or disabled. See the dedicated pages for detailed configuration per provider.
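
The per-capability `enabled` flags can be inspected with a few lines of Python. This is a minimal sketch that assumes the entity layout shown above; the helper `enabled_capabilities` is hypothetical, and treating a missing `enabled` flag as disabled is an assumption of this example, not documented behavior.

```python
import json

# Trimmed copy of the entity above, with translation switched off.
ENTITY_JSON = """
{
  "provider": "openai",
  "config": {
    "options": {
      "tts": {"enabled": true, "model": "gpt-4o-mini-tts"},
      "stt": {"enabled": true, "model": "whisper-1"},
      "translation": {"enabled": false, "model": "whisper-1"}
    }
  }
}
"""


def enabled_capabilities(entity: dict) -> dict:
    """Map each enabled capability name to its configured model."""
    options = entity.get("config", {}).get("options", {})
    return {
        name: settings.get("model")
        for name, settings in options.items()
        # Assumption: a missing "enabled" flag counts as disabled.
        if settings.get("enabled", False)
    }


capabilities = enabled_capabilities(json.loads(ENTITY_JSON))
print(capabilities)  # {'tts': 'gpt-4o-mini-tts', 'stt': 'whisper-1'}
```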
