Audio Models

Otoroshi LLM Extension provides full support for audio generation models, enabling Text-to-Speech (TTS), Speech-to-Text (STT), and Audio Translation capabilities through a unified OpenAI-compatible API.

Supported providers

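| Provider | TTS | STT | Translation |
|---|---|---|---|
| OpenAI | Yes | Yes | Yes |
| Azure OpenAI | Yes | Yes | Yes |
| Cloud Temple 🇫🇷 🇪🇺 | Yes | Yes | Yes |
| Groq | Yes | Yes | Yes |
| ElevenLabs | Yes | Yes | No |
| Mistral | No | Yes | No |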

Features

  • Unified API: All providers are exposed through OpenAI-compatible endpoints, regardless of the underlying provider
  • Multiple providers: Use different providers for TTS and STT on the same audio model entity
  • Model routing: Route to a specific provider using the provider_id###model_name or provider_id/model_name syntax in the model field
  • Vault integration: API tokens support Otoroshi vault references (e.g. ${vault://local/my-token})
  • Model constraints: Restrict which models can be used via allow/block lists, enforceable per API key or per user
  • Auditing: STT and Translation calls are fully audited with usage tracking, eco-impact, and cost reporting
  • Workflow integration: TTS and STT are available as workflow functions for use in agentic pipelines
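As an illustration of the model routing syntax listed above, the following minimal sketch builds the value of the model field from a provider id and a model name. The helper name routed_model and the provider id my-provider-id are hypothetical; only the two separators (### and /) come from the feature description.

```python
def routed_model(provider_id: str, model_name: str, separator: str = "###") -> str:
    """Build a `model` field value that targets a specific provider.

    Only the two separators named in the feature list are accepted.
    """
    if separator not in ("###", "/"):
        raise ValueError("separator must be '###' or '/'")
    return f"{provider_id}{separator}{model_name}"


# "my-provider-id" is a placeholder, not a real provider id.
print(routed_model("my-provider-id", "whisper-1"))
print(routed_model("my-provider-id", "whisper-1", separator="/"))
```

The resulting string would then be sent as the model field of an audio request.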

API endpoints

Three Otoroshi plugins expose audio capabilities as API routes:

| Plugin | Endpoint | Description |
|---|---|---|
| Cloud APIM - Text to speech backend | POST /v1/audio/speech | Converts text to audio |
| Cloud APIM - Speech to text backend | POST /v1/audio/transcriptions | Transcribes audio to text |
| Cloud APIM - Audio translation backend | POST /v1/audio/translations | Translates audio to English text |
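To make the text-to-speech endpoint concrete, here is a minimal sketch that prepares a POST /v1/audio/speech request using only the Python standard library. It assumes the route accepts the usual OpenAI-style JSON body (model, input, voice) and a bearer token. The host audio.example.com, the API key, and the helper name build_speech_request are placeholders, and the way your route authenticates callers may differ.

```python
import json
import urllib.request


def build_speech_request(base_url: str, api_key: str, text: str,
                         model: str = "gpt-4o-mini-tts",
                         voice: str = "alloy") -> urllib.request.Request:
    """Prepare (but do not send) an OpenAI-style text-to-speech request."""
    # Assumed body shape: the OpenAI-compatible fields model, input, voice.
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the route is protected by a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
    )


# Placeholder host and key; replace with the values of your own route.
req = build_speech_request("https://audio.example.com", "my-api-key", "Hello from Otoroshi")
print(req.get_method(), req.full_url)
```

Sending the request with urllib.request.urlopen(req) and reading the response would yield the audio bytes, which can then be written to a file.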

Audio model entity

An Audio Model entity groups TTS, STT, and Translation configurations under a single provider:

{
  "id": "audio-gen-model_xxxxxxxxx",
  "name": "My Audio Model",
  "description": "Audio model with TTS and STT",
  "provider": "openai",
  "config": {
    "connection": {
      "token": "${vault://local/OPENAI_API_TOKEN}",
      "timeout": 30000
    },
    "options": {
      "tts": {
        "enabled": true,
        "model": "gpt-4o-mini-tts",
        "voice": "alloy",
        "response_format": "mp3",
        "speed": 1
      },
      "stt": {
        "enabled": true,
        "model": "whisper-1"
      },
      "translation": {
        "enabled": true,
        "model": "whisper-1"
      }
    }
  }
}

Each capability (TTS, STT, Translation) can be individually enabled or disabled. See the dedicated pages for detailed configuration per provider.
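As a small illustration of the per-capability enabled flags, the sketch below reads an entity shaped like the example above and lists the capabilities that are switched on. The helper enabled_capabilities is hypothetical, and treating a missing block as disabled is an assumption made for this sketch, not documented behavior.

```python
import json

# Trimmed copy of the example entity; only "stt" is enabled here for illustration.
ENTITY_JSON = """
{
  "id": "audio-gen-model_xxxxxxxxx",
  "provider": "openai",
  "config": {
    "options": {
      "tts": {"enabled": false, "model": "gpt-4o-mini-tts"},
      "stt": {"enabled": true, "model": "whisper-1"}
    }
  }
}
"""


def enabled_capabilities(entity: dict) -> list[str]:
    """Return the capability names whose `enabled` flag is true."""
    options = entity.get("config", {}).get("options", {})
    # Assumption: a capability with no block or no flag counts as disabled.
    return [name for name in ("tts", "stt", "translation")
            if options.get(name, {}).get("enabled", False)]


print(enabled_capabilities(json.loads(ENTITY_JSON)))  # prints ['stt']
```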