# Text-to-Speech (TTS)

Text-to-Speech converts text input into audio output. Otoroshi exposes TTS through an OpenAI-compatible API endpoint.

## Supported providers and models

| Provider | Models | Default voice |
|---|---|---|
| OpenAI | `tts-1`, `tts-1-hd`, `gpt-4o-mini-tts` | `alloy` |
| Azure OpenAI | `tts`, `tts-hd`, `gpt-4o-mini-tts` | `alloy` |
| Cloud Temple 🇫🇷 🇪🇺 | `tts-1`, `tts-1-hd`, `gpt-4o-mini-tts` | `alloy` |
| Groq | `playai-tts`, `playai-tts-arabic` | `alloy` |
| ElevenLabs | `eleven_monolingual_v1`, `eleven_multilingual_v2` | `Rachel` |

## Available voices

### OpenAI / Azure OpenAI / Cloud Temple

alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse

### Groq

Arista-PlayAI, Atlas-PlayAI, Basil-PlayAI

### ElevenLabs

Voices are fetched dynamically from the ElevenLabs API. You can browse available voices in the Otoroshi admin UI when configuring the audio model entity.
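Outside the admin UI, the same catalogue can be queried directly: ElevenLabs lists voices at `GET /v1/voices`, authenticated with the `xi-api-key` header. A minimal standard-library sketch (the key value is a placeholder):

```python
import json
import urllib.request

ELEVENLABS_API = "https://api.elevenlabs.io"

def build_voices_request(api_key: str) -> urllib.request.Request:
    """Build the GET request that lists available ElevenLabs voices."""
    return urllib.request.Request(
        f"{ELEVENLABS_API}/v1/voices",
        headers={"xi-api-key": api_key},
    )

req = build_voices_request("YOUR_XI_API_KEY")
print(req.full_url)  # https://api.elevenlabs.io/v1/voices

# Sending the request returns JSON with a "voices" array; each entry
# carries the voice_id to put in the TTS configuration below:
# with urllib.request.urlopen(req) as resp:
#     for voice in json.loads(resp.read())["voices"]:
#         print(voice["voice_id"], voice["name"])
```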

## TTS configuration

### OpenAI / Azure OpenAI / Cloud Temple

```json
{
  "tts": {
    "enabled": true,
    "model": "gpt-4o-mini-tts",
    "voice": "alloy",
    "instructions": "Speak in a friendly tone",
    "response_format": "mp3",
    "speed": 1.0
  }
}
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `true` | Enable or disable TTS |
| `model` | string | `gpt-4o-mini-tts` | The TTS model to use |
| `voice` | string | `alloy` | The voice to use for generation |
| `instructions` | string | | Optional instructions for the voice (e.g. tone, style) |
| `response_format` | string | | Audio format: `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm` |
| `speed` | number | | Speed of the generated audio (0.25 to 4.0) |
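The constraints in the table above (allowed formats, the 0.25–4.0 speed range) can be checked before saving an entity. This is an illustrative helper, not part of Otoroshi:

```python
# Illustrative validator for a "tts" options fragment (not part of
# Otoroshi): checks response_format and speed against the table above.
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}

def validate_tts_options(tts: dict) -> list:
    """Return a list of problems; an empty list means the fragment looks valid."""
    errors = []
    fmt = tts.get("response_format")
    if fmt is not None and fmt not in ALLOWED_FORMATS:
        errors.append(f"unsupported response_format: {fmt!r}")
    speed = tts.get("speed")
    if speed is not None and not (0.25 <= speed <= 4.0):
        errors.append(f"speed {speed} outside the 0.25-4.0 range")
    return errors

print(validate_tts_options({"response_format": "mp3", "speed": 1.0}))  # []
print(validate_tts_options({"response_format": "ogg", "speed": 9}))
```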

### Groq

```json
{
  "tts": {
    "enabled": true,
    "model": "playai-tts",
    "voice": "Arista-PlayAI",
    "response_format": "wav"
  }
}
```

Same parameters as OpenAI.

### ElevenLabs

```json
{
  "tts": {
    "enabled": true,
    "model_id": "eleven_monolingual_v1",
    "voice_id": "21m00Tcm4TlvDq8ikWAM",
    "output_format": "mp3_44100_128"
  }
}
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `true` | Enable or disable TTS |
| `model_id` | string | `eleven_monolingual_v1` | The ElevenLabs model ID |
| `voice_id` | string | `21m00Tcm4TlvDq8ikWAM` | The ElevenLabs voice ID (Rachel by default) |
| `output_format` | string | `mp3_44100_128` | Output audio format |
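ElevenLabs `output_format` values pack the codec, sample rate, and (for MP3) bitrate into a single string. A small sketch that unpacks them, assuming the `codec_samplerate[_bitrate]` naming convention:

```python
# Illustrative sketch: decode an ElevenLabs output_format string such as
# "mp3_44100_128" into codec, sample rate (Hz) and bitrate (kbps).
def parse_output_format(fmt: str) -> dict:
    parts = fmt.split("_")
    parsed = {"codec": parts[0], "sample_rate_hz": int(parts[1])}
    if len(parts) > 2:  # pcm formats like "pcm_16000" omit the bitrate
        parsed["bitrate_kbps"] = int(parts[2])
    return parsed

print(parse_output_format("mp3_44100_128"))
# {'codec': 'mp3', 'sample_rate_hz': 44100, 'bitrate_kbps': 128}
```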

## API usage

### Plugin setup

Add the **Cloud APIM - Text to speech** backend plugin to your route:

```json
{
  "enabled": true,
  "plugin": "cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.aigateway.plugins.OpenAICompatTextToSpeech",
  "config": {
    "refs": ["audio-gen-model_xxxxxxxxx"]
  }
}
```

### Request

```bash
curl https://my-audio-endpoint.example.com/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_API_KEY" \
  -d '{
    "input": "Hello, how are you today?",
    "model": "gpt-4o-mini-tts",
    "voice": "alloy",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3
```

The response is a streamed audio file with the appropriate content type.
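The same request can be issued from Python with only the standard library. The hostname and key below are the placeholders from the curl example, and `build_speech_request` is an illustrative helper, not part of Otoroshi:

```python
import json
import urllib.request

def build_speech_request(base_url: str, api_key: str, text: str,
                         model: str = "gpt-4o-mini-tts", voice: str = "alloy",
                         response_format: str = "mp3", speed: float = 1.0):
    """Build a POST request for the OpenAI-compatible /v1/audio/speech endpoint."""
    payload = {"input": text, "model": model, "voice": voice,
               "response_format": response_format, "speed": speed}
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )

req = build_speech_request("https://my-audio-endpoint.example.com",
                           "OTOROSHI_API_KEY", "Hello, how are you today?")
# The response body is the raw audio stream:
# with urllib.request.urlopen(req) as resp, open("speech.mp3", "wb") as out:
#     out.write(resp.read())
```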

### Request parameters

| Parameter | Type | Description |
|---|---|---|
| `input` | string | The text to convert to speech (required) |
| `model` | string | Model to use. Supports the `provider_id###model_name` routing syntax |
| `voice` | string | Voice to use |
| `instructions` | string | Optional voice instructions |
| `response_format` | string | Desired audio format |
| `speed` | number | Playback speed multiplier |
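The `provider_id###model_name` routing syntax can be clarified with a small parser. `split_routed_model` is a hypothetical helper shown only to illustrate how the model field decomposes:

```python
# Illustrative parser for the provider_id###model_name routing syntax:
# an optional provider id is separated from the model name by "###".
def split_routed_model(model: str):
    """Return (provider_id, model_name); provider_id is None when absent."""
    if "###" in model:
        provider_id, model_name = model.split("###", 1)
        return provider_id, model_name
    return None, model

print(split_routed_model("provider_xxx###gpt-4o-mini-tts"))
# ('provider_xxx', 'gpt-4o-mini-tts')
print(split_routed_model("gpt-4o-mini-tts"))
# (None, 'gpt-4o-mini-tts')
```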

## Full entity example with OpenAI

```json
{
  "id": "audio-gen-model_xxxxxxxxx",
  "name": "OpenAI Audio",
  "description": "OpenAI text-to-speech",
  "provider": "openai",
  "config": {
    "connection": {
      "token": "${vault://local/OPENAI_API_TOKEN}",
      "timeout": 30000
    },
    "options": {
      "tts": {
        "enabled": true,
        "model": "gpt-4o-mini-tts",
        "voice": "alloy",
        "response_format": "mp3",
        "speed": 1
      },
      "stt": {
        "enabled": true,
        "model": "whisper-1"
      },
      "translation": {
        "enabled": true,
        "model": "whisper-1"
      }
    }
  },
  "kind": "ai-gateway.extensions.cloud-apim.com/AudioModel"
}
```

## Full entity example with Groq

```json
{
  "id": "audio-gen-model_xxxxxxxxx",
  "name": "Groq TTS",
  "description": "Groq text-to-speech",
  "provider": "groq",
  "config": {
    "connection": {
      "token": "${vault://local/GROQ_API_TOKEN}",
      "timeout": 30000
    },
    "options": {
      "tts": {
        "enabled": true,
        "model": "playai-tts",
        "response_format": "wav"
      },
      "stt": {
        "enabled": true,
        "model": "whisper-large-v3-turbo"
      },
      "translation": {
        "enabled": true,
        "model": "whisper-large-v3"
      }
    }
  },
  "kind": "ai-gateway.extensions.cloud-apim.com/AudioModel"
}
```