Audio Translation

Audio Translation transcribes audio files and translates the result into English. This is distinct from Speech-to-Text, which transcribes audio in its original language.

Supported providers

Provider	Models
OpenAI	`whisper-1`
Azure OpenAI	`whisper-1`
Cloud Temple 🇫🇷 🇪🇺	`whisper-1`
Groq	`whisper-large-v3`
OpenAI Compatible	(your model)

ElevenLabs and Mistral do not support audio translation.

Translation configuration

OpenAI / Azure OpenAI / Cloud Temple

{
  "translation": {
    "enabled": true,
    "model": "whisper-1",
    "prompt": "Optional context for the translation",
    "response_format": "json",
    "temperature": 0
  }
}

Parameter	Type	Default	Description
`enabled`	boolean	`true`	Enable or disable translation
`model`	string	—	The model to use for translation
`prompt`	string	—	Optional text to guide the translation style
`response_format`	string	—	Response format: `json`, `text`, `srt`, `verbose_json`, `vtt`
`temperature`	number	—	Sampling temperature between 0 and 1
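
The constraints in the table above can be checked before a configuration is saved. The following is a minimal sketch, not part of the extension: `validate_translation_config` is a hypothetical helper, and it assumes every field is optional because the table lists no required fields.

```python
# Allowed values for `response_format`, taken from the parameter table.
ALLOWED_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def validate_translation_config(config):
    """Return a list of problems found in a `translation` block (empty if none).

    Hypothetical helper: it only mirrors the types and ranges listed in the
    parameter table and assumes every field is optional.
    """
    block = config.get("translation")
    if not isinstance(block, dict):
        return ["'translation' must be an object"]

    problems = []
    if "enabled" in block and not isinstance(block["enabled"], bool):
        problems.append("'enabled' must be a boolean")

    for key in ("model", "prompt"):
        if key in block and not isinstance(block[key], str):
            problems.append(f"'{key}' must be a string")

    fmt = block.get("response_format")
    if fmt is not None and (not isinstance(fmt, str) or fmt not in ALLOWED_FORMATS):
        problems.append(
            "'response_format' must be one of: " + ", ".join(sorted(ALLOWED_FORMATS))
        )

    temp = block.get("temperature")
    if temp is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(temp, (int, float)) and not isinstance(temp, bool)
        if not is_number or not 0 <= temp <= 1:
            problems.append("'temperature' must be a number between 0 and 1")

    return problems
```

Calling the helper on the example configuration above returns an empty list, while a `temperature` of `1.5` yields a single problem message.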

Groq

Groq accepts the same parameters as OpenAI. Set `model` to a Groq model such as `whisper-large-v3`.

API usage

Plugin setup

Add the Cloud APIM - Audio translation backend plugin to your route:

{
  "enabled": true,
  "plugin": "cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.aigateway.plugins.OpenAICompatTranslation",
  "config": {
    "refs": ["audio-gen-model_xxxxxxxxx"],
    "max_size_upload": 104857600
  }
}
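
A client can check a file against the configured limit before uploading it. The sketch below assumes `max_size_upload` is expressed in bytes, which this page does not state; under that assumption the example value 104857600 equals 100 MiB. `fits_upload_limit` is a hypothetical helper.

```python
import os

# Value copied from the plugin config above.
# ASSUMPTION: the unit is bytes (104857600 bytes = 100 * 1024 * 1024 = 100 MiB).
MAX_SIZE_UPLOAD = 104857600


def fits_upload_limit(path, limit=MAX_SIZE_UPLOAD):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```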

Request

curl https://my-audio-endpoint.example.com/v1/audio/translations \
  -H "Authorization: Bearer $OTOROSHI_API_KEY" \
  -F "file=@recording_french.mp3" \
  -F "model=whisper-1"
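
The same request can be issued from Python. This is a minimal sketch using only the standard library; the function names are illustrative, and the URL and API key are the placeholders from the curl example. It builds the same multipart form (`file` and `model` fields) that curl sends with `-F`.

```python
import json
import urllib.request
import uuid


def build_translation_request(url, api_key, filename, audio_bytes, model="whisper-1"):
    """Build a multipart/form-data POST equivalent to the curl call above."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # "model" form field
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n".encode(),
        # "file" form field carrying the raw audio bytes
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode(),
        audio_bytes,
        b"\r\n",
        # closing boundary
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def translate(url, api_key, path, model="whisper-1"):
    """Send the audio file and return the translated text (default JSON response)."""
    with open(path, "rb") as f:
        request = build_translation_request(url, api_key, path, f.read(), model)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

For example, `translate("https://my-audio-endpoint.example.com/v1/audio/translations", api_key, "recording_french.mp3")` would perform the call shown in the curl command.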

Response

{
  "text": "Hello, how are you today?"
}

The audio content is translated into English regardless of the source language.
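
How the response body is read depends on the requested `response_format`. The sketch below assumes the gateway passes through OpenAI's behavior, where `json` and `verbose_json` return an object with a `text` field and `text`, `srt` and `vtt` return a plain-text body; this page only documents the `json` case, and `extract_text` is a hypothetical helper.

```python
import json


def extract_text(body, response_format="json"):
    """Return the translated content from a raw response body.

    ASSUMPTION: json/verbose_json bodies are objects with a "text" field,
    while text/srt/vtt bodies are already plain text.
    """
    if response_format in ("json", "verbose_json"):
        return json.loads(body)["text"]
    return body
```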
