Speech to Text

The audio_stt function converts audio to text using an audio model provider.

  • Function name: extensions.com.cloud-apim.llm-extension.audio_stt

Parameters

| Parameter              | Type    | Required | Description                                                |
|------------------------|---------|----------|------------------------------------------------------------|
| provider               | string  | yes      | The audio model provider ID                                |
| decode_base64          | boolean | no       | Whether the input audio is base64 encoded (default: false) |
| file_in                | string  | no       | The file path to read the audio from                       |
| payload                | object  | yes      | The payload object                                         |
| payload.audio          | string  | no       | The audio data (base64 encoded if decode_base64 is true)   |
| payload.content_type   | string  | no       | The audio content type (default: audio/mp3)                |
| payload.filename       | string  | no       | The audio file name                                        |
| payload.model          | string  | no       | The model name                                             |
| payload.language       | string  | no       | The language of the audio                                  |
| payload.prompt         | string  | no       | A prompt to guide the transcription                        |
| payload.responseFormat | string  | no       | The response format                                        |
| payload.temperature    | number  | no       | The temperature for transcription                          |
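To show how these parameters fit together, here is a minimal Python sketch that assembles a call node for this function as a plain dictionary and prints it as JSON. The helper `build_audio_stt_call` is hypothetical and not part of the extension. It only emits the keys listed in the table and omits optional arguments left unset, on the assumption that the extension then applies its own defaults (for example `decode_base64: false`). It also rejects payload keys that are not documented, which is a choice made for this sketch, not a documented rule.

```python
import json
from typing import Any, Optional

FUNCTION_NAME = "extensions.com.cloud-apim.llm-extension.audio_stt"

# Payload keys listed in the parameters table (note the camelCase responseFormat).
PAYLOAD_KEYS = {
    "audio", "content_type", "filename", "model",
    "language", "prompt", "responseFormat", "temperature",
}


def build_audio_stt_call(
    provider: str,
    payload: dict[str, Any],
    file_in: Optional[str] = None,
    decode_base64: bool = False,
    result: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble an audio_stt call node.

    Unset optional arguments are left out so the extension's defaults apply.
    """
    # Sketch-only safeguard: refuse keys that are not in the parameters table.
    unknown = set(payload) - PAYLOAD_KEYS
    if unknown:
        raise ValueError(f"undocumented payload keys: {sorted(unknown)}")

    args: dict[str, Any] = {"provider": provider}
    if decode_base64:
        args["decode_base64"] = True
    if file_in is not None:
        args["file_in"] = file_in
    args["payload"] = payload

    node: dict[str, Any] = {"kind": "call", "function": FUNCTION_NAME, "args": args}
    if result is not None:
        node["result"] = result
    return node


# Rebuilds the file-input example from this page.
node = build_audio_stt_call(
    provider="audio-model_xxxxx",
    file_in="/path/to/audio.mp3",
    payload={"model": "whisper-1", "language": "en"},
    result="transcription",
)
print(json.dumps(node, indent=2))
```

Leaving `decode_base64` out when it is false keeps the generated JSON identical to the file-input example on this page.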

Output

Returns the transcribed text as a string.

Example with file input

{
  "kind": "call",
  "function": "extensions.com.cloud-apim.llm-extension.audio_stt",
  "args": {
    "provider": "audio-model_xxxxx",
    "file_in": "/path/to/audio.mp3",
    "payload": {
      "model": "whisper-1",
      "language": "en"
    }
  },
  "result": "transcription"
}

Example with base64 input

{
  "kind": "call",
  "function": "extensions.com.cloud-apim.llm-extension.audio_stt",
  "args": {
    "provider": "audio-model_xxxxx",
    "decode_base64": true,
    "payload": {
      "audio": "...",
      "content_type": "audio/mp3",
      "model": "whisper-1",
      "language": "en"
    }
  },
  "result": "transcription"
}
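The `audio` value is elided in the example above. One way to produce it is to base64-encode the raw audio bytes before building the call node. The minimal Python sketch below does this with the standard `base64` module. The helper name `base64_stt_call` and the placeholder bytes are hypothetical, and the `model` and `language` defaults simply mirror the example.

```python
import base64
import json


def base64_stt_call(
    audio_bytes: bytes,
    provider: str,
    content_type: str = "audio/mp3",
    model: str = "whisper-1",
    language: str = "en",
) -> dict:
    """Hypothetical helper: build an audio_stt call node with inline base64 audio."""
    # Standard base64 output is ASCII-only, so decoding to str is safe.
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "kind": "call",
        "function": "extensions.com.cloud-apim.llm-extension.audio_stt",
        "args": {
            "provider": provider,
            # Signals that payload.audio holds base64 data, not a raw string.
            "decode_base64": True,
            "payload": {
                "audio": encoded,
                "content_type": content_type,
                "model": model,
                "language": language,
            },
        },
        "result": "transcription",
    }


# In practice, read real audio, e.g. pathlib.Path("/path/to/audio.mp3").read_bytes().
audio = b"placeholder bytes, not real mp3 data"
node = base64_stt_call(audio, "audio-model_xxxxx")
print(json.dumps(node, indent=2))
```

Inlining the audio makes the node self-contained but grows its size by about a third compared with the raw bytes. For large recordings, the `file_in` variant may be the more practical choice.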