Speech to Text

The `audio_stt` function converts audio to text using an audio model provider.

Function name: `extensions.com.cloud-apim.llm-extension.audio_stt`

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `provider` | string | yes | The audio model provider id |
| `decode_base64` | boolean | no | Whether the input audio is base64 encoded (default: `false`) |
| `file_in` | string | no | The file path to read the audio from |
| `payload` | object | yes | The payload object |
| `payload.audio` | string | no | The audio data (base64 encoded if `decode_base64` is `true`) |
| `payload.content_type` | string | no | The audio content type (default: `audio/mp3`) |
| `payload.filename` | string | no | The audio file name |
| `payload.model` | string | no | The model name |
| `payload.language` | string | no | The language of the audio |
| `payload.prompt` | string | no | A prompt to guide the transcription |
| `payload.responseFormat` | string | no | The response format |
| `payload.temperature` | number | no | The temperature for transcription |
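
As a rough aid when writing nodes by hand, the table above can be turned into a small client-side check. This is only a sketch: `check_audio_stt_args` is a hypothetical helper, it mirrors the types and required flags listed in the table, and it says nothing about how the extension itself validates its input.

```python
# Top-level arguments: name -> (expected Python type, required), per the table.
TOP_LEVEL = {
    "provider": (str, True),
    "decode_base64": (bool, False),
    "file_in": (str, False),
    "payload": (dict, True),
}

# Payload fields are all optional; name -> expected Python type(s).
PAYLOAD = {
    "audio": str,
    "content_type": str,
    "filename": str,
    "model": str,
    "language": str,
    "prompt": str,
    "responseFormat": str,
    "temperature": (int, float),
}


def check_audio_stt_args(args):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for name, (expected, required) in TOP_LEVEL.items():
        if name not in args:
            if required:
                problems.append(f"missing required argument: {name}")
        elif not isinstance(args[name], expected):
            problems.append(f"{name} should be of type {expected.__name__}")
    payload = args.get("payload")
    if isinstance(payload, dict):
        for name, expected in PAYLOAD.items():
            if name in payload and not isinstance(payload[name], expected):
                problems.append(f"payload.{name} has an unexpected type")
    return problems


# A well-formed argument object, then one with two problems.
print(check_audio_stt_args({"provider": "audio-model_xxxxx", "payload": {"model": "whisper-1"}}))
print(check_audio_stt_args({"payload": {"temperature": "hot"}}))
```

The check only covers presence and basic types. Whether `file_in` and `payload.audio` may be combined, and which values `payload.responseFormat` accepts, depend on the extension and the provider and are not modeled here.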

Output

Returns the transcribed text as a string.

Example with file input

{
  "kind": "call",
  "function": "extensions.com.cloud-apim.llm-extension.audio_stt",
  "args": {
    "provider": "audio-model_xxxxx",
    "file_in": "/path/to/audio.mp3",
    "payload": {
      "model": "whisper-1",
      "language": "en"
    }
  },
  "result": "transcription"
}

Example with base64 input

{
  "kind": "call",
  "function": "extensions.com.cloud-apim.llm-extension.audio_stt",
  "args": {
    "provider": "audio-model_xxxxx",
    "decode_base64": true,
    "payload": {
      "audio": "...",
      "content_type": "audio/mp3",
      "model": "whisper-1",
      "language": "en"
    }
  },
  "result": "transcription"
}
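
The `audio` value in the example above is elided. As a minimal sketch of how a client might produce such a node, the Python snippet below base64-encodes raw audio bytes and fills in the documented arguments. The helper name `build_audio_stt_node`, the sample bytes, and the provider id are placeholders of this example, not part of the extension.

```python
import base64
import json


def build_audio_stt_node(provider, audio_bytes, model, language,
                         content_type="audio/mp3", result="transcription"):
    """Build an audio_stt call node carrying base64-encoded audio.

    Hypothetical helper: only the argument names come from the parameter table.
    """
    return {
        "kind": "call",
        "function": "extensions.com.cloud-apim.llm-extension.audio_stt",
        "args": {
            "provider": provider,
            # Signal that payload.audio is base64 encoded.
            "decode_base64": True,
            "payload": {
                "audio": base64.b64encode(audio_bytes).decode("ascii"),
                "content_type": content_type,
                "model": model,
                "language": language,
            },
        },
        "result": result,
    }


# Placeholder bytes stand in for a real audio file read with open(path, "rb").
node = build_audio_stt_node("audio-model_xxxxx", b"fake-audio-bytes", "whisper-1", "en")
print(json.dumps(node, indent=2))
```

Encoding to ASCII text keeps the node serializable as plain JSON, at the cost of roughly a third more payload size than the raw bytes. For large recordings, the `file_in` variant avoids embedding the audio in the node at all.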
