💰 Budgets Management
Budget management for LLMs behind a gateway means monitoring and controlling the cost of using different LLMs through an API gateway.
The Otoroshi LLM extension helps you optimize usage, control your budget, and improve cost efficiency across models.
Configuration
budgets {
  enabled = true
  enabled = ${?CLOUD_APIM_EXTENSIONS_LLM_EXTENSION_BUDGETS_ENABLED}
  embed-budgets-in-responses = false
  embed-budgets-in-responses = ${?CLOUD_APIM_EXTENSIONS_LLM_EXTENSION_BUDGETS_EMBED_BUGETS_IN_RESPONSES}
}
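Each setting can be overridden by setting the environment variable referenced on the line below it. Once budgets are enabled, you can create them from the UI or through the admin API, and enforce budget limits on any consumer.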
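The ${?VAR} lines use HOCON optional substitution: the second assignment only takes effect when the environment variable is set, so the value on the line above acts as the default. The Python sketch below mimics that rule for the two budget settings. resolve_budget_settings is a hypothetical helper for illustration, not part of Otoroshi, and it assumes HOCON's usual string-to-boolean conversion (true/false, yes/no, on/off).

```python
import os
from typing import Mapping

# Defaults copied from the budgets { ... } block above.
DEFAULTS = {
    "enabled": True,
    "embed-budgets-in-responses": False,
}

# Environment variables that may override each key (names copied verbatim from the config).
ENV_OVERRIDES = {
    "enabled": "CLOUD_APIM_EXTENSIONS_LLM_EXTENSION_BUDGETS_ENABLED",
    "embed-budgets-in-responses": "CLOUD_APIM_EXTENSIONS_LLM_EXTENSION_BUDGETS_EMBED_BUGETS_IN_RESPONSES",
}

_TRUE = {"true", "yes", "on"}
_FALSE = {"false", "no", "off"}


def parse_bool(raw: str) -> bool:
    """Convert a string to a boolean the way HOCON does for boolean settings."""
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


def resolve_budget_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: apply ${?VAR} overrides on top of the defaults."""
    settings = dict(DEFAULTS)
    for key, var in ENV_OVERRIDES.items():
        if var in env:  # ${?VAR} is ignored when VAR is unset
            settings[key] = parse_bool(env[var])
    return settings


if __name__ == "__main__":
    print(resolve_budget_settings({"CLOUD_APIM_EXTENSIONS_LLM_EXTENSION_BUDGETS_ENABLED": "false"}))
```

With no environment variable set, the defaults from the config block are kept unchanged.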
Budgets
A budget is active between a start date and an end date. Within that period, limits apply over a fixed duration that is renewed until the end date is reached (for instance, a limit that resets every month over a one-year budget). Limits can be expressed in USD, in tokens, or both.
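One way to picture the renewal rule: the period between the start and end dates is cut into consecutive windows of the configured duration, and consumption is counted per window. The Python sketch below computes the window that contains a given instant. It assumes windows are aligned on the budget start date and that the last window is truncated at the end date; current_window is a hypothetical helper, and the gateway's actual window boundaries may differ.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def current_window(
    start: datetime, end: datetime, duration: timedelta, now: datetime
) -> Optional[Tuple[datetime, datetime]]:
    """Return the (window_start, window_end) containing `now`, or None outside the budget.

    Assumption: windows start at `start` and repeat every `duration`;
    the final window is cut off at `end`.
    """
    if duration <= timedelta(0):
        raise ValueError("duration must be positive")
    if now < start or now >= end:
        return None  # the budget is not active at this instant
    elapsed_windows = (now - start) // duration  # timedelta // timedelta gives an int
    window_start = start + elapsed_windows * duration
    window_end = min(window_start + duration, end)
    return window_start, window_end


if __name__ == "__main__":
    utc = timezone.utc
    start = datetime(2025, 1, 1, tzinfo=utc)
    end = datetime(2026, 1, 1, tzinfo=utc)
    # A 30-day renewable limit inside a one-year budget.
    print(current_window(start, end, timedelta(days=30), datetime(2025, 2, 15, tzinfo=utc)))
```

Here February 15 falls in the second window, which runs from January 31 to March 2; near the end of the year the last window stops at the budget end date instead of running a full 30 days.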

A budget can define limits that are either global or per usage type (inference, image, audio, video, embedding, moderation).
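To make the distinction between global and per-usage limits concrete, here is a Python sketch that checks a consumption against both kinds of limit. It borrows the field naming of the consumption document returned by the admin API (allowed_total_usd, consumed_inference_tokens, and so on) and treats a null/None limit as "no limit", which is an assumption. exceeded_limits is a hypothetical helper, and counting a limit as exceeded once consumption reaches the allowance is also an assumption.

```python
from typing import Dict, List, Optional

USAGES = ["total", "inference", "image", "audio", "video", "embedding", "moderation"]
UNITS = ["usd", "tokens"]


def exceeded_limits(budget: Dict[str, Optional[float]]) -> List[str]:
    """List the limits (e.g. 'total_usd', 'image_tokens') whose consumption reached the allowance.

    `budget` uses keys like 'allowed_total_usd' and 'consumed_total_usd'.
    A missing or None allowance means the limit is not set.
    """
    exceeded = []
    for usage in USAGES:  # 'total' is the global limit, the others are per usage
        for unit in UNITS:
            allowed = budget.get(f"allowed_{usage}_{unit}")
            if allowed is None:
                continue  # no limit defined for this usage/unit pair
            consumed = budget.get(f"consumed_{usage}_{unit}") or 0
            if consumed >= allowed:
                exceeded.append(f"{usage}_{unit}")
    return exceeded


if __name__ == "__main__":
    budget = {
        "allowed_total_usd": 200, "consumed_total_usd": 12.5,        # global USD limit, not reached
        "allowed_image_tokens": 1000, "consumed_image_tokens": 1200,  # per-usage token limit, exceeded
    }
    print(exceeded_limits(budget))
```

In this example only the per-usage image token limit is reported, because the global USD limit still has room.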


Each budget applies to a scope that can be based on almost anything (API key, API group, user, provider, model, request, etc.). You can also reference a budget from the metadata of an API key, API group, user, provider, or authentication module, using the ai_budget_ref metadata key.
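Referencing a budget from entity metadata could look like the following Python sketch, which gathers the ai_budget_ref values found in the metadata of the entities involved in a call. The entity dictionaries, the assumption that the value is a budget id, and the referenced_budgets helper are all hypothetical; how the gateway actually resolves and combines these references is not described here.

```python
from typing import Dict, Iterable, List

BUDGET_REF_KEY = "ai_budget_ref"  # metadata key named in the documentation


def referenced_budgets(entities: Iterable[Dict]) -> List[str]:
    """Collect distinct budget references from entity metadata, keeping first-seen order."""
    refs: List[str] = []
    for entity in entities:
        ref = (entity.get("metadata") or {}).get(BUDGET_REF_KEY)
        if ref and ref not in refs:
            refs.append(ref)
    return refs


if __name__ == "__main__":
    # Hypothetical entities and budget ids; only the metadata map matters here.
    apikey = {"kind": "apikey", "metadata": {"ai_budget_ref": "budget_team_a"}}
    user = {"kind": "user", "metadata": {}}
    provider = {"kind": "provider", "metadata": {"ai_budget_ref": "budget_openai"}}
    print(referenced_budgets([apikey, user, provider]))
```

Entities without the key, or without metadata at all, are simply skipped.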

When a limit is exceeded, a budget can either block further requests or only emit alerts.
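The choice between blocking and alerting comes down to what happens once a limit is hit. Here is a minimal Python sketch in which a hypothetical Mode enum and enforce function stand in for whatever the extension does internally. In BLOCK mode the request is rejected; in ALERT mode it goes through and an alert is recorded. Using HTTP status 429 for rejected requests is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Mode(Enum):
    BLOCK = "block"  # reject requests once a limit is exceeded
    ALERT = "alert"  # let requests through, only emit an alert


@dataclass
class Decision:
    allowed: bool
    status: int = 200  # assumed status codes: 200 when allowed, 429 when blocked
    alerts: List[str] = field(default_factory=list)


def enforce(mode: Mode, exceeded: List[str]) -> Decision:
    """Decide what to do with a request given the names of the exceeded limits."""
    if not exceeded:
        return Decision(allowed=True)
    alerts = [f"budget limit exceeded: {name}" for name in exceeded]
    if mode is Mode.BLOCK:
        return Decision(allowed=False, status=429, alerts=alerts)
    return Decision(allowed=True, alerts=alerts)
```

Both modes produce the same alerts. Only the fate of the request differs, which lets you start in alert mode to observe consumption before switching to blocking.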

Admin API
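In addition to the standard admin API endpoints for the ai-gateway.extensions.cloud-apim.com/AiBudget entity, two more endpoints are available:
- GET /api/extensions/cloud-apim/extensions/ai-extension/budgets/:id/consumption
- POST /api/extensions/cloud-apim/extensions/ai-extension/budgets/:id/consumption/_reset
The first one returns the budget's current consumption together with its remaining and allowed amounts, and the second one resets that consumption. A response from the first endpoint looks like: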
{
  "consumed_total_usd": 0.00188655,
  "consumed_total_tokens": 8036,
  "consumed_inference_usd": 0.00188655,
  "consumed_inference_tokens": 8036,
  "consumed_image_usd": 0,
  "consumed_image_tokens": 0,
  "consumed_audio_usd": 0,
  "consumed_audio_tokens": 0,
  "consumed_video_usd": 0,
  "consumed_video_tokens": 0,
  "consumed_embedding_usd": 0,
  "consumed_embedding_tokens": 0,
  "consumed_moderation_usd": 0,
  "consumed_moderation_tokens": 0,
  "remaining_total_usd": 199.99811345,
  "remaining_total_tokens": 9991964,
  "remaining_inference_usd": 0,
  "remaining_inference_tokens": 0,
  "remaining_image_usd": 0,
  "remaining_image_tokens": 0,
  "remaining_audio_usd": 0,
  "remaining_audio_tokens": 0,
  "remaining_video_usd": 0,
  "remaining_video_tokens": 0,
  "remaining_embedding_usd": 0,
  "remaining_embedding_tokens": 0,
  "remaining_moderation_usd": 0,
  "remaining_moderation_tokens": 0,
  "allowed_total_usd": 200,
  "allowed_total_tokens": 10000000,
  "allowed_inference_usd": null,
  "allowed_inference_tokens": null,
  "allowed_image_usd": null,
  "allowed_image_tokens": null,
  "allowed_audio_usd": null,
  "allowed_audio_tokens": null,
  "allowed_video_usd": null,
  "allowed_video_tokens": null,
  "allowed_embedding_usd": null,
  "allowed_embedding_tokens": null,
  "allowed_moderation_usd": null,
  "allowed_moderation_tokens": null
}
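Below is a Python sketch of a small admin API client for these two endpoints, using only the standard library. The admin host (otoroshi-api.oto.tools:8080) and authentication with an admin API key's client id and secret over HTTP basic auth are assumptions based on a default local Otoroshi setup, so adjust both to your deployment. The remaining helper reflects something visible in the sample above: per-usage remaining_* fields are 0 when the matching allowed_* is null, so it reports None (no limit) in that case rather than an exhausted budget.

```python
import base64
import json
import urllib.request
from typing import Optional

# Assumptions: default local admin API host and an admin API key; adjust to your setup.
ADMIN_URL = "http://otoroshi-api.oto.tools:8080"
CONSUMPTION_PATH = "/api/extensions/cloud-apim/extensions/ai-extension/budgets/{id}/consumption"


def consumption_url(budget_id: str, reset: bool = False, base: str = ADMIN_URL) -> str:
    """Build the URL of the consumption endpoint, or of its _reset variant."""
    url = base + CONSUMPTION_PATH.format(id=budget_id)
    return url + "/_reset" if reset else url


def _send(url: str, method: str, client_id: str, client_secret: str) -> bytes:
    """Call the admin API with basic auth (assumed) and return the raw body."""
    token = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    req = urllib.request.Request(url, method=method, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req) as resp:  # raises HTTPError on non-2xx responses
        return resp.read()


def get_consumption(budget_id: str, client_id: str, client_secret: str) -> dict:
    return json.loads(_send(consumption_url(budget_id), "GET", client_id, client_secret))


def reset_consumption(budget_id: str, client_id: str, client_secret: str) -> None:
    """Reset the budget's consumption; the response body is not used here."""
    _send(consumption_url(budget_id, reset=True), "POST", client_id, client_secret)


def remaining(consumption: dict, usage: str = "total", unit: str = "usd") -> Optional[float]:
    """Remaining allowance for one usage/unit pair, or None when no limit is set."""
    if consumption.get(f"allowed_{usage}_{unit}") is None:
        return None  # e.g. remaining_inference_usd is 0 in the sample, but there is no limit
    return consumption.get(f"remaining_{usage}_{unit}")
```

Given the sample response above, remaining(data) would return 199.99811345, while remaining(data, "inference") would return None.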
Embed in responses
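You can also ask for the budget state to be embedded in an LLM response by adding embed_budget=true to the query string of the call:
curl --request POST \
  --url 'http://proxy.oto.tools:8080/v1/chat/completions?embed_budget=true' \
  --header 'content-type: application/json' \
  --data '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "stream": false,
    "messages": [
      {
        "role": "user",
        "content": "tell me a joke ?"
      }
    ]
  }'
The response contains the usual completion fields plus a budget object with the same structure as the one returned by the consumption endpoint: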
{
  "id": "chatcmpl-9ODWHjtFVS5qzpU1nc79TprfRhZElq0e",
  "object": "chat.completion",
  "created": 1762441013,
  "model": "claude-sonnet-4-5-20250929",
  "system_fingerprint": "fp-tEQXbmpSPELEh3fYd5Rs9Jf4HIaI3E6s",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Why don't scientists trust atoms?\n\nBecause they make up everything! 😄"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 20,
    "total_tokens": 32,
    "completion_tokens_details": {
      "reasoning_tokens": -1
    }
  },
  "budget": {
    "consumed_total_usd": 0.00394755,
    "consumed_total_tokens": 8231,
    "consumed_inference_usd": 0.00394755,
    "consumed_inference_tokens": 8231,
    "consumed_image_usd": 0,
    "consumed_image_tokens": 0,
    "consumed_audio_usd": 0,
    "consumed_audio_tokens": 0,
    "consumed_video_usd": 0,
    "consumed_video_tokens": 0,
    "consumed_embedding_usd": 0,
    "consumed_embedding_tokens": 0,
    "consumed_moderation_usd": 0,
    "consumed_moderation_tokens": 0,
    "remaining_total_usd": 199.99605245,
    "remaining_total_tokens": 9991769,
    "remaining_inference_usd": 0,
    "remaining_inference_tokens": 0,
    "remaining_image_usd": 0,
    "remaining_image_tokens": 0,
    "remaining_audio_usd": 0,
    "remaining_audio_tokens": 0,
    "remaining_video_usd": 0,
    "remaining_video_tokens": 0,
    "remaining_embedding_usd": 0,
    "remaining_embedding_tokens": 0,
    "remaining_moderation_usd": 0,
    "remaining_moderation_tokens": 0,
    "allowed_total_usd": 200,
    "allowed_total_tokens": 10000000,
    "allowed_inference_usd": null,
    "allowed_inference_tokens": null,
    "allowed_image_usd": null,
    "allowed_image_tokens": null,
    "allowed_audio_usd": null,
    "allowed_audio_tokens": null,
    "allowed_video_usd": null,
    "allowed_video_tokens": null,
    "allowed_embedding_usd": null,
    "allowed_embedding_tokens": null,
    "allowed_moderation_usd": null,
    "allowed_moderation_tokens": null
  }
}
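The same call can be made from Python with only the standard library. This sketch sends the request shown in the curl example with embed_budget=true and splits the reply into the completion and the embedded budget object. The gateway URL and model name are copied from the curl example. chat_with_budget, split_budget and with_embed_budget are hypothetical helper names, and the sketch assumes the route needs no extra authentication header (add one if your route requires an API key).

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Tuple

GATEWAY_URL = "http://proxy.oto.tools:8080/v1/chat/completions"


def with_embed_budget(url: str) -> str:
    """Add embed_budget=true to the query string, keeping existing parameters."""
    parts = urllib.parse.urlsplit(url)
    query = urllib.parse.parse_qsl(parts.query)
    query.append(("embed_budget", "true"))
    return urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(query)))


def split_budget(response: dict) -> Tuple[dict, Optional[dict]]:
    """Separate the embedded budget object from the rest of the completion."""
    completion = dict(response)
    budget = completion.pop("budget", None)  # None when the budget was not embedded
    return completion, budget


def chat_with_budget(prompt: str, model: str = "anthropic/claude-sonnet-4-5-20250929"):
    """Send one non-streaming chat completion and return (completion, budget)."""
    payload = {
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        with_embed_budget(GATEWAY_URL),
        data=json.dumps(payload).encode(),
        headers={"content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return split_budget(json.load(resp))
```

For example, after completion, budget = chat_with_budget("tell me a joke ?"), budget["remaining_total_usd"] holds the remaining USD allowance for the current window.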