💰 Budgets Management
Managing budgets for LLMs through a gateway means monitoring and controlling the cost of calls to different LLM providers as they pass through the API gateway.
Our Otoroshi LLM extension helps you optimize usage, control your budget, and improve cost efficiency across models.
Configuration
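budgets {
  # Each setting has a default, followed by an optional override:
  # ${?VAR} only replaces the value when the environment variable is set.
  enabled = true
  enabled = ${?CLOUD_APIM_EXTENSIONS_LLM_EXTENSION_BUDGETS_ENABLED}
  embed-budgets-in-responses = false
  embed-budgets-in-responses = ${?CLOUD_APIM_EXTENSIONS_LLM_EXTENSION_BUDGETS_EMBED_BUGETS_IN_RESPONSES}
}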
Once it's enabled, you can create budgets from the UI or the admin API and enforce budget limits on any consumer.
Budgets
A budget is defined between a start date and an end date. Limits are applied for a duration that is renewed between the start and end date. Limits can be expressed in USD or in tokens.

A budget can define limits that are global or per usage (inference, image, audio, video, embedding, moderation).


A budget applies to a scope, which can be based on almost any request attribute (API key, API group, user, provider, model, request, etc.). You can also reference a budget from entity metadata with the ai_budget_ref key (supported on API keys, API groups, users, providers, and auth modules).

Scope options
| Scope | Description |
|---|---|
| apikeys | List of API key IDs |
| users | List of user IDs |
| groups | List of API group IDs |
| providers | List of LLM provider IDs |
| models | List of model names |
| extractFromApikeyMeta | Extract budget ref from API key metadata |
| extractFromApikeyGroupMeta | Extract budget ref from API key group metadata |
| extractFromUserMeta | Extract budget ref from user metadata |
| extractFromUserAuthModuleMeta | Extract budget ref from auth module metadata |
| extractFromProviderMeta | Extract budget ref from provider metadata |
| rules | JsonPath-based rules for advanced scoping |
| rulesMatchMode | Rule matching mode: All (all rules must match) or Any (at least one rule must match) |
| alwaysApplyRules | When true, rules are always evaluated even if explicit lists match |
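The two values of rulesMatchMode can be illustrated with a small sketch. This is not the extension's implementation: rules are modeled here as plain Python predicates standing in for JsonPath expressions, and the names `Rule`, `rules_match`, and the context keys are hypothetical.

```python
from typing import Any, Callable, Mapping, Sequence

# Hypothetical stand-in for a JsonPath rule: a predicate over the request context.
Rule = Callable[[Mapping[str, Any]], bool]


def rules_match(rules: Sequence[Rule], mode: str, context: Mapping[str, Any]) -> bool:
    """Combine rule results the way the table describes All and Any."""
    results = [rule(context) for rule in rules]
    if mode == "All":
        # Every rule must match.
        return all(results)
    if mode == "Any":
        # At least one rule must match.
        return any(results)
    raise ValueError(f"unknown rulesMatchMode: {mode!r}")


# Illustrative context and rules (assumed shapes, not the gateway's own).
context = {"model": "gpt-4o", "user": "alice"}
is_gpt_model = lambda ctx: ctx["model"].startswith("gpt")
is_bob = lambda ctx: ctx["user"] == "bob"

print(rules_match([is_gpt_model, is_bob], "Any", context))
print(rules_match([is_gpt_model, is_bob], "All", context))
```

With these inputs only the first rule matches, so the scope would apply under Any but not under All.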
Duration units
Budget limits are applied for a renewable duration between the start and end date:
| Unit | Description |
|---|---|
| hour | Limits reset every hour |
| day | Limits reset every day |
| year | Limits reset every year |
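To make the idea of a renewable duration concrete, here is a minimal sketch that finds the window containing a given instant. It assumes windows are counted from the budget start date; the extension may align them differently (for example on calendar boundaries), and the function name `current_window` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Fixed-length units; "year" is handled separately because its length varies.
FIXED_UNITS = {"hour": timedelta(hours=1), "day": timedelta(days=1)}


def _add_years(moment: datetime, years: int) -> datetime:
    """Shift a datetime by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return moment.replace(year=moment.year + years)
    except ValueError:
        return moment.replace(year=moment.year + years, day=28)


def current_window(
    start: datetime, end: datetime, unit: str, now: datetime
) -> Optional[Tuple[datetime, datetime]]:
    """Return (window_start, window_end) for `now`, or None outside the budget dates.

    Assumption: windows start at `start` and repeat every `unit` until `end`.
    """
    if not (start <= now < end):
        return None
    if unit in FIXED_UNITS:
        step = FIXED_UNITS[unit]
        index = (now - start) // step  # number of whole windows already elapsed
        window_start = start + index * step
        window_end = window_start + step
    elif unit == "year":
        index = now.year - start.year
        if _add_years(start, index) > now:
            index -= 1  # the anniversary has not been reached yet this year
        window_start = _add_years(start, index)
        window_end = _add_years(start, index + 1)
    else:
        raise ValueError(f"unsupported unit: {unit!r}")
    # The last window is cut short by the budget end date.
    return window_start, min(window_end, end)


utc = timezone.utc
window = current_window(
    datetime(2025, 1, 1, tzinfo=utc),
    datetime(2025, 2, 1, tzinfo=utc),
    "day",
    datetime(2025, 1, 15, 13, 30, tzinfo=utc),
)
print(window)
```

Under that assumption, consumption counters would be reset each time `now` crosses into a new window.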
Action on exceed
When a limit is exceeded, a budget can either block requests or only emit alerts.

| Parameter | Type | Default | Description |
|---|---|---|---|
| mode | string | - | "soft" (alerts only) or "block" (deny requests with HTTP 429) |
| alertOnExceed | boolean | - | Emit an alert when the budget is exceeded |
| alertOnAlmostExceed | boolean | - | Emit an alert when the budget is almost exceeded |
| alertOnAlmostExceedPercentage | number | 80 | Percentage threshold for "almost exceeded" alerts (e.g. 80 means alert at 80% consumption) |
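The interaction between these parameters can be sketched as a small decision function. This is an illustration, not the extension's code: the `ActionOnExceed` class, the `evaluate` function, the alert names, the defaults for the first three fields, and the choice of treating "consumed equals allowed" as exceeded are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ActionOnExceed:
    # Field names mirror the documented parameters in snake_case.
    # Only the 80% threshold default comes from the table; the others are assumed.
    mode: str = "soft"                              # "soft" or "block"
    alert_on_exceed: bool = False                   # alertOnExceed
    alert_on_almost_exceed: bool = False            # alertOnAlmostExceed
    alert_on_almost_exceed_percentage: float = 80   # alertOnAlmostExceedPercentage


def evaluate(consumed: float, allowed: float, action: ActionOnExceed) -> Tuple[bool, List[str]]:
    """Return (request_allowed, alerts) for one limit, in USD or tokens."""
    exceeded = consumed >= allowed  # assumption: reaching the limit counts as exceeded
    threshold = allowed * action.alert_on_almost_exceed_percentage / 100
    almost_exceeded = not exceeded and consumed >= threshold

    alerts: List[str] = []
    if exceeded and action.alert_on_exceed:
        alerts.append("exceeded")
    if almost_exceeded and action.alert_on_almost_exceed:
        alerts.append("almost_exceeded")

    # Only "block" mode denies the request (the gateway answers with HTTP 429).
    request_allowed = not (exceeded and action.mode == "block")
    return request_allowed, alerts


blocking = ActionOnExceed(mode="block", alert_on_exceed=True, alert_on_almost_exceed=True)
print(evaluate(160, 200, blocking))  # 80% of a 200 USD limit
print(evaluate(200, 200, blocking))  # limit reached
```

In "soft" mode the same inputs would produce the same alerts, but the request would always be allowed through.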
Admin API
In addition to the classic admin API endpoints for the ai-gateway.extensions.cloud-apim.com/AiBudget entity, two budget-specific endpoints are available:
- GET /api/extensions/cloud-apim/extensions/ai-extension/budgets/:id/consumption
- POST /api/extensions/cloud-apim/extensions/ai-extension/budgets/:id/consumption/_reset
The first one returns the current consumption of the budget, with a response like:
{
  "consumed_total_usd": 0.00188655,
  "consumed_total_tokens": 8036,
  "consumed_inference_usd": 0.00188655,
  "consumed_inference_tokens": 8036,
  "consumed_image_usd": 0,
  "consumed_image_tokens": 0,
  "consumed_audio_usd": 0,
  "consumed_audio_tokens": 0,
  "consumed_video_usd": 0,
  "consumed_video_tokens": 0,
  "consumed_embedding_usd": 0,
  "consumed_embedding_tokens": 0,
  "consumed_moderation_usd": 0,
  "consumed_moderation_tokens": 0,
  "remaining_total_usd": 199.99811345,
  "remaining_total_tokens": 9991964,
  "remaining_inference_usd": 0,
  "remaining_inference_tokens": 0,
  "remaining_image_usd": 0,
  "remaining_image_tokens": 0,
  "remaining_audio_usd": 0,
  "remaining_audio_tokens": 0,
  "remaining_video_usd": 0,
  "remaining_video_tokens": 0,
  "remaining_embedding_usd": 0,
  "remaining_embedding_tokens": 0,
  "remaining_moderation_usd": 0,
  "remaining_moderation_tokens": 0,
  "allowed_total_usd": 200,
  "allowed_total_tokens": 10000000,
  "allowed_inference_usd": null,
  "allowed_inference_tokens": null,
  "allowed_image_usd": null,
  "allowed_image_tokens": null,
  "allowed_audio_usd": null,
  "allowed_audio_tokens": null,
  "allowed_video_usd": null,
  "allowed_video_tokens": null,
  "allowed_embedding_usd": null,
  "allowed_embedding_tokens": null,
  "allowed_moderation_usd": null,
  "allowed_moderation_tokens": null
}
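A client can poll the consumption endpoint and turn the payload into a usage percentage. The sketch below is one way to do that with the standard library. The admin API base URL, the use of HTTP Basic authentication with an admin API key, and the helper names `fetch_consumption` and `percent_used` are assumptions to adapt to your deployment; only the endpoint path and the field names come from the documentation above.

```python
import base64
import json
from typing import Any, Mapping, Optional
from urllib.request import Request, urlopen

CONSUMPTION_PATH = (
    "/api/extensions/cloud-apim/extensions/ai-extension/budgets/{id}/consumption"
)


def fetch_consumption(admin_url: str, budget_id: str, client_id: str, client_secret: str) -> dict:
    """GET the consumption of one budget (assumes Basic auth with an admin API key)."""
    token = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    request = Request(
        admin_url.rstrip("/") + CONSUMPTION_PATH.format(id=budget_id),
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urlopen(request) as response:
        return json.load(response)


def percent_used(consumption: Mapping[str, Any], usage: str = "total", unit: str = "usd") -> Optional[float]:
    """Percentage of the limit consumed, or None when no limit is set for that usage."""
    allowed = consumption.get(f"allowed_{usage}_{unit}")
    if not allowed:  # null (no limit defined) or 0
        return None
    consumed = consumption.get(f"consumed_{usage}_{unit}", 0)
    return 100.0 * consumed / allowed


# Subset of the sample response shown above.
sample = {
    "consumed_total_tokens": 8036,
    "allowed_total_tokens": 10000000,
    "consumed_inference_usd": 0.00188655,
    "allowed_inference_usd": None,
}
print(percent_used(sample, "total", "tokens"))
print(percent_used(sample, "inference", "usd"))
```

Returning None for usages without a limit keeps "no limit defined" distinct from "0% consumed".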
Embed in responses
In the example below, the request passes embed_budget=true as a query parameter, and the response that follows it carries a budget object with the same fields as the consumption endpoint.
curl --request POST \
  --url 'http://proxy.oto.tools:8080/v1/chat/completions?embed_budget=true' \
  --header 'content-type: application/json' \
  --data '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "stream": false,
    "messages": [
      {
        "role": "user",
        "content": "tell me a joke ?"
      }
    ]
  }'
{
  "id": "chatcmpl-9ODWHjtFVS5qzpU1nc79TprfRhZElq0e",
  "object": "chat.completion",
  "created": 1762441013,
  "model": "claude-sonnet-4-5-20250929",
  "system_fingerprint": "fp-tEQXbmpSPELEh3fYd5Rs9Jf4HIaI3E6s",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Why don't scientists trust atoms?\n\nBecause they make up everything! 😄"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 20,
    "total_tokens": 32,
    "completion_tokens_details": {
      "reasoning_tokens": -1
    }
  },
  "budget": {
    "consumed_total_usd": 0.00394755,
    "consumed_total_tokens": 8231,
    "consumed_inference_usd": 0.00394755,
    "consumed_inference_tokens": 8231,
    "consumed_image_usd": 0,
    "consumed_image_tokens": 0,
    "consumed_audio_usd": 0,
    "consumed_audio_tokens": 0,
    "consumed_video_usd": 0,
    "consumed_video_tokens": 0,
    "consumed_embedding_usd": 0,
    "consumed_embedding_tokens": 0,
    "consumed_moderation_usd": 0,
    "consumed_moderation_tokens": 0,
    "remaining_total_usd": 199.99605245,
    "remaining_total_tokens": 9991769,
    "remaining_inference_usd": 0,
    "remaining_inference_tokens": 0,
    "remaining_image_usd": 0,
    "remaining_image_tokens": 0,
    "remaining_audio_usd": 0,
    "remaining_audio_tokens": 0,
    "remaining_video_usd": 0,
    "remaining_video_tokens": 0,
    "remaining_embedding_usd": 0,
    "remaining_embedding_tokens": 0,
    "remaining_moderation_usd": 0,
    "remaining_moderation_tokens": 0,
    "allowed_total_usd": 200,
    "allowed_total_tokens": 10000000,
    "allowed_inference_usd": null,
    "allowed_inference_tokens": null,
    "allowed_image_usd": null,
    "allowed_image_tokens": null,
    "allowed_audio_usd": null,
    "allowed_audio_tokens": null,
    "allowed_video_usd": null,
    "allowed_video_tokens": null,
    "allowed_embedding_usd": null,
    "allowed_embedding_tokens": null,
    "allowed_moderation_usd": null,
    "allowed_moderation_tokens": null
  }
}
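On the client side, the embedded budget object can be read straight from the completion response. The sketch below shows one cautious way to do it. In the sample above the per-usage remaining_* fields are 0 while the matching allowed_* fields are null, which suggests (an assumption, not a documented guarantee) that 0 means "no limit defined" rather than "exhausted", so the helper checks allowed_* first. The function name `remaining_budget` is hypothetical.

```python
from typing import Any, Mapping, Optional


def remaining_budget(response: Mapping[str, Any], usage: str = "total", unit: str = "usd") -> Optional[float]:
    """Return what is left for a usage, or None if it cannot be determined.

    None is returned when the response has no "budget" object, or when no
    limit is defined for that usage (allowed_* is null).
    """
    budget = response.get("budget")
    if budget is None:
        return None  # budget was not embedded in this response
    if budget.get(f"allowed_{usage}_{unit}") is None:
        return None  # no limit for this usage, so remaining_* is not meaningful
    return budget.get(f"remaining_{usage}_{unit}")


# Subset of the sample response shown above.
completion = {
    "object": "chat.completion",
    "budget": {
        "allowed_total_usd": 200,
        "remaining_total_usd": 199.99605245,
        "allowed_inference_usd": None,
        "remaining_inference_usd": 0,
    },
}
print(remaining_budget(completion))               # total USD still available
print(remaining_budget(completion, "inference"))  # no per-usage limit defined
```

A caller could use the returned value to warn users or slow down requests before the gateway starts answering with HTTP 429.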