
💰 Budgets Management

Budget management for LLMs means monitoring and controlling the cost of using different LLMs through an API gateway.

Our Otoroshi LLM extension helps you optimize usage, control your budget, and improve cost efficiency across models.

Configuration

budgets {
  enabled = true
  enabled = ${?CLOUD_APIM_EXTENSIONS_LLM_EXTENSION_BUDGETS_ENABLED}
  embed-budgets-in-responses = false
  embed-budgets-in-responses = ${?CLOUD_APIM_EXTENSIONS_LLM_EXTENSION_BUDGETS_EMBED_BUGETS_IN_RESPONSES}
}

Each setting can be overridden by the environment variable named on the line that follows it.

Once it's enabled, you can create budgets from the UI or the admin API and enforce budget limits on any consumer.

Budgets

A budget is defined between a start date and an end date. Limits apply to a time window of a given duration, and that window is renewed repeatedly between the start and end dates. Limits can be expressed in USD or in tokens.

A budget can define limits that are global or per usage (inference, image, audio, video, embedding, moderation).

A budget applies to a scope, which can be based on almost anything: API key, API group, user, provider, model, request, etc. You can also reference a budget from entity metadata using ai_budget_ref (on an API key, API group, user, provider, or auth module).

Scope options

| Scope | Description |
|---|---|
| apikeys | List of API key IDs |
| users | List of user IDs |
| groups | List of API group IDs |
| providers | List of LLM provider IDs |
| models | List of model names |
| extractFromApikeyMeta | Extract budget ref from API key metadata |
| extractFromApikeyGroupMeta | Extract budget ref from API key group metadata |
| extractFromUserMeta | Extract budget ref from user metadata |
| extractFromUserAuthModuleMeta | Extract budget ref from auth module metadata |
| extractFromProviderMeta | Extract budget ref from provider metadata |
| rules | JsonPath-based rules for advanced scoping |
| rulesMatchMode | Rule matching mode: All (all rules must match) or Any (at least one rule must match) |
| alwaysApplyRules | When true, rules are always evaluated even if explicit lists match |
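
The two rulesMatchMode values can be illustrated with a small sketch. This is not the extension's code: `rules_match` is a hypothetical helper, and it assumes each JsonPath rule has already been evaluated to a boolean.

```python
from typing import Iterable


def rules_match(rule_results: Iterable[bool], mode: str) -> bool:
    """Combine per-rule outcomes the way rulesMatchMode is described.

    `rule_results` holds one boolean per rule (assumed to be computed
    elsewhere). `mode` is "All" or "Any", as listed in the scope table.
    """
    results = list(rule_results)
    if mode == "All":
        # Every rule must match. Note: an empty rule list yields True here,
        # which is a property of this sketch, not a documented behavior.
        return all(results)
    if mode == "Any":
        # At least one rule must match.
        return any(results)
    raise ValueError(f"unknown rulesMatchMode: {mode!r}")
```

For example, `rules_match([True, False], "Any")` is true while `rules_match([True, False], "All")` is false.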

Duration units

Budget limits are applied for a renewable duration between the start and end date:

| Unit | Description |
|---|---|
| hour | Limits reset every hour |
| day | Limits reset every day |
| year | Limits reset every year |
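
To make the renewal idea concrete, here is a minimal sketch of how the active window could be located for a given instant. It assumes windows last exactly one unit and are counted from the budget start date; the extension may align windows differently (for example on calendar boundaries). `current_window` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def current_window(start: datetime, end: datetime, unit: str,
                   now: datetime) -> Optional[Tuple[datetime, datetime]]:
    """Return (window_start, window_end) containing `now`, or None.

    Assumption: windows are one `unit` long and counted from `start`.
    """
    if not (start <= now < end):
        return None  # outside the budget's validity period
    if unit in ("hour", "day"):
        step = timedelta(hours=1) if unit == "hour" else timedelta(days=1)
        elapsed_steps = (now - start) // step  # whole windows already elapsed
        window_start = start + elapsed_steps * step
        window_end = window_start + step
    elif unit == "year":
        # Limitation of this sketch: a start date of Feb 29 raises ValueError.
        years = now.year - start.year
        if start.replace(year=now.year) > now:
            years -= 1  # this year's anniversary has not been reached yet
        window_start = start.replace(year=start.year + years)
        window_end = start.replace(year=start.year + years + 1)
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    # The last window is cut short by the budget end date.
    return window_start, min(window_end, end)


if __name__ == "__main__":
    utc = timezone.utc
    print(current_window(datetime(2025, 1, 1, tzinfo=utc),
                         datetime(2026, 1, 1, tzinfo=utc),
                         "day",
                         datetime(2025, 3, 10, 15, 30, tzinfo=utc)))
```

Under these assumptions, consumption would be counted per window and start again from zero when the next window begins.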

Action on exceed

A budget can block requests on exceeded limits or just emit alerts.

| Parameter | Type | Default | Description |
|---|---|---|---|
| mode | string | | "soft" (alerts only) or "block" (deny requests with HTTP 429) |
| alertOnExceed | boolean | | Emit an alert when the budget is exceeded |
| alertOnAlmostExceed | boolean | | Emit an alert when the budget is almost exceeded |
| alertOnAlmostExceedPercentage | number | 80 | Percentage threshold for "almost exceeded" alerts (e.g. 80 means alert at 80% consumption) |
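
The way these parameters interact can be sketched as follows. This is an illustration only, not the extension's implementation: `ExceedAction`, `evaluate`, and the alert names are hypothetical, and the sketch assumes a limit counts as exceeded once consumption reaches the allowed amount.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ExceedAction:
    mode: str                      # "soft" or "block"
    alert_on_exceed: bool
    alert_on_almost_exceed: bool
    alert_on_almost_exceed_percentage: float = 80.0  # default from the table


def evaluate(consumed: float, allowed: Optional[float],
             action: ExceedAction) -> Tuple[bool, List[str]]:
    """Return (blocked, alerts) for one limit (USD or tokens)."""
    alerts: List[str] = []
    if allowed is None:
        return False, alerts  # no limit configured for this usage
    if consumed >= allowed:  # assumption: reaching the limit counts as exceeded
        if action.alert_on_exceed:
            alerts.append("exceeded")
        # Only "block" mode denies the request (HTTP 429 per the table).
        return action.mode == "block", alerts
    percent = 100.0 * consumed / allowed
    if (action.alert_on_almost_exceed
            and percent >= action.alert_on_almost_exceed_percentage):
        alerts.append("almost_exceeded")
    return False, alerts


if __name__ == "__main__":
    soft = ExceedAction("soft", True, True)
    block = ExceedAction("block", True, True)
    print(evaluate(170, 200, soft))   # 85% of the limit, not blocked
    print(evaluate(200, 200, block))  # limit reached in block mode
```

With a "soft" budget the request always goes through and only alerts are produced; with "block", the same overrun would also deny the request.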

Admin API

In addition to the classic admin API endpoints for the ai-gateway.extensions.cloud-apim.com/AiBudget entity, two budget-specific endpoints are available:

  • GET /api/extensions/cloud-apim/extensions/ai-extension/budgets/:id/consumption
  • POST /api/extensions/cloud-apim/extensions/ai-extension/budgets/:id/consumption/_reset

The first one returns a response like:

{
  "consumed_total_usd": 0.00188655,
  "consumed_total_tokens": 8036,
  "consumed_inference_usd": 0.00188655,
  "consumed_inference_tokens": 8036,
  "consumed_image_usd": 0,
  "consumed_image_tokens": 0,
  "consumed_audio_usd": 0,
  "consumed_audio_tokens": 0,
  "consumed_video_usd": 0,
  "consumed_video_tokens": 0,
  "consumed_embedding_usd": 0,
  "consumed_embedding_tokens": 0,
  "consumed_moderation_usd": 0,
  "consumed_moderation_tokens": 0,
  "remaining_total_usd": 199.99811345,
  "remaining_total_tokens": 9991964,
  "remaining_inference_usd": 0,
  "remaining_inference_tokens": 0,
  "remaining_image_usd": 0,
  "remaining_image_tokens": 0,
  "remaining_audio_usd": 0,
  "remaining_audio_tokens": 0,
  "remaining_video_usd": 0,
  "remaining_video_tokens": 0,
  "remaining_embedding_usd": 0,
  "remaining_embedding_tokens": 0,
  "remaining_moderation_usd": 0,
  "remaining_moderation_tokens": 0,
  "allowed_total_usd": 200,
  "allowed_total_tokens": 10000000,
  "allowed_inference_usd": null,
  "allowed_inference_tokens": null,
  "allowed_image_usd": null,
  "allowed_image_tokens": null,
  "allowed_audio_usd": null,
  "allowed_audio_tokens": null,
  "allowed_video_usd": null,
  "allowed_video_tokens": null,
  "allowed_embedding_usd": null,
  "allowed_embedding_tokens": null,
  "allowed_moderation_usd": null,
  "allowed_moderation_tokens": null
}
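
As a rough sketch, the two endpoints could be called from Python as shown below. The endpoint paths come from this page; the admin base URL and the authentication headers are assumptions that depend on your deployment, and the helper names are hypothetical. In the sample response the remaining_* fields are 0 wherever the matching allowed_* field is null, so the sketch checks allowed_* before reading a usage percentage.

```python
import json
import urllib.request
from typing import Dict, Optional
from urllib.parse import quote

BUDGETS_PATH = "/api/extensions/cloud-apim/extensions/ai-extension/budgets"


def consumption_url(admin_base: str, budget_id: str) -> str:
    """Build the consumption URL for a budget (admin_base is an assumption)."""
    return f"{admin_base.rstrip('/')}{BUDGETS_PATH}/{quote(budget_id, safe='')}/consumption"


def fetch_consumption(admin_base: str, budget_id: str,
                      headers: Dict[str, str]) -> dict:
    """GET the current consumption; `headers` carries your admin credentials."""
    req = urllib.request.Request(consumption_url(admin_base, budget_id),
                                 headers=headers, method="GET")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def reset_consumption(admin_base: str, budget_id: str,
                      headers: Dict[str, str]) -> int:
    """POST to the _reset endpoint and return the HTTP status code."""
    req = urllib.request.Request(consumption_url(admin_base, budget_id) + "/_reset",
                                 headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status


def used_percent(consumption: dict, usage: str = "total",
                 unit: str = "usd") -> Optional[float]:
    """Percentage of the limit consumed, or None when no limit is set."""
    allowed = consumption.get(f"allowed_{usage}_{unit}")
    if not allowed:  # null (no limit) or 0
        return None
    return 100.0 * consumption[f"consumed_{usage}_{unit}"] / allowed
```

For instance, `used_percent(consumption, "total", "tokens")` on the sample above gives roughly 0.08, while `used_percent(consumption, "image", "usd")` returns `None` because no image limit is set.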

Embed in responses

In the example below, the request passes the embed_budget=true query parameter, and the response that follows includes a budget object with the same fields as the consumption endpoint.

curl --request POST \
  --url 'http://proxy.oto.tools:8080/v1/chat/completions?embed_budget=true' \
  --header 'content-type: application/json' \
  --data '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "stream": false,
    "messages": [
      {
        "role": "user",
        "content": "tell me a joke ?"
      }
    ]
  }'

{
  "id": "chatcmpl-9ODWHjtFVS5qzpU1nc79TprfRhZElq0e",
  "object": "chat.completion",
  "created": 1762441013,
  "model": "claude-sonnet-4-5-20250929",
  "system_fingerprint": "fp-tEQXbmpSPELEh3fYd5Rs9Jf4HIaI3E6s",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Why don't scientists trust atoms?\n\nBecause they make up everything! 😄"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 20,
    "total_tokens": 32,
    "completion_tokens_details": {
      "reasoning_tokens": -1
    }
  },
  "budget": {
    "consumed_total_usd": 0.00394755,
    "consumed_total_tokens": 8231,
    "consumed_inference_usd": 0.00394755,
    "consumed_inference_tokens": 8231,
    "consumed_image_usd": 0,
    "consumed_image_tokens": 0,
    "consumed_audio_usd": 0,
    "consumed_audio_tokens": 0,
    "consumed_video_usd": 0,
    "consumed_video_tokens": 0,
    "consumed_embedding_usd": 0,
    "consumed_embedding_tokens": 0,
    "consumed_moderation_usd": 0,
    "consumed_moderation_tokens": 0,
    "remaining_total_usd": 199.99605245,
    "remaining_total_tokens": 9991769,
    "remaining_inference_usd": 0,
    "remaining_inference_tokens": 0,
    "remaining_image_usd": 0,
    "remaining_image_tokens": 0,
    "remaining_audio_usd": 0,
    "remaining_audio_tokens": 0,
    "remaining_video_usd": 0,
    "remaining_video_tokens": 0,
    "remaining_embedding_usd": 0,
    "remaining_embedding_tokens": 0,
    "remaining_moderation_usd": 0,
    "remaining_moderation_tokens": 0,
    "allowed_total_usd": 200,
    "allowed_total_tokens": 10000000,
    "allowed_inference_usd": null,
    "allowed_inference_tokens": null,
    "allowed_image_usd": null,
    "allowed_image_tokens": null,
    "allowed_audio_usd": null,
    "allowed_audio_tokens": null,
    "allowed_video_usd": null,
    "allowed_video_tokens": null,
    "allowed_embedding_usd": null,
    "allowed_embedding_tokens": null,
    "allowed_moderation_usd": null,
    "allowed_moderation_tokens": null
  }
}
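
On the client side, the embedded budget can be read straight from the parsed response. The sketch below assumes the response shape shown above; `with_embed_budget` and `budget_summary` are hypothetical helper names, not part of the extension.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_embed_budget(url: str) -> str:
    """Add embed_budget=true to a URL, keeping any existing query parameters."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["embed_budget"] = "true"
    return urlunsplit(parts._replace(query=urlencode(query)))


def budget_summary(response: dict) -> Optional[dict]:
    """Extract the total figures from the embedded "budget" object.

    Returns None when the response carries no budget (for example when the
    query parameter was not sent).
    """
    budget = response.get("budget")
    if budget is None:
        return None
    return {
        "consumed_usd": budget.get("consumed_total_usd"),
        "remaining_usd": budget.get("remaining_total_usd"),
        "allowed_usd": budget.get("allowed_total_usd"),
        "consumed_tokens": budget.get("consumed_total_tokens"),
        "remaining_tokens": budget.get("remaining_total_tokens"),
        "allowed_tokens": budget.get("allowed_total_tokens"),
    }


if __name__ == "__main__":
    print(with_embed_budget("http://proxy.oto.tools:8080/v1/chat/completions"))
```

A client could use such a summary to warn users before a limit is reached, without a separate call to the admin API.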