💰 Budgets Management

Budget management for LLMs means monitoring and controlling the cost of calling different LLMs through an API gateway.

Our Otoroshi LLM extension helps you optimize usage, control your budget, and improve cost efficiency across models.

Configuration

budgets {
  enabled = true
  enabled = ${?CLOUD_APIM_EXTENSIONS_LLM_EXTENSION_BUDGETS_ENABLED}
  embed-budgets-in-responses = false
  embed-budgets-in-responses = ${?CLOUD_APIM_EXTENSIONS_LLM_EXTENSION_BUDGETS_EMBED_BUGETS_IN_RESPONSES}
}

Once it's enabled, you can create budgets from the UI or the admin API, and enforce budget limits on any consumer.

Budgets

A budget is defined between a start date and an end date. Limits are applied for a duration that is renewed between the start and end date. Limits can be expressed in USD or in tokens.

A budget can define limits that are global or per usage type (inference, image, audio, video, embedding, moderation).

A budget applies to a scope, which can be based on almost anything: API key, API group, user, provider, model, request, etc. You can also reference a budget from entity metadata with the `ai_budget_ref` key (on API keys, API groups, users, providers, and auth modules).

Scope options

Scope	Description
`apikeys`	List of API key IDs
`users`	List of user IDs
`groups`	List of API group IDs
`providers`	List of LLM provider IDs
`models`	List of model names
`extractFromApikeyMeta`	Extract budget ref from API key metadata
`extractFromApikeyGroupMeta`	Extract budget ref from API key group metadata
`extractFromUserMeta`	Extract budget ref from user metadata
`extractFromUserAuthModuleMeta`	Extract budget ref from auth module metadata
`extractFromProviderMeta`	Extract budget ref from provider metadata
`rules`	JsonPath-based rules for advanced scoping
`rulesMatchMode`	Rule matching mode: `All` (all rules must match) or `Any` (at least one rule must match)
`alwaysApplyRules`	When `true`, rules are always evaluated even if explicit lists match
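
To make the difference between the two `rulesMatchMode` values concrete, here is a minimal Python sketch. It is not the extension's implementation: plain Python predicates stand in for the JsonPath rules, and the `rules_match` helper and the `ctx` dictionary are hypothetical names used only for illustration.

```python
from typing import Callable, Iterable

# Hypothetical stand-in for a JsonPath rule: a predicate over a request context.
Rule = Callable[[dict], bool]


def rules_match(rules: Iterable[Rule], ctx: dict, mode: str = "All") -> bool:
    """Combine rule results according to the matching mode.

    "All": every rule must match. "Any": at least one rule must match.
    """
    results = [rule(ctx) for rule in rules]
    if mode == "All":
        return all(results)  # note: True for an empty rule list
    if mode == "Any":
        return any(results)  # note: False for an empty rule list
    raise ValueError(f"unknown rulesMatchMode: {mode!r}")


# Example context and rules (field names are assumptions for this sketch).
ctx = {"model": "gpt-4o", "team": "search"}
is_gpt = lambda c: c["model"].startswith("gpt")
is_billing = lambda c: c["team"] == "billing"

print(rules_match([is_gpt, is_billing], ctx, "All"))
print(rules_match([is_gpt, is_billing], ctx, "Any"))
```

With this context, only the first rule matches, so the `All` call yields `False` and the `Any` call yields `True`.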

Duration units

Budget limits are applied for a renewable duration between the start and end date:

Unit	Description
`hour`	Limits reset every hour
`day`	Limits reset every day
`year`	Limits reset every year

Action on exceed

When a limit is exceeded, a budget can either block requests or only emit alerts.

Parameter	Type	Default	Description
`mode`	string	—	`"soft"` (alerts only) or `"block"` (deny requests with HTTP 429)
`alertOnExceed`	boolean	—	Emit an alert when the budget is exceeded
`alertOnAlmostExceed`	boolean	—	Emit an alert when the budget is almost exceeded
`alertOnAlmostExceedPercentage`	number	`80`	Percentage threshold for "almost exceeded" alerts (e.g. 80 means alert at 80% consumption)
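
The parameters above can be read as a small decision procedure. The sketch below is one possible interpretation, not the extension's actual code: `budget_status` and `should_block` are hypothetical helpers, and treating consumption equal to the limit as "exceeded" is an assumption.

```python
def budget_status(consumed: float, allowed: float, almost_pct: float = 80.0) -> str:
    """Classify consumption against a limit.

    Returns "exceeded", "almost_exceeded" or "ok".
    Assumption: reaching the limit exactly counts as exceeded.
    """
    if allowed <= 0:
        raise ValueError("allowed must be positive")
    if consumed >= allowed:
        return "exceeded"
    if consumed / allowed * 100 >= almost_pct:
        return "almost_exceeded"
    return "ok"


def should_block(status: str, mode: str) -> bool:
    """Only "block" mode denies requests; "soft" mode only emits alerts."""
    return mode == "block" and status == "exceeded"


# A 200 USD budget with 170 USD consumed is past the default 80% threshold.
status = budget_status(170.0, 200.0)
print(status, should_block(status, "block"))
```

In this reading, a consumer at 170 of 200 USD triggers the "almost exceeded" alert (if `alertOnAlmostExceed` is on) but is not yet blocked, even in `"block"` mode.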

Admin API

In addition to the classic admin API endpoints for the `ai-gateway.extensions.cloud-apim.com/AiBudget` entity, two budget-specific endpoints are available:

GET /api/extensions/cloud-apim/extensions/ai-extension/budgets/:id/consumption
POST /api/extensions/cloud-apim/extensions/ai-extension/budgets/:id/consumption/_reset

The first endpoint returns the current consumption of the budget, for example:

{
  "consumed_total_usd": 0.00188655,
  "consumed_total_tokens": 8036,
  "consumed_inference_usd": 0.00188655,
  "consumed_inference_tokens": 8036,
  "consumed_image_usd": 0,
  "consumed_image_tokens": 0,
  "consumed_audio_usd": 0,
  "consumed_audio_tokens": 0,
  "consumed_video_usd": 0,
  "consumed_video_tokens": 0,
  "consumed_embedding_usd": 0,
  "consumed_embedding_tokens": 0,
  "consumed_moderation_usd": 0,
  "consumed_moderation_tokens": 0,
  "remaining_total_usd": 199.99811345,
  "remaining_total_tokens": 9991964,
  "remaining_inference_usd": 0,
  "remaining_inference_tokens": 0,
  "remaining_image_usd": 0,
  "remaining_image_tokens": 0,
  "remaining_audio_usd": 0,
  "remaining_audio_tokens": 0,
  "remaining_video_usd": 0,
  "remaining_video_tokens": 0,
  "remaining_embedding_usd": 0,
  "remaining_embedding_tokens": 0,
  "remaining_moderation_usd": 0,
  "remaining_moderation_tokens": 0,
  "allowed_total_usd": 200,
  "allowed_total_tokens": 10000000,
  "allowed_inference_usd": null,
  "allowed_inference_tokens": null,
  "allowed_image_usd": null,
  "allowed_image_tokens": null,
  "allowed_audio_usd": null,
  "allowed_audio_tokens": null,
  "allowed_video_usd": null,
  "allowed_video_tokens": null,
  "allowed_embedding_usd": null,
  "allowed_embedding_tokens": null,
  "allowed_moderation_usd": null,
  "allowed_moderation_tokens": null
}
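
A response like the one above is easy to consume from a script. The following Python sketch fetches it and computes how much of a limit has been used. The base URL and the authentication headers are assumptions to be supplied by the caller, the helper names are hypothetical, and treating a `null` `allowed_*` value as "no limit configured" is an interpretation of the sample response rather than documented behavior.

```python
import json
import urllib.request

CONSUMPTION_PATH = (
    "/api/extensions/cloud-apim/extensions/ai-extension/budgets/{id}/consumption"
)


def fetch_consumption(base_url: str, budget_id: str, headers: dict) -> dict:
    """GET the consumption of a budget from the admin API.

    base_url and headers (admin API credentials) depend on your deployment.
    """
    url = base_url.rstrip("/") + CONSUMPTION_PATH.format(id=budget_id)
    req = urllib.request.Request(url, headers=headers, method="GET")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def usage_percent(consumption: dict, kind: str = "total", unit: str = "usd"):
    """Return consumed/allowed as a percentage, or None when no limit is set."""
    allowed = consumption.get(f"allowed_{kind}_{unit}")
    if not allowed:  # null (or 0) is treated as "no limit" in this sketch
        return None
    consumed = consumption.get(f"consumed_{kind}_{unit}") or 0
    return consumed / allowed * 100


# Using a subset of the sample response shown above:
sample = {
    "consumed_total_tokens": 8036,
    "allowed_total_tokens": 10000000,
    "allowed_inference_tokens": None,
}
print(usage_percent(sample, "total", "tokens"))
print(usage_percent(sample, "inference", "tokens"))
```

`fetch_consumption` is only defined here, not called, since it needs a running instance and valid admin credentials.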

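Embed in responses

The request below sets the `embed_budget=true` query parameter; the response then carries a `budget` object with the same fields as the consumption endpoint.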

curl --request POST \
  --url 'http://proxy.oto.tools:8080/v1/chat/completions?embed_budget=true' \
  --header 'content-type: application/json' \
  --data '{
  "model": "anthropic/claude-sonnet-4-5-20250929",
  "stream": false,
  "messages": [
    {
      "role": "user",
      "content": "tell me a joke ?"
    }
  ]
}'

{
  "id": "chatcmpl-9ODWHjtFVS5qzpU1nc79TprfRhZElq0e",
  "object": "chat.completion",
  "created": 1762441013,
  "model": "claude-sonnet-4-5-20250929",
  "system_fingerprint": "fp-tEQXbmpSPELEh3fYd5Rs9Jf4HIaI3E6s",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Why don't scientists trust atoms?\n\nBecause they make up everything! 😄"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 20,
    "total_tokens": 32,
    "completion_tokens_details": {
      "reasoning_tokens": -1
    }
  },
  "budget": {
    "consumed_total_usd": 0.00394755,
    "consumed_total_tokens": 8231,
    "consumed_inference_usd": 0.00394755,
    "consumed_inference_tokens": 8231,
    "consumed_image_usd": 0,
    "consumed_image_tokens": 0,
    "consumed_audio_usd": 0,
    "consumed_audio_tokens": 0,
    "consumed_video_usd": 0,
    "consumed_video_tokens": 0,
    "consumed_embedding_usd": 0,
    "consumed_embedding_tokens": 0,
    "consumed_moderation_usd": 0,
    "consumed_moderation_tokens": 0,
    "remaining_total_usd": 199.99605245,
    "remaining_total_tokens": 9991769,
    "remaining_inference_usd": 0,
    "remaining_inference_tokens": 0,
    "remaining_image_usd": 0,
    "remaining_image_tokens": 0,
    "remaining_audio_usd": 0,
    "remaining_audio_tokens": 0,
    "remaining_video_usd": 0,
    "remaining_video_tokens": 0,
    "remaining_embedding_usd": 0,
    "remaining_embedding_tokens": 0,
    "remaining_moderation_usd": 0,
    "remaining_moderation_tokens": 0,
    "allowed_total_usd": 200,
    "allowed_total_tokens": 10000000,
    "allowed_inference_usd": null,
    "allowed_inference_tokens": null,
    "allowed_image_usd": null,
    "allowed_image_tokens": null,
    "allowed_audio_usd": null,
    "allowed_audio_tokens": null,
    "allowed_video_usd": null,
    "allowed_video_tokens": null,
    "allowed_embedding_usd": null,
    "allowed_embedding_tokens": null,
    "allowed_moderation_usd": null,
    "allowed_moderation_tokens": null
  }
}
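
On the client side, the embedded `budget` object in the response above can be read without a second call to the admin API. This is a minimal sketch with a hypothetical helper name; it assumes that a response without a `budget` key simply means embedding is not active for that request.

```python
import json


def remaining_budget(completion: dict, unit: str = "usd"):
    """Return the remaining total budget from a chat completion, if embedded.

    unit is "usd" or "tokens"; returns None when no budget object is present.
    """
    budget = completion.get("budget")
    if budget is None:
        return None
    return budget.get(f"remaining_total_{unit}")


# Trimmed-down version of the response shown above.
body = json.loads(
    '{"object": "chat.completion",'
    ' "budget": {"remaining_total_usd": 199.99605245,'
    ' "remaining_total_tokens": 9991769}}'
)
print(remaining_budget(body, "usd"))
print(remaining_budget(body, "tokens"))
```

A client could use this value to slow down or switch to a cheaper model before the limit is reached.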
