📄️ Overview
Otoroshi LLM Extension provides a comprehensive set of cost optimization features to help you monitor, control, and reduce your LLM spending.
📄️ 💰 Cost tracking
Cost tracking for LLMs with a gateway means monitoring and managing the costs of using different LLMs through an API gateway.
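The core of cost tracking is attributing a dollar cost to each request from its token counts and the model's pricing. A minimal sketch of that idea, with illustrative prices and a hypothetical `request_cost` helper (not the extension's actual API):

```python
import math

# Per-1M-token prices; illustrative numbers only, not real provider pricing.
PRICES = {
    "model-a": {"input": 3.00, "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request, from its prompt and completion token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# 1000 input tokens and 500 output tokens on model-a:
cost = request_cost("model-a", 1000, 500)
assert math.isclose(cost, 0.0105)
```

A gateway sitting in front of every provider can apply this computation uniformly and aggregate the results per consumer, route, or provider.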
📄️ 💰 Budgets Management
Budgets management for LLMs with a gateway means setting spending limits on LLM usage and having the gateway enforce them, so requests are blocked or flagged once a budget is exhausted.
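Enforcement reduces to a running total checked against a limit before each request is forwarded. A minimal sketch, assuming a hypothetical `BudgetGuard` (the extension's real configuration and semantics may differ):

```python
class BudgetGuard:
    """Accumulates spend and rejects requests that would exceed the budget."""

    def __init__(self, limit_usd: float):
        self.limit = limit_usd
        self.spent = 0.0

    def charge(self, cost_usd: float) -> bool:
        """Return True and record the cost if it fits in the budget, else False."""
        if self.spent + cost_usd > self.limit:
            return False
        self.spent += cost_usd
        return True

guard = BudgetGuard(limit_usd=1.00)
assert guard.charge(0.60)       # within budget
assert not guard.charge(0.50)   # would exceed the $1.00 limit
assert guard.charge(0.40)       # exactly fills the budget
```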
📄️ Managing token usage
The LLM Tokens rate limiting plugin allows you to control token consumption per time window, preventing any single consumer from using more than their fair share of LLM resources.
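Per-window token rate limiting can be pictured as a fixed-window counter per consumer: tokens accumulate within the window and reset when it rolls over. A hypothetical sketch of that mechanism, not the plugin's actual implementation:

```python
import time
from collections import defaultdict

class TokenRateLimiter:
    """Fixed-window token budget per consumer (illustrative sketch)."""

    def __init__(self, max_tokens: int, window_seconds: float):
        self.max_tokens = max_tokens
        self.window = window_seconds
        # consumer -> [window_start, tokens_used_in_window]
        self.usage = defaultdict(lambda: [0.0, 0])

    def allow(self, consumer: str, tokens: int, now=None) -> bool:
        """Return True and record usage if the consumer still has budget."""
        now = time.monotonic() if now is None else now
        start, used = self.usage[consumer]
        if now - start >= self.window:
            start, used = now, 0  # window elapsed: reset the counter
        if used + tokens > self.max_tokens:
            self.usage[consumer] = [start, used]
            return False
        self.usage[consumer] = [start, used + tokens]
        return True

limiter = TokenRateLimiter(max_tokens=100, window_seconds=60)
assert limiter.allow("team-a", 60, now=0)        # 60/100 used
assert not limiter.allow("team-a", 50, now=10)   # 110 > 100: rejected
assert limiter.allow("team-a", 50, now=61)       # new window: allowed
```

Production implementations typically use sliding windows or token buckets for smoother behavior, but the fixed window shows the core accounting.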
📄️ Simple cache
The simple cache provides exact-match caching for LLM prompts. When an identical prompt (same messages, same roles, same content) is sent again within the TTL window, the cached response is returned instantly without calling the LLM provider.
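Exact-match caching amounts to hashing the canonicalized message list and storing the response under that key with an expiry. A minimal sketch of the idea, assuming a hypothetical `SimpleLLMCache` (not the plugin's actual code):

```python
import hashlib
import json
import time

class SimpleLLMCache:
    """Exact-match LLM response cache with a TTL (illustrative sketch)."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (expiry_timestamp, cached_response)

    def _key(self, messages) -> str:
        # Canonical JSON so dict key ordering inside messages doesn't matter;
        # any change to roles or content yields a different hash, i.e. a miss.
        blob = json.dumps(messages, sort_keys=True)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def get(self, messages):
        entry = self.store.get(self._key(messages))
        if entry is None:
            return None
        expiry, response = entry
        if time.monotonic() > expiry:
            return None  # entry expired past its TTL
        return response

    def put(self, messages, response):
        self.store[self._key(messages)] = (time.monotonic() + self.ttl, response)

cache = SimpleLLMCache(ttl_seconds=300)
prompt = [{"role": "user", "content": "What's the weather in Paris?"}]
cache.put(prompt, "It's sunny.")
assert cache.get(prompt) == "It's sunny."
assert cache.get([{"role": "user", "content": "Different prompt"}]) is None
```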
📄️ Semantic cache
The semantic cache goes beyond exact matching: it uses embeddings to find prompts with the same semantic meaning, even when the wording is different. For example, "What's the weather in Paris?" and "Tell me the current weather for Paris" would match semantically.
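The mechanism behind semantic matching is embedding each prompt as a vector and treating a cached entry as a hit when cosine similarity exceeds a threshold. A minimal sketch under that assumption, using a toy lookup table as a stand-in for a real embedding model (names and threshold are hypothetical, not the plugin's actual configuration):

```python
import math

def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def toy_embed(text: str):
    # Stand-in for a real embedding model: a fixed table of 2-D vectors.
    table = {
        "What's the weather in Paris?": [1.00, 0.00],
        "Tell me the current weather for Paris": [0.95, 0.05],
        "Give me a pizza recipe": [0.00, 1.00],
    }
    return table[text]

class SemanticCache:
    """Embedding-based cache: similar meaning, different wording still hits."""

    def __init__(self, embed, threshold: float = 0.9):
        self.embed = embed
        self.threshold = threshold
        self.entries = []  # list of (vector, cached_response)

    def get(self, prompt: str):
        v = self.embed(prompt)
        best_response, best_sim = None, 0.0
        for vec, response in self.entries:
            sim = cosine(v, vec)
            if sim > best_sim:
                best_response, best_sim = response, sim
        return best_response if best_sim >= self.threshold else None

    def put(self, prompt: str, response: str):
        self.entries.append((self.embed(prompt), response))

cache = SemanticCache(toy_embed, threshold=0.9)
cache.put("What's the weather in Paris?", "It's sunny.")
# Different wording, same meaning: semantic hit.
assert cache.get("Tell me the current weather for Paris") == "It's sunny."
# Unrelated prompt: miss.
assert cache.get("Give me a pizza recipe") is None
```

A real deployment would also need the TTL and invalidation behavior of the simple cache; the sketch isolates only the similarity matching.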