Overview
The Otoroshi LLM Extension is a set of Otoroshi plugins and resources for interacting with LLMs. Let's discover it!
Connect, set up, secure, and seamlessly manage LLM models through a universal, OpenAI-compatible API
LLM Gateway
- Unified interface: Simplify interactions and minimize integration hassles with a single, consistent API across all providers (see the sketch after this list)
- 50+ LLM providers: Connect to OpenAI, Anthropic, Mistral, Gemini, Azure, Ollama, and many more with a unified configuration
- Load balancing: Ensure optimal performance by distributing workloads across multiple providers
- Fallbacks: Automatically switch to another provider or model when a request fails, keeping responses flowing without interruption
- Automatic retries: LLM APIs can fail intermittently; built-in automatic retries recover a substantial share of those requests
- Key vault: Securely store your LLM API keys in Otoroshi vault or any other secret vault supported by Otoroshi
- Fine-grained authorizations: Constrain model usage based on user identity, apikey, consumer metadata, request details, and more
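Because the gateway exposes an OpenAI-compatible API, any OpenAI client can talk to it. The sketch below uses the official Python SDK; the base URL, API key, and model name are placeholders that depend on how your Otoroshi route and provider are configured.

```python
# Minimal sketch: calling an LLM through the gateway's OpenAI-compatible API.
# The base URL, API key, and model name are placeholders, not real values.
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-gateway.example.com/v1",  # hypothetical Otoroshi route
    api_key="otoroshi-apikey",                      # hypothetical Otoroshi apikey
)

resp = client.chat.completions.create(
    model="my-provider/my-model",  # placeholder model identifier
    messages=[{"role": "user", "content": "Summarize what an LLM gateway does."}],
)
print(resp.choices[0].message.content)
```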
Cost optimization
- Cost tracking: Track per-model costs in real time with automatic pricing from LiteLLM
- Budgets: Set spending limits per provider with configurable durations, scopes, and alert modes
- Token quotas: Manage LLM token rate limits per consumer and per time window to keep costs under control
- Simple cache: Exact-match caching based on SHA-512 hashing to avoid duplicate LLM calls, with an optional Redis backend for cluster-wide sharing (see the sketch after this list)
- Semantic cache: Speed up repeated queries using embedding-based similarity matching, with optional custom embedding models and Redis Stack backend
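To illustrate the idea behind the simple cache, here is a rough sketch of exact-match caching keyed on a SHA-512 hash of the request payload. This shows the general principle only, not the extension's actual implementation; the in-memory dictionary stands in for the optional Redis backend.

```python
# Illustrative sketch of exact-match caching keyed on a SHA-512 hash.
import hashlib
import json

cache: dict[str, str] = {}  # in-memory stand-in for the optional Redis backend

def cache_key(payload: dict) -> str:
    # Serialize deterministically so identical requests hash to the same key.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha512(canonical.encode("utf-8")).hexdigest()

def cached_completion(payload: dict, call_llm) -> str:
    key = cache_key(payload)
    if key in cache:
        return cache[key]       # cache hit: no LLM call, no tokens spent
    answer = call_llm(payload)  # cache miss: forward to the provider
    cache[key] = answer
    return answer
```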
Observability and reporting
- Observability: Every LLM request is audited with details about the consumer, the LLM provider, and usage
- Reporting: Export audit events using multiple methods for dashboards and analytics
- Ecological impact: Track the environmental footprint of your LLM usage with electricity mix zones and impact metrics
Guardrails
- Guardrails: Validate prompts and responses to prevent sensitive data leakage, prompt injection, toxic language, bias, gibberish content, and more with 20+ built-in validation rules
Prompt engineering
- Prompt engineering: Enrich prompts with contextual information, store them in a library for reuse, and apply prompt templates for increased efficiency
Function calling
- Function calling (tool calls): Extend LLM capabilities with tool calling using 5 backend kinds: QuickJS (JavaScript), WASM plugins, HTTP endpoints, Workflows, and Routes
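As a rough sketch, a tool call through the OpenAI-compatible chat API looks like a standard OpenAI `tools` request. The endpoint, model name, and the `get_weather` tool below are hypothetical; how the tool body is actually executed depends on the backend kind you configure (QuickJS, WASM plugin, HTTP endpoint, Workflow, or Route).

```python
# Sketch of a tool call through the OpenAI-compatible chat API.
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-gateway.example.com/v1",  # hypothetical route
    api_key="otoroshi-apikey",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="my-provider/my-model",  # placeholder
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# If the model decided to call the tool, the request shows up here:
print(resp.choices[0].message.tool_calls)
```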
MCP (Model Context Protocol)
- MCP: Connect to external MCP servers for tools, resources, and prompts, and expose your own MCP servers via HTTP, SSE, or WebSocket
Embeddings
- Embeddings models: Compute text embeddings with an OpenAI-compatible API, supporting 16+ providers including a local AllMiniLM L6 V2 model
- Embedding stores: Store and search embedding vectors with 9 backends: local, ChromaDB, Elasticsearch, OpenSearch, Qdrant, Weaviate, Pinecone, Redis Stack, and PostgreSQL (pgvector)
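Since embeddings are also exposed through an OpenAI-compatible API, the standard embeddings call works. In this sketch the base URL, API key, and model name are placeholders for your own configuration.

```python
# Sketch: computing embeddings through the OpenAI-compatible embeddings API.
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-gateway.example.com/v1",  # hypothetical route
    api_key="otoroshi-apikey",
)

resp = client.embeddings.create(
    model="all-MiniLM-L6-v2",  # placeholder: local model or any configured provider
    input=["An LLM gateway sits between your apps and model providers."],
)
vector = resp.data[0].embedding
print(len(vector), vector[:5])
```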
Multi-modal models
- Audio models: Text-to-speech, speech-to-text, and audio translation capabilities
- Image models: Image generation and editing with multiple providers
- Video models: Video generation capabilities
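As with chat and embeddings, multi-modal calls can go through the OpenAI-compatible API. The text-to-speech sketch below uses placeholder model and voice names; actual availability depends on the audio provider configured behind the gateway.

```python
# Sketch of a text-to-speech call through the OpenAI-compatible audio API.
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-gateway.example.com/v1",  # hypothetical route
    api_key="otoroshi-apikey",
)

resp = client.audio.speech.create(
    model="tts-1",   # placeholder TTS model
    voice="alloy",   # placeholder voice
    input="Hello from the LLM gateway.",
)
with open("speech.mp3", "wb") as f:
    f.write(resp.read())  # write the raw audio bytes to a file
```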
Moderation
- Moderation models: Content moderation with dedicated models and plugins to filter inappropriate content
Workflows and AI Agents
- Workflows: Build complex multi-step AI pipelines with 17+ built-in functions for LLM calls, audio, image/video generation, embeddings, vector stores, memory, guardrails, and more
- AI Agents: Build agentic workflows with tool calling, agent handoffs, persistent memory, LLM-based routing, MCP integration, document conversion (Kreuzberg), and autonomous persistent KV memory backed by Redis or PostgreSQL
HTTP + LLM Plugins
- HTTP + LLM Plugins: Use LLMs to modify HTTP request/response bodies, validate requests, generate responses, and more
- Content to Markdown: Convert documents (PDF, DOCX, HTML, images) to markdown using Kreuzberg — available as an HTTP plugin, workflow function, and agent built-in tool (requires Java 25)
Supported LLM providers
Supported LLM providers include:
- Anthropic
- Azure OpenAI
- Azure AI Foundry
- Cloud Temple 🇫🇷 🇪🇺
- Cloudflare
- Cohere
- Deepseek
- Gemini
- Groq
- Huggingface 🇫🇷 🇪🇺
- Mistral 🇫🇷 🇪🇺
- Ollama (Local Models)
- OpenAI
- OVH AI Endpoints 🇫🇷 🇪🇺
- Scaleway 🇫🇷 🇪🇺
- X.ai
And 37 more including Abliteration, AI/ML API, Apertis, AssemblyAI, Cerebras, Chutes, CometAPI, CompactifAI, DeepInfra, Empower, Featherless AI, Fireworks AI, Friendli AI, Galadriel, GMI, Helicone, Hyperbolic, Lambda AI, LlamaGate, Meta Llama API, Minimax, Morph, Nano GPT, Nebius AI Studio, Novita AI, Nscale, Nvidia NIM, OpenRouter, Perplexity, Poe, SambaNova, Sarvam, Synthetic, Together AI, Venice AI, Xiaomi Mimo, Z.AI