# OpenAI Compatible API (unified)

The LLM OpenAI Compatible API plugin is a unified backend plugin that exposes a single Otoroshi route as a full-featured, multi-endpoint AI API. Instead of configuring separate plugins for chat completions, audio, images, embeddings, moderation, and responses, this single plugin handles all of them with a consistent configuration.

`cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.aigateway.plugins.OpenAiCompatApi`
## Why use this plugin?

When building an AI gateway, you typically need to expose several API endpoints: chat completions, embeddings, image generation, audio transcription, moderation, etc. With the individual plugins, you would need to configure each one separately on the same route, mapping each to a specific path.

The OpenAI Compatible API plugin simplifies this by providing a single plugin that routes requests based on the URL path to the appropriate handler. It also goes beyond the standard OpenAI API by supporting additional endpoints:
- Anthropic Messages API (`/messages`) for clients like Claude Code
- OpenAI Responses API (`/responses`) for the newer responses format
- Open Responses (`/open-responses`) for the Open Responses specification
- Prompt contexts (`/contexts`) for listing available prompt contexts
This makes it the ideal choice when you want to expose a single, coherent API surface that supports multiple AI client ecosystems.
## Supported endpoints

| Method | Path | Description | Config field |
|---|---|---|---|
| GET | `/models` | List available models from all configured providers | `language_model_refs` |
| GET | `/contexts` | List available prompt contexts | `context_refs` |
| POST | `/chat/completions` | OpenAI Chat Completions API | `language_model_refs` |
| POST | `/responses` | OpenAI Responses API (or Open Responses if configured) | `language_model_refs` |
| POST | `/open-responses` | Open Responses API (always available) | `language_model_refs` |
| POST | `/oai-responses` | OpenAI Responses API (always available) | `language_model_refs` |
| POST | `/messages` | Anthropic Messages API | `language_model_refs` |
| POST | `/embeddings` | OpenAI Embeddings API | `embedding_model_refs` |
| POST | `/images/generations` | OpenAI Image Generation API | `image_model_refs` |
| POST | `/images/edits` | OpenAI Image Edit API | `image_model_refs` |
| POST | `/audio/speech` | OpenAI Text-to-Speech API | `audio_model_refs` |
| POST | `/audio/transcriptions` | OpenAI Speech-to-Text API | `audio_model_refs` |
| POST | `/audio/translations` | OpenAI Audio Translation API | `audio_model_refs` |
| POST | `/moderations` | OpenAI Moderation API | `moderation_model_refs` |
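The routing behavior summarized in the table can be sketched as a simple method + path-suffix dispatch. The following Python illustration is hypothetical (the handler names and the `dispatch` helper are invented here, not Otoroshi internals); matching on the path suffix lets the route be mounted under any prefix, such as `/v1`:

```python
# Hypothetical sketch of how a single plugin can dispatch on method + path
# suffix, as the table above describes. Not Otoroshi's actual implementation.
HANDLERS = {
    ("GET", "/models"): "list_models",
    ("GET", "/contexts"): "list_contexts",
    ("POST", "/chat/completions"): "chat_completions",
    ("POST", "/responses"): "responses",
    ("POST", "/messages"): "anthropic_messages",
    ("POST", "/embeddings"): "embeddings",
    ("POST", "/moderations"): "moderations",
}

def dispatch(method: str, path: str):
    """Return the handler name for a request, or None if no endpoint matches.

    Suffix matching keeps the dispatch independent of the route prefix.
    """
    for (m, suffix), handler in HANDLERS.items():
        if method == m and path.endswith(suffix):
            return handler
    return None
```

For example, `dispatch("POST", "/v1/chat/completions")` selects the chat completions handler, while an unknown path yields `None` (a 404 in practice).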
## Plugin configuration

```json
{
  "plugin": "cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.aigateway.plugins.OpenAiCompatApi",
  "enabled": true,
  "config": {
    "language_model_refs": ["provider_openai_1", "provider_mistral_1"],
    "audio_model_refs": ["audio_model_1"],
    "image_model_refs": ["image_model_1"],
    "embedding_model_refs": ["embedding_model_1"],
    "moderation_model_refs": ["moderation_model_1"],
    "context_refs": ["context_1", "context_2"],
    "max_size_upload": 104857600,
    "decode_images": false,
    "use_open_response_for_responses": false
  }
}
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `language_model_refs` | array of strings | `[]` | References to LLM provider entities used for the chat completions, responses, and messages endpoints |
| `audio_model_refs` | array of strings | `[]` | References to audio model entities used for the speech, transcription, and translation endpoints |
| `image_model_refs` | array of strings | `[]` | References to image model entities used for image generation and editing |
| `embedding_model_refs` | array of strings | `[]` | References to embedding model entities used for the embeddings endpoint |
| `moderation_model_refs` | array of strings | `[]` | References to moderation model entities used for the moderations endpoint |
| `context_refs` | array of strings | `[]` | References to prompt context entities returned by the `/contexts` endpoint |
| `max_size_upload` | number | `104857600` (100 MB) | Maximum file upload size in bytes for audio and image endpoints |
| `decode_images` | boolean | `false` | When enabled, decodes base64-encoded image results into binary responses |
| `use_open_response_for_responses` | boolean | `false` | When enabled, the `/responses` endpoint uses the Open Responses proxy instead of the default OpenAI Responses proxy |
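If you generate plugin configurations programmatically, the defaults above can be captured in a small helper. This is a sketch assuming only the fields and defaults documented in the table; the class itself is not part of Otoroshi:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class OpenAiCompatApiConfig:
    """Mirrors the documented "config" object and its defaults.

    Hypothetical helper for building configs, not an Otoroshi API.
    """
    language_model_refs: List[str] = field(default_factory=list)
    audio_model_refs: List[str] = field(default_factory=list)
    image_model_refs: List[str] = field(default_factory=list)
    embedding_model_refs: List[str] = field(default_factory=list)
    moderation_model_refs: List[str] = field(default_factory=list)
    context_refs: List[str] = field(default_factory=list)
    max_size_upload: int = 104857600  # 100 MB
    decode_images: bool = False
    use_open_response_for_responses: bool = False

    def to_json(self) -> dict:
        """Return a plain dict suitable for the plugin's "config" field."""
        return self.__dict__.copy()
```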
## Route configuration example

```json
{
  "id": "route_unified_ai_api",
  "name": "Unified AI API",
  "frontend": {
    "domains": ["api.domain.tld/v1"],
    "strip_path": true,
    "exact": false
  },
  "backend": {
    "targets": [
      {
        "id": "target_1",
        "hostname": "request.otoroshi.io",
        "port": 443,
        "tls": true
      }
    ]
  },
  "plugins": [
    {
      "enabled": true,
      "plugin": "cp:otoroshi.next.plugins.OverrideHost",
      "config": {}
    },
    {
      "enabled": true,
      "plugin": "cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.aigateway.plugins.OpenAiCompatApi",
      "config": {
        "language_model_refs": ["provider_openai_1", "provider_mistral_1"],
        "embedding_model_refs": ["embedding_model_1"],
        "image_model_refs": ["image_model_1"],
        "audio_model_refs": ["audio_model_1"],
        "context_refs": ["context_project_a"],
        "use_open_response_for_responses": false
      }
    }
  ]
}
```
## Endpoint details

### GET /models

Lists all models available from the configured language model providers. Compatible with the OpenAI Models API.

```sh
curl https://api.domain.tld/v1/models \
  -H "Authorization: Bearer $OTOROSHI_BEARER"
```
When multiple providers are configured, model IDs are prefixed with the provider slug name (e.g., `my_openai/gpt-4o`). Use `?raw=true` to get raw model identifiers without prefixes.
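The prefixing rule can be illustrated with a short sketch. This is a hypothetical reconstruction of the documented behavior (provider slugs and the `list_model_ids` helper are invented for illustration):

```python
def list_model_ids(providers: dict, raw: bool = False) -> list:
    """Build the model ids exposed by GET /models.

    With several providers, each id is prefixed "<provider_slug>/<model>"
    unless raw ids are requested (?raw=true). Hypothetical sketch only.
    """
    ids = []
    multiple = len(providers) > 1
    for slug, models in providers.items():
        for model in models:
            if multiple and not raw:
                ids.append(f"{slug}/{model}")
            else:
                ids.append(model)
    return ids
```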
### GET /contexts

Returns the list of prompt contexts configured for this plugin. Each context includes its `id` and `name`.

```sh
curl https://api.domain.tld/v1/contexts \
  -H "Authorization: Bearer $OTOROSHI_BEARER"
```

Response:

```json
[
  { "id": "context_project_a", "name": "Project A context" },
  { "id": "context_support", "name": "Support context" }
]
```
### POST /chat/completions

Standard OpenAI Chat Completions API endpoint. Supports streaming, tool calling, and model routing.

```sh
curl https://api.domain.tld/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'
```
#### Model routing

When multiple providers are configured, you can target a specific provider using the `model` field:

- Slash syntax: `providerName/modelName` (e.g., `my_openai/gpt-4o`)
- Hash syntax: `providerId###modelName` (e.g., `provider_xxx###gpt-4o`)

If no provider prefix is specified, the first configured provider is used.
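The two routing syntaxes and the fallback can be sketched as follows. This is an illustrative helper, not Otoroshi code; it only encodes the rules stated above:

```python
def resolve_provider(model: str, default_provider: str):
    """Split a "model" value into (provider, model_name).

    Follows the two documented routing syntaxes, falling back to the
    first configured provider when no prefix is present.
    """
    if "###" in model:                 # hash syntax: providerId###modelName
        provider, _, name = model.partition("###")
        return provider, name
    if "/" in model:                   # slash syntax: providerName/modelName
        provider, _, name = model.partition("/")
        return provider, name
    return default_provider, model     # no prefix: first configured provider
```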
#### Streaming

Streaming is activated when any of the following is true:

- The request body contains `"stream": true`
- The query parameter `?stream=true` is present
- The header `x-stream: true` is present
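The three activation conditions can be expressed as a small predicate. A minimal sketch of the decision, not the plugin's actual code:

```python
def streaming_requested(body: dict, query: dict, headers: dict) -> bool:
    """True when any of the documented streaming triggers is present."""
    if body.get("stream") is True:          # "stream": true in the JSON body
        return True
    if query.get("stream") == "true":       # ?stream=true query parameter
        return True
    # x-stream: true header, matched case-insensitively on the header name
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get("x-stream") == "true"
```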
### POST /responses

OpenAI Responses API endpoint. This is the newer OpenAI API format that uses `input` and `instructions` instead of `messages`.

```sh
curl https://api.domain.tld/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -d '{
    "model": "gpt-4o",
    "instructions": "You are a helpful assistant.",
    "input": "What is the capital of France?"
  }'
```
Response:

```json
{
  "id": "resp_xxxxx",
  "object": "response",
  "created_at": 1711569952,
  "model": "gpt-4o",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "msg_xxxxx",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris.",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 25,
    "output_tokens": 8,
    "total_tokens": 33
  }
}
```
#### Input format

The `input` field accepts multiple formats:

- **String**: a simple text message treated as a user message

  ```json
  { "input": "Hello!" }
  ```

- **Array of messages**: standard role-based messages

  ```json
  {
    "input": [
      { "type": "message", "role": "user", "content": "Hello!" }
    ]
  }
  ```

- **Multipart content**: messages with mixed content types

  ```json
  {
    "input": [
      {
        "type": "message",
        "role": "user",
        "content": [
          { "type": "input_text", "text": "What's in this image?" },
          { "type": "input_image", "image_url": "https://example.com/image.png" }
        ]
      }
    ]
  }
  ```

- **Function call outputs**: tool result messages

  ```json
  {
    "input": [
      { "type": "function_call_output", "call_id": "call_123", "output": "Paris, 15 degrees" }
    ]
  }
  ```
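To make the formats above concrete, here is a sketch of how a client or gateway might flatten them into a single list of chat messages. The exact internal mapping is an assumption of this sketch (in particular, representing `function_call_output` as a `tool`-role message is illustrative, not documented plugin behavior):

```python
def normalize_input(input_value):
    """Flatten the "input" field into a list of chat-style messages.

    Handles the string, message-array, and function_call_output formats
    described above. Hypothetical sketch, not the plugin's implementation.
    """
    if isinstance(input_value, str):
        # A bare string is treated as a single user message
        return [{"role": "user", "content": input_value}]
    messages = []
    for item in input_value:
        if item.get("type") == "message":
            # content may be a string or a multipart content array
            messages.append({"role": item["role"], "content": item["content"]})
        elif item.get("type") == "function_call_output":
            # Assumed mapping of a tool result onto a tool-role message
            messages.append({
                "role": "tool",
                "tool_call_id": item["call_id"],
                "content": item["output"],
            })
    return messages
```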
#### Streaming

When streaming is enabled, the response uses Server-Sent Events with the Responses API event protocol:

```
event: response.created
data: {"type":"response.created","response":{...}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_xxx","output_index":0,"content_index":0,"delta":"The capital"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_xxx","output_index":0,"content_index":0,"delta":" of France is Paris."}

event: response.completed
data: {"type":"response.completed","response":{...}}
```

The full lifecycle events are emitted: `response.created`, `response.in_progress`, `response.output_item.added`, `response.content_part.added`, `response.output_text.delta`, `response.output_text.done`, `response.content_part.done`, `response.output_item.done`, `response.completed`.
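On the client side, the text deltas from such a stream can be reassembled with a few lines. A minimal sketch relying only on the event shape shown above (`type == "response.output_text.delta"` with a `delta` field); a real client would use a proper SSE library:

```python
import json

def collect_output_text(sse_payload: str) -> str:
    """Concatenate text deltas from a Responses API SSE stream."""
    text = []
    for line in sse_payload.splitlines():
        if line.startswith("data: "):
            event = json.loads(line[len("data: "):])
            if event.get("type") == "response.output_text.delta":
                text.append(event["delta"])
    return "".join(text)
```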
#### Using Open Responses instead

When `use_open_response_for_responses` is set to `true`, the `/responses` endpoint uses the Open Responses proxy implementation instead of the default one. The Open Responses proxy provides a richer implementation with native support for function calling, reasoning events, and more streaming events.

Regardless of this flag, the dedicated endpoints are always available:

- `/open-responses` always uses the Open Responses proxy
- `/oai-responses` always uses the default OpenAI Responses proxy
### POST /messages

Anthropic Messages API endpoint. This allows any Anthropic API client (including Claude Code) to use any LLM provider managed by Otoroshi.

```sh
curl https://api.domain.tld/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "gpt-4o",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'
```
Response (Anthropic format):

```json
{
  "id": "msg_xxxxx",
  "type": "message",
  "role": "assistant",
  "model": "gpt-4o",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I assist you today?"
    }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 12,
    "output_tokens": 10
  }
}
```
The plugin handles all format translation automatically:

- Anthropic-format tools (`input_schema`) are converted to OpenAI-format tools (`parameters`)
- `tool_use`/`tool_result` messages are translated bidirectionally
- The top-level `system` field is converted to a system message
- `thinking` parameters are mapped to `reasoning_effort`
- `max_tokens` is mapped to `max_completion_tokens`
- Streaming uses the full Anthropic SSE protocol
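The first translation in the list can be sketched concretely: an Anthropic tool definition carries its JSON schema under `input_schema`, while the OpenAI format nests it under `function.parameters`. A minimal illustration of that mapping, not the plugin's implementation:

```python
def anthropic_tool_to_openai(tool: dict) -> dict:
    """Convert an Anthropic-format tool definition to the OpenAI format.

    Moves the JSON schema from "input_schema" to "function.parameters".
    Minimal sketch covering only this one translation.
    """
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["input_schema"],
        },
    }
```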
#### Using Claude Code

You can use Claude Code with this plugin by setting:

```sh
export ANTHROPIC_AUTH_TOKEN=your-otoroshi-api-key
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=https://api.domain.tld
claude --model gpt-4o
```
### POST /embeddings

OpenAI Embeddings API endpoint.

```sh
curl https://api.domain.tld/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
```
### POST /images/generations

OpenAI Image Generation API endpoint.

```sh
curl https://api.domain.tld/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -d '{
    "model": "dall-e-3",
    "prompt": "A white cat sitting on a windowsill",
    "n": 1,
    "size": "1024x1024"
  }'
```
When `decode_images` is enabled, base64-encoded image results are decoded and returned as binary image data.
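In effect, the flag performs the decoding a client would otherwise do itself. A sketch assuming the standard OpenAI image response shape (`data[0].b64_json`); not the plugin's code:

```python
import base64

def decode_image_result(result: dict) -> bytes:
    """Turn a base64 "b64_json" image payload into raw image bytes.

    Mirrors what decode_images does on the client's behalf, assuming the
    standard OpenAI image response shape.
    """
    b64 = result["data"][0]["b64_json"]
    return base64.b64decode(b64)
```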
### POST /images/edits

OpenAI Image Edit API endpoint. Accepts `multipart/form-data` uploads.
### POST /audio/speech

OpenAI Text-to-Speech API endpoint.

```sh
curl https://api.domain.tld/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -d '{
    "model": "tts-1",
    "input": "Hello, how are you?",
    "voice": "alloy"
  }' --output speech.mp3
```
### POST /audio/transcriptions

OpenAI Speech-to-Text API endpoint. Accepts `multipart/form-data` audio file uploads up to the configured `max_size_upload` limit.
### POST /audio/translations

OpenAI Audio Translation API endpoint. Translates audio into English text.
### POST /moderations

OpenAI Moderation API endpoint.

```sh
curl https://api.domain.tld/v1/moderations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -d '{
    "input": "Some text to check for moderation"
  }'
```
## Securing the API

Since this plugin exposes a powerful multi-endpoint AI API, you will want to secure it properly. Otoroshi provides several authentication mechanisms that can be combined on the same route. The key idea is that all security plugins run before the backend call, so you can stack multiple authentication methods and let the first valid one through.

Below are three common approaches, from simplest to most enterprise-ready. They can be used individually or combined on the same route.
### API Keys

The simplest approach. Otoroshi API keys support multiple extraction methods (header, query param, Basic auth, Bearer token, JWT) and come with built-in quotas and rate limiting.

Add the `ApikeyCalls` plugin to your route:
```json
{
  "enabled": true,
  "plugin": "cp:otoroshi.next.plugins.ApikeyCalls",
  "config": {
    "extractors": {
      "basic": { "enabled": true },
      "custom_headers": { "enabled": true },
      "client_id": { "enabled": true },
      "jwt": { "enabled": true, "secret_signed": true, "keypair_signed": true },
      "oto_bearer": { "enabled": true }
    },
    "validate": true,
    "mandatory": true,
    "wipe_backend_request": true,
    "update_quotas": true
  }
}
```
Clients can then authenticate with a Bearer token:

```sh
curl https://api.domain.tld/v1/chat/completions \
  -H "Authorization: Bearer your-otoroshi-apikey" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
```
Or with Basic auth, client ID/secret headers, or JWT tokens signed with the API key secret.
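Two of those alternative methods are easy to show concretely: Basic auth encodes `clientId:clientSecret`, and the custom-headers method sends the credentials in dedicated headers. The header names below are Otoroshi's defaults; adjust them if your instance overrides them, and note this helper is illustrative, not an Otoroshi API:

```python
import base64

def apikey_headers(client_id: str, client_secret: str, method: str = "basic") -> dict:
    """Build request headers for two ApikeyCalls extraction methods.

    "basic" sends Basic auth over clientId:clientSecret; "custom_headers"
    uses the Otoroshi default header names.
    """
    if method == "basic":
        creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
        return {"Authorization": f"Basic {creds}"}
    if method == "custom_headers":
        return {
            "Otoroshi-Client-Id": client_id,
            "Otoroshi-Client-Secret": client_secret,
        }
    raise ValueError(f"unknown method: {method}")
```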
You can further restrict access using mandatory tags to ensure only API keys with specific tags can access this route:

```json
{
  "enabled": true,
  "plugin": "cp:otoroshi.next.plugins.NgApikeyMandatoryTags",
  "config": {
    "tags": ["endpoint_ai-api"]
  }
}
```
### Biscuit tokens

Biscuit tokens provide fine-grained, decentralized authorization with attenuation capabilities. This is ideal for distributing scoped tokens to different teams or applications.

Add the `BiscuitUserExtractor` plugin:
```json
{
  "enabled": true,
  "plugin": "cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.biscuit.plugins.BiscuitUserExtractor",
  "config": {
    "keypair_ref": "biscuit-keypair_your-keypair-id",
    "enforce": true,
    "extractor_type": "header",
    "extractor_name": "Authorization",
    "validations": {
      "policies": [
        "allow if endpoint(\"ai-api\")"
      ]
    }
  }
}
```
Clients send their biscuit token in the `Authorization` header:

```sh
curl https://api.domain.tld/v1/chat/completions \
  -H "Authorization: biscuit_token_here" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
```
The biscuit policy `allow if endpoint("ai-api")` ensures the token was explicitly granted access to this endpoint. You can create attenuated tokens with additional restrictions (time limits, IP ranges, specific models, etc.).
### OIDC / JWT (Keycloak, Auth0, etc.)

For enterprise environments with an existing identity provider, you can validate JWT tokens issued by any OIDC-compliant provider (Keycloak, Auth0, Okta, Azure AD, etc.).

Add the `OIDCJwtVerifier` plugin:
```json
{
  "enabled": true,
  "plugin": "cp:otoroshi.next.plugins.OIDCJwtVerifier",
  "config": {
    "ref": "auth_mod_your-oidc-verifier-id",
    "mandatory": true,
    "user": true
  }
}
```
The `ref` points to an Otoroshi JWT verifier entity configured with your OIDC provider's JWKS URL and issuer. Clients send their JWT token as a Bearer token:

```sh
# First, get a token from your OIDC provider
TOKEN=$(curl -s -X POST https://keycloak.your-domain.com/realms/your-realm/protocol/openid-connect/token \
  -d "grant_type=client_credentials" \
  -d "client_id=ai-api-client" \
  -d "client_secret=your-client-secret" | jq -r '.access_token')

# Then call the AI API
curl https://api.domain.tld/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
```
### Combining multiple authentication methods

The real power of Otoroshi is the ability to combine these methods on the same route. By setting `mandatory: false` on each auth plugin (`enforce: false` for the biscuit plugin) and adding the `NgExpectedConsumer` plugin, you can accept any of the configured authentication methods:
```json
{
  "plugins": [
    {
      "enabled": true,
      "plugin": "cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.aigateway.plugins.OpenAiCompatApi",
      "config": {
        "language_model_refs": ["provider_openai_1"],
        "embedding_model_refs": ["embedding_model_1"]
      }
    },
    {
      "enabled": true,
      "plugin": "cp:otoroshi.next.plugins.ApikeyCalls",
      "config": {
        "extractors": {
          "basic": { "enabled": true },
          "custom_headers": { "enabled": true },
          "jwt": { "enabled": true, "secret_signed": true, "keypair_signed": true },
          "oto_bearer": { "enabled": true }
        },
        "validate": true,
        "mandatory": false,
        "wipe_backend_request": true,
        "update_quotas": true
      }
    },
    {
      "enabled": true,
      "plugin": "cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.biscuit.plugins.BiscuitUserExtractor",
      "config": {
        "keypair_ref": "biscuit-keypair_your-keypair-id",
        "enforce": false,
        "extractor_type": "header",
        "extractor_name": "Authorization",
        "validations": {
          "policies": ["allow if endpoint(\"ai-api\")"]
        }
      }
    },
    {
      "enabled": true,
      "plugin": "cp:otoroshi.next.plugins.OIDCJwtVerifier",
      "config": {
        "ref": "auth_mod_your-oidc-verifier-id",
        "mandatory": false,
        "user": true
      }
    },
    {
      "enabled": true,
      "plugin": "cp:otoroshi.next.plugins.NgExpectedConsumer",
      "config": {}
    }
  ]
}
```
With this setup:

- A developer can use a simple API key for quick prototyping
- An automated pipeline can use a scoped biscuit token with fine-grained permissions
- A web application can use a JWT token from your corporate Keycloak/Auth0
- The `NgExpectedConsumer` plugin ensures that at least one authentication method succeeded
All three methods work on the same route, on the same API surface, without any change to the AI plugin configuration.
## Comparison with individual plugins

| Feature | Individual plugins | OpenAI Compatible API |
|---|---|---|
| Setup complexity | One plugin per endpoint | Single plugin for everything |
| Path-based routing | Manual with includes filters | Automatic based on URL path |
| Anthropic Messages support | Separate `AnthropicCompatProxy` plugin | Built-in via `/messages` |
| Responses API support | Separate `OpenAiResponsesProxy` plugin | Built-in via `/responses` |
| Open Responses support | Separate `OpenResponseCompatProxy` plugin | Built-in via `/open-responses` |
| Prompt contexts | Not available | Built-in via `/contexts` |
| Model-specific config | Same refs for all endpoints | Separate refs per capability (language, audio, image, embedding, moderation) |
| Flexibility | Fine-grained control per endpoint | Unified but less granular |
The individual plugins are still useful when you need fine-grained control per endpoint (e.g., different provider refs for different paths, or different plugin chains per endpoint). The unified plugin is ideal when you want a quick, coherent API surface with minimal configuration.