OpenAI Compatible API (unified)

The LLM OpenAI Compatible API plugin is a unified backend plugin that exposes a single Otoroshi route as a full-featured, multi-endpoint AI API. Instead of configuring separate plugins for chat completions, audio, images, embeddings, moderation, and responses, this single plugin handles all of them with a consistent configuration.

cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.aigateway.plugins.OpenAiCompatApi

Why use this plugin?

When building an AI gateway, you typically need to expose several API endpoints: chat completions, embeddings, image generation, audio transcription, moderation, etc. With the individual plugins, you would need to configure each one separately on the same route, mapping each to a specific path.

The OpenAI Compatible API plugin simplifies this by providing a single plugin that routes requests based on the URL path to the appropriate handler. It also goes beyond the standard OpenAI API by supporting additional endpoints:

  • Anthropic Messages API (/messages) for clients like Claude Code
  • OpenAI Responses API (/responses) for the newer responses format
  • Open Responses (/open-responses) for the Open Responses specification
  • Prompt contexts (/contexts) for listing available prompt contexts

This makes it the ideal choice when you want to expose a single, coherent API surface that supports multiple AI client ecosystems.

Supported endpoints

| Method | Path | Description | Config field |
|--------|------|-------------|--------------|
| GET | /models | List available models from all configured providers | language_model_refs |
| GET | /contexts | List available prompt contexts | context_refs |
| POST | /chat/completions | OpenAI Chat Completions API | language_model_refs |
| POST | /responses | OpenAI Responses API (or Open Responses if configured) | language_model_refs |
| POST | /open-responses | Open Responses API (always available) | language_model_refs |
| POST | /oai-responses | OpenAI Responses API (always available) | language_model_refs |
| POST | /messages | Anthropic Messages API | language_model_refs |
| POST | /embeddings | OpenAI Embeddings API | embedding_model_refs |
| POST | /images/generations | OpenAI Image Generation API | image_model_refs |
| POST | /images/edits | OpenAI Image Edit API | image_model_refs |
| POST | /audio/speech | OpenAI Text-to-Speech API | audio_model_refs |
| POST | /audio/transcriptions | OpenAI Speech-to-Text API | audio_model_refs |
| POST | /audio/translations | OpenAI Audio Translation API | audio_model_refs |
| POST | /moderations | OpenAI Moderation API | moderation_model_refs |

Plugin configuration

{
  "plugin": "cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.aigateway.plugins.OpenAiCompatApi",
  "enabled": true,
  "config": {
    "language_model_refs": ["provider_openai_1", "provider_mistral_1"],
    "audio_model_refs": ["audio_model_1"],
    "image_model_refs": ["image_model_1"],
    "embedding_model_refs": ["embedding_model_1"],
    "moderation_model_refs": ["moderation_model_1"],
    "context_refs": ["context_1", "context_2"],
    "max_size_upload": 104857600,
    "decode_images": false,
    "use_open_response_for_responses": false
  }
}
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| language_model_refs | array of strings | [] | References to LLM provider entities used for the chat completions, responses, and messages endpoints |
| audio_model_refs | array of strings | [] | References to audio model entities used for the speech and transcription endpoints |
| image_model_refs | array of strings | [] | References to image model entities used for image generation and editing |
| embedding_model_refs | array of strings | [] | References to embedding model entities used for the embeddings endpoint |
| moderation_model_refs | array of strings | [] | References to moderation model entities used for the moderations endpoint |
| context_refs | array of strings | [] | References to prompt context entities returned by the /contexts endpoint |
| max_size_upload | number | 104857600 (100MB) | Maximum file upload size in bytes for the audio and image endpoints |
| decode_images | boolean | false | When enabled, decodes base64-encoded image results into binary responses |
| use_open_response_for_responses | boolean | false | When enabled, the /responses endpoint uses the Open Responses proxy instead of the default OpenAI Responses proxy |

Route configuration example

{
  "id": "route_unified_ai_api",
  "name": "Unified AI API",
  "frontend": {
    "domains": ["api.domain.tld/v1"],
    "strip_path": true,
    "exact": false
  },
  "backend": {
    "targets": [
      {
        "id": "target_1",
        "hostname": "request.otoroshi.io",
        "port": 443,
        "tls": true
      }
    ]
  },
  "plugins": [
    {
      "enabled": true,
      "plugin": "cp:otoroshi.next.plugins.OverrideHost",
      "config": {}
    },
    {
      "enabled": true,
      "plugin": "cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.aigateway.plugins.OpenAiCompatApi",
      "config": {
        "language_model_refs": ["provider_openai_1", "provider_mistral_1"],
        "embedding_model_refs": ["embedding_model_1"],
        "image_model_refs": ["image_model_1"],
        "audio_model_refs": ["audio_model_1"],
        "context_refs": ["context_project_a"],
        "use_open_response_for_responses": false
      }
    }
  ]
}

Endpoint details

GET /models

Lists all models available from the configured language model providers. Compatible with the OpenAI Models API.

curl https://api.domain.tld/v1/models \
  -H "Authorization: Bearer $OTOROSHI_BEARER"

When multiple providers are configured, model IDs are prefixed with the provider's slug (e.g., my_openai/gpt-4o). Use ?raw=true to get the raw model identifiers without prefixes.

GET /contexts

Returns the list of prompt contexts configured for this plugin. Each context includes its id and name.

curl https://api.domain.tld/v1/contexts \
  -H "Authorization: Bearer $OTOROSHI_BEARER"

Response:

[
  { "id": "context_project_a", "name": "Project A context" },
  { "id": "context_support", "name": "Support context" }
]

POST /chat/completions

Standard OpenAI Chat Completions API endpoint. Supports streaming, tool calling, and model routing.

curl https://api.domain.tld/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'

Model routing

When multiple providers are configured, you can target a specific provider using the model field:

  • Slash syntax: providerName/modelName (e.g., my_openai/gpt-4o)
  • Hash syntax: providerId###modelName (e.g., provider_xxx###gpt-4o)

If no provider prefix is specified, the first configured provider is used.
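The routing rules above can be sketched as a small client-side parser (a hypothetical helper for illustration, not the plugin's internal code):

```python
def resolve_provider(model: str, default_provider: str) -> tuple[str, str]:
    """Split a model reference into (provider, model) using the two
    routing syntaxes described above, falling back to the default
    (first configured) provider when no prefix is present."""
    if "###" in model:  # hash syntax: providerId###modelName
        provider, name = model.split("###", 1)
        return provider, name
    if "/" in model:  # slash syntax: providerName/modelName
        provider, name = model.split("/", 1)
        return provider, name
    return default_provider, model  # no prefix: first configured provider
```

For example, resolve_provider("my_openai/gpt-4o", "provider_openai_1") yields ("my_openai", "gpt-4o").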

Streaming

Streaming is activated when any of the following is true:

  • The request body contains "stream": true
  • The query parameter ?stream=true is present
  • The header x-stream: true is present
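The three triggers can be expressed as a single predicate (an illustrative sketch; the dict-based parameter shapes are assumptions, not the plugin's internals):

```python
def is_streaming(body: dict, query: dict, headers: dict) -> bool:
    """Streaming is on if any of the three triggers described above is set."""
    if body.get("stream") is True:  # body flag: "stream": true
        return True
    if query.get("stream") == "true":  # query parameter: ?stream=true
        return True
    if headers.get("x-stream", "").lower() == "true":  # header: x-stream: true
        return True
    return False
```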

POST /responses

OpenAI Responses API endpoint. This is the newer OpenAI API format that uses input and instructions instead of messages.

curl https://api.domain.tld/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -d '{
    "model": "gpt-4o",
    "instructions": "You are a helpful assistant.",
    "input": "What is the capital of France?"
  }'

Response:

{
  "id": "resp_xxxxx",
  "object": "response",
  "created_at": 1711569952,
  "model": "gpt-4o",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "msg_xxxxx",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris.",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 25,
    "output_tokens": 8,
    "total_tokens": 33
  }
}

Input format

The input field accepts multiple formats:

  • String: a simple text message treated as a user message

        { "input": "Hello!" }

  • Array of messages: standard role-based messages

        {
          "input": [
            { "type": "message", "role": "user", "content": "Hello!" }
          ]
        }

  • Multipart content: messages with mixed content types

        {
          "input": [
            {
              "type": "message",
              "role": "user",
              "content": [
                { "type": "input_text", "text": "What's in this image?" },
                { "type": "input_image", "image_url": "https://example.com/image.png" }
              ]
            }
          ]
        }

  • Function call outputs: tool result messages

        {
          "input": [
            { "type": "function_call_output", "call_id": "call_123", "output": "Paris, 15 degrees" }
          ]
        }

Streaming

When streaming is enabled, the response uses Server-Sent Events with the Responses API event protocol:

event: response.created
data: {"type":"response.created","response":{...}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_xxx","output_index":0,"content_index":0,"delta":"The capital"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_xxx","output_index":0,"content_index":0,"delta":" of France is Paris."}

event: response.completed
data: {"type":"response.completed","response":{...}}

The full lifecycle events are emitted: response.created, response.in_progress, response.output_item.added, response.content_part.added, response.output_text.delta, response.output_text.done, response.content_part.done, response.output_item.done, response.completed.
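On the client side, the final text can be reassembled by concatenating the delta events. A minimal sketch that consumes already-split SSE `data:` payloads (real clients should use a proper SSE library):

```python
import json

def collect_text(sse_data_lines):
    """Concatenate response.output_text.delta payloads from a stream of
    SSE data JSON strings into the final output text."""
    parts = []
    for line in sse_data_lines:
        event = json.loads(line)
        if event.get("type") == "response.output_text.delta":
            parts.append(event["delta"])
    return "".join(parts)
```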

Using Open Responses instead

When use_open_response_for_responses is set to true, the /responses endpoint uses the Open Responses proxy implementation instead of the default one. The Open Responses proxy provides a richer implementation with native support for function calling, reasoning events, and more streaming events.

Regardless of this flag, the dedicated endpoints are always available:

  • /open-responses always uses the Open Responses proxy
  • /oai-responses always uses the default OpenAI Responses proxy

POST /messages

Anthropic Messages API endpoint. This allows any Anthropic API client (including Claude Code) to use any LLM provider managed by Otoroshi.

curl https://api.domain.tld/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "gpt-4o",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'

Response (Anthropic format):

{
  "id": "msg_xxxxx",
  "type": "message",
  "role": "assistant",
  "model": "gpt-4o",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I assist you today?"
    }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 12,
    "output_tokens": 10
  }
}

The plugin handles all format translation automatically:

  • Anthropic-format tools (input_schema) are converted to OpenAI-format tools (parameters)
  • tool_use / tool_result messages are translated bidirectionally
  • Top-level system field is converted to a system message
  • thinking parameters are mapped to reasoning_effort
  • max_tokens is mapped to max_completion_tokens
  • Streaming uses the full Anthropic SSE protocol
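To illustrate a few of these translations in the Anthropic-to-OpenAI direction (a simplified sketch; the field names follow the bullets above, and everything else is an assumption rather than the plugin's actual implementation):

```python
def translate_request(anthropic_req: dict) -> dict:
    """Sketch of the Anthropic -> OpenAI request translation described above."""
    openai_req = {"model": anthropic_req["model"], "messages": []}
    # top-level `system` field becomes a system message
    if "system" in anthropic_req:
        openai_req["messages"].append(
            {"role": "system", "content": anthropic_req["system"]}
        )
    openai_req["messages"].extend(anthropic_req.get("messages", []))
    # max_tokens -> max_completion_tokens
    if "max_tokens" in anthropic_req:
        openai_req["max_completion_tokens"] = anthropic_req["max_tokens"]
    # Anthropic tools carry `input_schema`; OpenAI tools carry `parameters`
    tools = []
    for tool in anthropic_req.get("tools", []):
        tools.append({
            "type": "function",
            "function": {
                "name": tool["name"],
                "description": tool.get("description", ""),
                "parameters": tool["input_schema"],
            },
        })
    if tools:
        openai_req["tools"] = tools
    return openai_req
```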

Using Claude Code

You can use Claude Code with this plugin by setting:

export ANTHROPIC_AUTH_TOKEN=your-otoroshi-api-key
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=https://api.domain.tld
claude --model gpt-4o

POST /embeddings

OpenAI Embeddings API endpoint.

curl https://api.domain.tld/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'

POST /images/generations

OpenAI Image Generation API endpoint.

curl https://api.domain.tld/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -d '{
    "model": "dall-e-3",
    "prompt": "A white cat sitting on a windowsill",
    "n": 1,
    "size": "1024x1024"
  }'

When decode_images is enabled, base64-encoded image results are decoded and returned as binary image data.
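When decode_images is left disabled, the client performs the same base64-to-binary step itself. A minimal sketch (the b64_json field name follows the OpenAI image response format):

```python
import base64

def decode_image(b64_json: str) -> bytes:
    """Decode a base64-encoded image result into raw binary bytes,
    mirroring what the plugin does when decode_images is enabled."""
    return base64.b64decode(b64_json)
```

The returned bytes can then be written directly to a file such as image.png.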

POST /images/edits

OpenAI Image Edit API endpoint. Accepts multipart/form-data uploads.

POST /audio/speech

OpenAI Text-to-Speech API endpoint.

curl https://api.domain.tld/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -d '{
    "model": "tts-1",
    "input": "Hello, how are you?",
    "voice": "alloy"
  }' --output speech.mp3

POST /audio/transcriptions

OpenAI Speech-to-Text API endpoint. Accepts multipart/form-data audio file uploads up to the configured max_size_upload limit.

POST /audio/translations

OpenAI Audio Translation API endpoint. Translates audio into English text.

POST /moderations

OpenAI Moderation API endpoint.

curl https://api.domain.tld/v1/moderations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -d '{
    "input": "Some text to check for moderation"
  }'

Securing the API

Since this plugin exposes a powerful multi-endpoint AI API, you will want to secure it properly. Otoroshi provides several authentication mechanisms that can be combined on the same route. The key idea is that all security plugins run before the backend call, so you can stack multiple authentication methods and let the first valid one through.

Below are three common approaches, from simplest to most enterprise-ready. They can be used individually or combined on the same route.

API Keys

The simplest approach. Otoroshi API keys support multiple extraction methods (header, query param, Basic auth, Bearer token, JWT) and come with built-in quotas and rate limiting.

Add the ApikeyCalls plugin to your route:

{
  "enabled": true,
  "plugin": "cp:otoroshi.next.plugins.ApikeyCalls",
  "config": {
    "extractors": {
      "basic": { "enabled": true },
      "custom_headers": { "enabled": true },
      "client_id": { "enabled": true },
      "jwt": { "enabled": true, "secret_signed": true, "keypair_signed": true },
      "oto_bearer": { "enabled": true }
    },
    "validate": true,
    "mandatory": true,
    "wipe_backend_request": true,
    "update_quotas": true
  }
}

Clients can then authenticate with a Bearer token:

curl https://api.domain.tld/v1/chat/completions \
  -H "Authorization: Bearer your-otoroshi-apikey" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'

Or with Basic auth, client ID/secret headers, or JWT tokens signed with the API key secret.

You can further restrict access using mandatory tags to ensure only API keys with specific tags can access this route:

{
  "enabled": true,
  "plugin": "cp:otoroshi.next.plugins.NgApikeyMandatoryTags",
  "config": {
    "tags": ["endpoint_ai-api"]
  }
}

Biscuit tokens

Biscuit tokens provide fine-grained, decentralized authorization with attenuation capabilities. This is ideal for distributing scoped tokens to different teams or applications.

Add the BiscuitUserExtractor plugin:

{
  "enabled": true,
  "plugin": "cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.biscuit.plugins.BiscuitUserExtractor",
  "config": {
    "keypair_ref": "biscuit-keypair_your-keypair-id",
    "enforce": true,
    "extractor_type": "header",
    "extractor_name": "Authorization",
    "validations": {
      "policies": [
        "allow if endpoint(\"ai-api\")"
      ]
    }
  }
}

Clients send their biscuit token in the Authorization header:

curl https://api.domain.tld/v1/chat/completions \
  -H "Authorization: biscuit_token_here" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'

The biscuit policy allow if endpoint("ai-api") ensures the token was explicitly granted access to this endpoint. You can create attenuated tokens with additional restrictions (time limits, IP ranges, specific models, etc.).

OIDC / JWT (Keycloak, Auth0, etc.)

For enterprise environments with an existing identity provider, you can validate JWT tokens issued by any OIDC-compliant provider (Keycloak, Auth0, Okta, Azure AD, etc.).

Add the OIDCJwtVerifier plugin:

{
  "enabled": true,
  "plugin": "cp:otoroshi.next.plugins.OIDCJwtVerifier",
  "config": {
    "ref": "auth_mod_your-oidc-verifier-id",
    "mandatory": true,
    "user": true
  }
}

The ref points to an Otoroshi JWT verifier entity configured with your OIDC provider's JWKS URL and issuer. Clients send their JWT token as a Bearer token:

# First, get a token from your OIDC provider
TOKEN=$(curl -s -X POST https://keycloak.your-domain.com/realms/your-realm/protocol/openid-connect/token \
-d "grant_type=client_credentials" \
-d "client_id=ai-api-client" \
-d "client_secret=your-client-secret" | jq -r '.access_token')

# Then call the AI API
curl https://api.domain.tld/v1/chat/completions \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'

Combining multiple authentication methods

The real power of Otoroshi is the ability to combine these methods on the same route. By making each auth plugin optional (mandatory: false, or enforce: false for the Biscuit plugin) and adding the NgExpectedConsumer plugin, you can accept any of the configured authentication methods:

{
  "plugins": [
    {
      "enabled": true,
      "plugin": "cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.aigateway.plugins.OpenAiCompatApi",
      "config": {
        "language_model_refs": ["provider_openai_1"],
        "embedding_model_refs": ["embedding_model_1"]
      }
    },
    {
      "enabled": true,
      "plugin": "cp:otoroshi.next.plugins.ApikeyCalls",
      "config": {
        "extractors": {
          "basic": { "enabled": true },
          "custom_headers": { "enabled": true },
          "jwt": { "enabled": true, "secret_signed": true, "keypair_signed": true },
          "oto_bearer": { "enabled": true }
        },
        "validate": true,
        "mandatory": false,
        "wipe_backend_request": true,
        "update_quotas": true
      }
    },
    {
      "enabled": true,
      "plugin": "cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.biscuit.plugins.BiscuitUserExtractor",
      "config": {
        "keypair_ref": "biscuit-keypair_your-keypair-id",
        "enforce": false,
        "extractor_type": "header",
        "extractor_name": "Authorization",
        "validations": {
          "policies": ["allow if endpoint(\"ai-api\")"]
        }
      }
    },
    {
      "enabled": true,
      "plugin": "cp:otoroshi.next.plugins.OIDCJwtVerifier",
      "config": {
        "ref": "auth_mod_your-oidc-verifier-id",
        "mandatory": false,
        "user": true
      }
    },
    {
      "enabled": true,
      "plugin": "cp:otoroshi.next.plugins.NgExpectedConsumer",
      "config": {}
    }
  ]
}

With this setup:

  • A developer can use a simple API key for quick prototyping
  • An automated pipeline can use a scoped biscuit token with fine-grained permissions
  • A web application can use a JWT token from your corporate Keycloak/Auth0
  • The NgExpectedConsumer plugin ensures that at least one authentication method succeeded

All three methods work on the same route, on the same API surface, without any change to the AI plugin configuration.

Comparison with individual plugins

| Feature | Individual plugins | OpenAI Compatible API |
|---------|--------------------|------------------------|
| Setup complexity | One plugin per endpoint | Single plugin for everything |
| Path-based routing | Manual with includes filters | Automatic based on URL path |
| Anthropic Messages support | Separate AnthropicCompatProxy plugin | Built-in via /messages |
| Responses API support | Separate OpenAiResponsesProxy plugin | Built-in via /responses |
| Open Responses support | Separate OpenResponseCompatProxy plugin | Built-in via /open-responses |
| Prompt contexts | Not available | Built-in via /contexts |
| Model-specific config | Same refs for all endpoints | Separate refs per capability (language, audio, image, embedding, moderation) |
| Flexibility | Fine-grained control per endpoint | Unified but less granular |

The individual plugins are still useful when you need fine-grained control per endpoint (e.g., different provider refs for different paths, or different plugin chains per endpoint). The unified plugin is ideal when you want a quick, coherent API surface with minimal configuration.