OpenAI Compatible API (unified)

The LLM OpenAI Compatible API plugin is a unified backend plugin that exposes a single Otoroshi route as a full-featured, multi-endpoint AI API. Instead of configuring separate plugins for chat completions, audio, images, embeddings, moderation, and responses, this single plugin handles all of them with a consistent configuration.

cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.aigateway.plugins.OpenAiCompatApi

Why use this plugin?

When building an AI gateway, you typically need to expose several API endpoints: chat completions, embeddings, image generation, audio transcription, moderation, etc. With the individual plugins, you would need to configure each one separately on the same route, mapping each to a specific path.

The OpenAI Compatible API plugin simplifies this by providing a single plugin that routes requests based on the URL path to the appropriate handler. It also goes beyond the standard OpenAI API by supporting additional endpoints:

  • Anthropic Messages API (/messages) for clients like Claude Code
  • OpenAI Responses API (/responses) for the newer responses format
  • Open Responses (/open-responses) for the Open Responses specification
  • Prompt contexts (/contexts) for listing available prompt contexts

This makes it the ideal choice when you want to expose a single, coherent API surface that supports multiple AI client ecosystems.

Supported endpoints

| Method | Path | Description | Config field |
|--------|------|-------------|--------------|
| GET | /models | List available models from all configured providers | language_model_refs |
| GET | /contexts | List available prompt contexts | context_refs |
| POST | /chat/completions | OpenAI Chat Completions API | language_model_refs |
| POST | /responses | OpenAI Responses API (or Open Responses if configured) | language_model_refs |
| POST | /open-responses | Open Responses API (always available) | language_model_refs |
| POST | /oai-responses | OpenAI Responses API (always available) | language_model_refs |
| POST | /messages | Anthropic Messages API | language_model_refs |
| POST | /embeddings | OpenAI Embeddings API | embedding_model_refs |
| POST | /images/generations | OpenAI Image Generation API | image_model_refs |
| POST | /images/edits | OpenAI Image Edit API | image_model_refs |
| POST | /audio/speech | OpenAI Text-to-Speech API | audio_model_refs |
| POST | /audio/transcriptions | OpenAI Speech-to-Text API | audio_model_refs |
| POST | /audio/translations | OpenAI Audio Translation API | audio_model_refs |
| POST | /moderations | OpenAI Moderation API | moderation_model_refs |

Plugin configuration

{
  "plugin": "cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.aigateway.plugins.OpenAiCompatApi",
  "enabled": true,
  "config": {
    "language_model_refs": ["provider_openai_1", "provider_mistral_1"],
    "audio_model_refs": ["audio_model_1"],
    "image_model_refs": ["image_model_1"],
    "embedding_model_refs": ["embedding_model_1"],
    "moderation_model_refs": ["moderation_model_1"],
    "context_refs": ["context_1", "context_2"],
    "max_size_upload": 104857600,
    "decode_images": false,
    "use_open_response_for_responses": false
  }
}
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| language_model_refs | array of strings | [] | References to LLM provider entities used for the chat completions, responses, and messages endpoints |
| audio_model_refs | array of strings | [] | References to audio model entities used for the speech and transcription endpoints |
| image_model_refs | array of strings | [] | References to image model entities used for image generation and editing |
| embedding_model_refs | array of strings | [] | References to embedding model entities used for the embeddings endpoint |
| moderation_model_refs | array of strings | [] | References to moderation model entities used for the moderations endpoint |
| context_refs | array of strings | [] | References to prompt context entities returned by the /contexts endpoint |
| max_size_upload | number | 104857600 (100MB) | Maximum file upload size in bytes for the audio and image endpoints |
| decode_images | boolean | false | When enabled, decodes base64-encoded image results into binary responses |
| use_open_response_for_responses | boolean | false | When enabled, the /responses endpoint uses the Open Responses proxy instead of the default OpenAI Responses proxy |

Route configuration example

{
  "id": "route_unified_ai_api",
  "name": "Unified AI API",
  "frontend": {
    "domains": ["api.domain.tld/v1"],
    "strip_path": true,
    "exact": false
  },
  "backend": {
    "targets": [
      {
        "id": "target_1",
        "hostname": "request.otoroshi.io",
        "port": 443,
        "tls": true
      }
    ]
  },
  "plugins": [
    {
      "enabled": true,
      "plugin": "cp:otoroshi.next.plugins.OverrideHost",
      "config": {}
    },
    {
      "enabled": true,
      "plugin": "cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.aigateway.plugins.OpenAiCompatApi",
      "config": {
        "language_model_refs": ["provider_openai_1", "provider_mistral_1"],
        "embedding_model_refs": ["embedding_model_1"],
        "image_model_refs": ["image_model_1"],
        "audio_model_refs": ["audio_model_1"],
        "context_refs": ["context_project_a"],
        "use_open_response_for_responses": false
      }
    }
  ]
}

Endpoint details

GET /models

Lists all models available from the configured language model providers. Compatible with the OpenAI Models API.

curl https://api.domain.tld/v1/models \
  -H "Authorization: Bearer $OTOROSHI_BEARER"

When multiple providers are configured, model IDs are prefixed with the provider's slug (e.g., my_openai/gpt-4o). Use ?raw=true to get the raw model identifiers without prefixes.

GET /contexts

Returns the list of prompt contexts configured for this plugin. Each context includes its id and name.

curl https://api.domain.tld/v1/contexts \
  -H "Authorization: Bearer $OTOROSHI_BEARER"

Response:

[
  { "id": "context_project_a", "name": "Project A context" },
  { "id": "context_support", "name": "Support context" }
]

POST /chat/completions

Standard OpenAI Chat Completions API endpoint. Supports streaming, tool calling, and model routing.

curl https://api.domain.tld/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'

Model routing

When multiple providers are configured, you can target a specific provider using the model field:

  • Slash syntax: providerName/modelName (e.g., my_openai/gpt-4o)
  • Hash syntax: providerId###modelName (e.g., provider_xxx###gpt-4o)

If no provider prefix is specified, the first configured provider is used.
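The routing rules above can be sketched as a small client-side parser (a hypothetical helper for illustration, not the plugin's internal code):

```python
def resolve_provider(model: str, default_provider: str) -> tuple[str, str]:
    """Split a model reference into (provider, model) using the two
    routing syntaxes described above, falling back to the default
    (first configured) provider when no prefix is present."""
    if "###" in model:  # hash syntax: providerId###modelName
        provider, name = model.split("###", 1)
        return provider, name
    if "/" in model:  # slash syntax: providerName/modelName
        provider, name = model.split("/", 1)
        return provider, name
    return default_provider, model  # no prefix: first configured provider
```

For example, resolve_provider("my_openai/gpt-4o", "provider_openai_1") yields ("my_openai", "gpt-4o").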

Streaming

Streaming is activated when any of the following is true:

  • The request body contains "stream": true
  • The query parameter ?stream=true is present
  • The header x-stream: true is present
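The three triggers can be expressed as a single predicate (an illustrative sketch; the dict-based parameter shapes are assumptions, not the plugin's internals):

```python
def is_streaming(body: dict, query: dict, headers: dict) -> bool:
    """Streaming is on if any of the three triggers described above is set."""
    if body.get("stream") is True:  # body flag: "stream": true
        return True
    if query.get("stream") == "true":  # query parameter: ?stream=true
        return True
    if headers.get("x-stream", "").lower() == "true":  # header: x-stream: true
        return True
    return False
```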

POST /responses

OpenAI Responses API endpoint. This is the newer OpenAI API format that uses input and instructions instead of messages.

curl https://api.domain.tld/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -d '{
    "model": "gpt-4o",
    "instructions": "You are a helpful assistant.",
    "input": "What is the capital of France?"
  }'

Response:

{
  "id": "resp_xxxxx",
  "object": "response",
  "created_at": 1711569952,
  "model": "gpt-4o",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "msg_xxxxx",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris.",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 25,
    "output_tokens": 8,
    "total_tokens": 33
  }
}

Input format

The input field accepts multiple formats:

  • String: a simple text message treated as a user message

        { "input": "Hello!" }

  • Array of messages: standard role-based messages

        {
          "input": [
            { "type": "message", "role": "user", "content": "Hello!" }
          ]
        }

  • Multipart content: messages with mixed content types

        {
          "input": [
            {
              "type": "message",
              "role": "user",
              "content": [
                { "type": "input_text", "text": "What's in this image?" },
                { "type": "input_image", "image_url": "https://example.com/image.png" }
              ]
            }
          ]
        }

  • Function call outputs: tool result messages

        {
          "input": [
            { "type": "function_call_output", "call_id": "call_123", "output": "Paris, 15 degrees" }
          ]
        }

Streaming

When streaming is enabled, the response uses Server-Sent Events with the Responses API event protocol:

event: response.created
data: {"type":"response.created","response":{...}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_xxx","output_index":0,"content_index":0,"delta":"The capital"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_xxx","output_index":0,"content_index":0,"delta":" of France is Paris."}

event: response.completed
data: {"type":"response.completed","response":{...}}

The full lifecycle events are emitted: response.created, response.in_progress, response.output_item.added, response.content_part.added, response.output_text.delta, response.output_text.done, response.content_part.done, response.output_item.done, response.completed.
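On the client side, the final text can be reassembled by concatenating the delta events. A minimal sketch that consumes already-split SSE `data:` payloads (real clients should use a proper SSE library):

```python
import json

def collect_text(sse_data_lines):
    """Concatenate response.output_text.delta payloads from a stream of
    SSE data JSON strings into the final output text."""
    parts = []
    for line in sse_data_lines:
        event = json.loads(line)
        if event.get("type") == "response.output_text.delta":
            parts.append(event["delta"])
    return "".join(parts)
```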

Using Open Responses instead

When use_open_response_for_responses is set to true, the /responses endpoint uses the Open Responses proxy implementation instead of the default one. The Open Responses proxy provides a richer implementation with native support for function calling, reasoning events, and more streaming events.

Regardless of this flag, the dedicated endpoints are always available:

  • /open-responses always uses the Open Responses proxy
  • /oai-responses always uses the default OpenAI Responses proxy

POST /messages

Anthropic Messages API endpoint. This allows any Anthropic API client (including Claude Code) to use any LLM provider managed by Otoroshi.

curl https://api.domain.tld/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "gpt-4o",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'

Response (Anthropic format):

{
  "id": "msg_xxxxx",
  "type": "message",
  "role": "assistant",
  "model": "gpt-4o",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I assist you today?"
    }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 12,
    "output_tokens": 10
  }
}

The plugin handles all format translation automatically:

  • Anthropic-format tools (input_schema) are converted to OpenAI-format tools (parameters)
  • tool_use / tool_result messages are translated bidirectionally
  • Top-level system field is converted to a system message
  • thinking parameters are mapped to reasoning_effort
  • max_tokens is mapped to max_completion_tokens
  • Streaming uses the full Anthropic SSE protocol
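To illustrate a few of these translations in the Anthropic-to-OpenAI direction (a simplified sketch; the field names follow the bullets above, and everything else is an assumption rather than the plugin's actual implementation):

```python
def translate_request(anthropic_req: dict) -> dict:
    """Sketch of the Anthropic -> OpenAI request translation described above."""
    openai_req = {"model": anthropic_req["model"], "messages": []}
    # top-level `system` field becomes a system message
    if "system" in anthropic_req:
        openai_req["messages"].append(
            {"role": "system", "content": anthropic_req["system"]}
        )
    openai_req["messages"].extend(anthropic_req.get("messages", []))
    # max_tokens -> max_completion_tokens
    if "max_tokens" in anthropic_req:
        openai_req["max_completion_tokens"] = anthropic_req["max_tokens"]
    # Anthropic tools carry `input_schema`; OpenAI tools carry `parameters`
    tools = []
    for tool in anthropic_req.get("tools", []):
        tools.append({
            "type": "function",
            "function": {
                "name": tool["name"],
                "description": tool.get("description", ""),
                "parameters": tool["input_schema"],
            },
        })
    if tools:
        openai_req["tools"] = tools
    return openai_req
```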

Using Claude Code

You can use Claude Code with this plugin by setting:

export ANTHROPIC_AUTH_TOKEN=your-otoroshi-api-key
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=https://api.domain.tld
claude --model gpt-4o

POST /embeddings

OpenAI Embeddings API endpoint.

curl https://api.domain.tld/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'

POST /images/generations

OpenAI Image Generation API endpoint.

curl https://api.domain.tld/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -d '{
    "model": "dall-e-3",
    "prompt": "A white cat sitting on a windowsill",
    "n": 1,
    "size": "1024x1024"
  }'

When decode_images is enabled, base64-encoded image results are decoded and returned as binary image data.
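When decode_images is left disabled, the client performs the same base64-to-binary step itself. A minimal sketch (the b64_json field name follows the OpenAI image response format):

```python
import base64

def decode_image(b64_json: str) -> bytes:
    """Decode a base64-encoded image result into raw binary bytes,
    mirroring what the plugin does when decode_images is enabled."""
    return base64.b64decode(b64_json)
```

The returned bytes can then be written directly to a file such as image.png.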

POST /images/edits

OpenAI Image Edit API endpoint. Accepts multipart/form-data uploads.

POST /audio/speech

OpenAI Text-to-Speech API endpoint.

curl https://api.domain.tld/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -d '{
    "model": "tts-1",
    "input": "Hello, how are you?",
    "voice": "alloy"
  }' --output speech.mp3

POST /audio/transcriptions

OpenAI Speech-to-Text API endpoint. Accepts multipart/form-data audio file uploads up to the configured max_size_upload limit.

POST /audio/translations

OpenAI Audio Translation API endpoint. Translates audio into English text.

POST /moderations

OpenAI Moderation API endpoint.

curl https://api.domain.tld/v1/moderations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_BEARER" \
  -d '{
    "input": "Some text to check for moderation"
  }'

Securing the API

Since this plugin exposes a powerful multi-endpoint AI API, you will want to secure it properly. Otoroshi provides several authentication mechanisms that can be combined on the same route. The key idea is that all security plugins run before the backend call, so you can stack multiple authentication methods and let the first valid one through.

Below are three common approaches, from simplest to most enterprise-ready. They can be used individually or combined on the same route.

API Keys

The simplest approach. Otoroshi API keys support multiple extraction methods (header, query param, Basic auth, Bearer token, JWT) and come with built-in quotas and rate limiting.

Add the ApikeyCalls plugin to your route:

{
  "enabled": true,
  "plugin": "cp:otoroshi.next.plugins.ApikeyCalls",
  "config": {
    "extractors": {
      "basic": { "enabled": true },
      "custom_headers": { "enabled": true },
      "client_id": { "enabled": true },
      "jwt": { "enabled": true, "secret_signed": true, "keypair_signed": true },
      "oto_bearer": { "enabled": true }
    },
    "validate": true,
    "mandatory": true,
    "wipe_backend_request": true,
    "update_quotas": true
  }
}

Clients can then authenticate with a Bearer token:

curl https://api.domain.tld/v1/chat/completions \
  -H "Authorization: Bearer your-otoroshi-apikey" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'

Or with Basic auth, client ID/secret headers, or JWT tokens signed with the API key secret.

You can further restrict access using mandatory tags to ensure only API keys with specific tags can access this route:

{
  "enabled": true,
  "plugin": "cp:otoroshi.next.plugins.NgApikeyMandatoryTags",
  "config": {
    "tags": ["endpoint_ai-api"]
  }
}

Biscuit tokens

Biscuit tokens provide fine-grained, decentralized authorization with attenuation capabilities. This is ideal for distributing scoped tokens to different teams or applications.

Add the BiscuitUserExtractor plugin:

{
  "enabled": true,
  "plugin": "cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.biscuit.plugins.BiscuitUserExtractor",
  "config": {
    "keypair_ref": "biscuit-keypair_your-keypair-id",
    "enforce": true,
    "extractor_type": "header",
    "extractor_name": "Authorization",
    "validations": {
      "policies": [
        "allow if endpoint(\"ai-api\")"
      ]
    }
  }
}

Clients send their biscuit token in the Authorization header:

curl https://api.domain.tld/v1/chat/completions \
  -H "Authorization: biscuit_token_here" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'

The biscuit policy allow if endpoint("ai-api") ensures the token was explicitly granted access to this endpoint. You can create attenuated tokens with additional restrictions (time limits, IP ranges, specific models, etc.).

OIDC / JWT (Keycloak, Auth0, etc.)

For enterprise environments with an existing identity provider, you can validate JWT tokens issued by any OIDC-compliant provider (Keycloak, Auth0, Okta, Azure AD, etc.).

Add the OIDCJwtVerifier plugin:

{
  "enabled": true,
  "plugin": "cp:otoroshi.next.plugins.OIDCJwtVerifier",
  "config": {
    "ref": "auth_mod_your-oidc-verifier-id",
    "mandatory": true,
    "user": true
  }
}

The ref points to an Otoroshi JWT verifier entity configured with your OIDC provider's JWKS URL and issuer. Clients send their JWT token as a Bearer token:

# First, get a token from your OIDC provider
TOKEN=$(curl -s -X POST https://keycloak.your-domain.com/realms/your-realm/protocol/openid-connect/token \
-d "grant_type=client_credentials" \
-d "client_id=ai-api-client" \
-d "client_secret=your-client-secret" | jq -r '.access_token')

# Then call the AI API
curl https://api.domain.tld/v1/chat/completions \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'

Combining multiple authentication methods

The real power of Otoroshi is the ability to combine these methods on the same route. By making each auth plugin optional (mandatory: false, or enforce: false for the Biscuit plugin) and adding the NgExpectedConsumer plugin, you can accept any of the configured authentication methods:

{
  "plugins": [
    {
      "enabled": true,
      "plugin": "cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.aigateway.plugins.OpenAiCompatApi",
      "config": {
        "language_model_refs": ["provider_openai_1"],
        "embedding_model_refs": ["embedding_model_1"]
      }
    },
    {
      "enabled": true,
      "plugin": "cp:otoroshi.next.plugins.ApikeyCalls",
      "config": {
        "extractors": {
          "basic": { "enabled": true },
          "custom_headers": { "enabled": true },
          "jwt": { "enabled": true, "secret_signed": true, "keypair_signed": true },
          "oto_bearer": { "enabled": true }
        },
        "validate": true,
        "mandatory": false,
        "wipe_backend_request": true,
        "update_quotas": true
      }
    },
    {
      "enabled": true,
      "plugin": "cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.biscuit.plugins.BiscuitUserExtractor",
      "config": {
        "keypair_ref": "biscuit-keypair_your-keypair-id",
        "enforce": false,
        "extractor_type": "header",
        "extractor_name": "Authorization",
        "validations": {
          "policies": ["allow if endpoint(\"ai-api\")"]
        }
      }
    },
    {
      "enabled": true,
      "plugin": "cp:otoroshi.next.plugins.OIDCJwtVerifier",
      "config": {
        "ref": "auth_mod_your-oidc-verifier-id",
        "mandatory": false,
        "user": true
      }
    },
    {
      "enabled": true,
      "plugin": "cp:otoroshi.next.plugins.NgExpectedConsumer",
      "config": {}
    }
  ]
}

With this setup:

  • A developer can use a simple API key for quick prototyping
  • An automated pipeline can use a scoped biscuit token with fine-grained permissions
  • A web application can use a JWT token from your corporate Keycloak/Auth0
  • The NgExpectedConsumer plugin ensures that at least one authentication method succeeded

All three methods work on the same route, on the same API surface, without any change to the AI plugin configuration.

Comparison with individual plugins

| Feature | Individual plugins | OpenAI Compatible API |
|---------|--------------------|------------------------|
| Setup complexity | One plugin per endpoint | Single plugin for everything |
| Path-based routing | Manual with includes filters | Automatic based on URL path |
| Anthropic Messages support | Separate AnthropicCompatProxy plugin | Built-in via /messages |
| Responses API support | Separate OpenAiResponsesProxy plugin | Built-in via /responses |
| Open Responses support | Separate OpenResponseCompatProxy plugin | Built-in via /open-responses |
| Prompt contexts | Not available | Built-in via /contexts |
| Model-specific config | Same refs for all endpoints | Separate refs per capability (language, audio, image, embedding, moderation) |
| Flexibility | Fine-grained control per endpoint | Unified but less granular |

The individual plugins are still useful when you need fine-grained control per endpoint (e.g., different provider refs for different paths, or different plugin chains per endpoint). The unified plugin is ideal when you want a quick, coherent API surface with minimal configuration.