Set up a new LLM Provider

Quick start

Open your Otoroshi admin web UI at otoroshi.oto.tools:8080.

In the categories section, click on AI - LLM, then select LLM Providers.

Click on Add item to create a new LLM provider.

Then, choose the provider you would like to use among the supported providers.

After choosing a provider, you can set up your secret API token and other settings if needed.

You can also define a default context for this provider, or a set of named contexts that users can select directly from the input payload. You can also restrict which models are usable with this provider.

Now you can click on Create provider to save your changes and use the provider in an Otoroshi route.

Provider entity structure

The provider entity (JSON) has the following top-level fields:

```json
{
  "id": "provider_xxx",
  "name": "My Provider",
  "description": "A description",
  "tags": ["production"],
  "metadata": {},
  "provider": "openai",
  "connection": { },
  "options": { },
  "provider_fallback": null,
  "context": { },
  "models": { },
  "cache": { },
  "guardrails": [],
  "guardrails_fail_on_deny": false,
  "memory": null
}
```

Connection

The connection object contains the settings to connect to the LLM provider API.

Common connection fields

These fields are shared across most providers:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| base_url | string | provider-specific | The base URL of the provider API |
| token | string | "xxx" | The API token or key. Supports comma-separated values for token round-robin |
| timeout | number | 180000 | Request timeout in milliseconds (default: 3 minutes) |

Token round-robin

You can distribute requests across multiple API tokens by providing a comma-separated list:

```json
{
  "connection": {
    "token": "sk-token1,sk-token2,sk-token3"
  }
}
```

Each request will use the next token in a round-robin fashion, helping to distribute rate limits across multiple accounts.
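
A minimal sketch of this rotation (illustrative only; not Otoroshi's actual implementation):

```python
# Illustrative sketch of token round-robin over a comma-separated list.
# In a real gateway the cycle position would live with the provider entity.
import itertools

tokens = "sk-token1,sk-token2,sk-token3".split(",")
next_token = itertools.cycle(tokens).__next__

# Four consecutive requests: the fourth wraps back to the first token.
picked = [next_token() for _ in range(4)]
print(picked)  # ['sk-token1', 'sk-token2', 'sk-token3', 'sk-token1']
```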

Provider-specific connection fields

Azure OpenAI

| Field | Type | Description |
| --- | --- | --- |
| resource_name | string | Azure resource name |
| deployment_id | string | Deployment ID |
| api_version | string | API version (default: "2024-02-01") |
| api_key | string | API key (alternative to bearer token) |

```json
{
  "connection": {
    "resource_name": "my-azure-resource",
    "deployment_id": "my-deployment",
    "api_version": "2024-02-01",
    "api_key": "xxx"
  }
}
```

Cloudflare

| Field | Type | Description |
| --- | --- | --- |
| account_id | string | Cloudflare account ID |
| model_name | string | Model name |

```json
{
  "connection": {
    "account_id": "your-account-id",
    "model_name": "@cf/meta/llama-2-7b-chat-int8",
    "token": "xxx"
  }
}
```

OVH AI Endpoints

| Field | Type | Description |
| --- | --- | --- |
| base_domain | string | OVH base domain |
| unified | boolean | Use unified API mode (default: true) |

OpenAI Compatible

| Field | Type | Description |
| --- | --- | --- |
| supports_tools | boolean | Whether the API supports tool calling (default: true) |
| supports_streaming | boolean | Whether the API supports streaming (default: true) |
| supports_completion | boolean | Whether the API supports text completions (default: true) |
| models_path | string | Path to the models endpoint (default: "/models") |
| param_mappings | object | Map of parameter name overrides |
| headers | object | Custom headers (default: {"Authorization": "Bearer {api_key}"}) |
| additional_body_params | object | Extra parameters to include in every request body |
| acc_stream_consumptions | boolean | Accumulate token usage across streaming chunks (default: false) |

```json
{
  "connection": {
    "base_url": "https://my-api.example.com/v1",
    "token": "xxx",
    "supports_tools": false,
    "supports_streaming": true,
    "headers": {
      "Authorization": "Bearer {api_key}",
      "X-Custom-Header": "value"
    }
  }
}
```

Options

The options object configures the model and its parameters. These options follow the OpenAI chat completion API parameters for most providers.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | provider-specific | The model to use (e.g., "gpt-4o", "claude-sonnet-4-20250514") |
| temperature | float | | Sampling temperature (0.0 to 2.0). Lower values are more deterministic |
| top_p | float | | Nucleus sampling parameter (0.0 to 1.0) |
| max_tokens | integer | | Maximum number of tokens in the response |
| n | integer | 1 | Number of completions to generate |
| seed | integer | | Seed for deterministic generation |
| frequency_penalty | float | | Penalize tokens based on frequency (-2.0 to 2.0) |
| presence_penalty | float | | Penalize tokens based on presence (-2.0 to 2.0) |
| stop | string | | Stop sequence |
| response_format | string | | Response format (e.g., "json_object") |
| logprobs | boolean | | Return log probabilities |
| top_logprobs | integer | | Number of top log probabilities to return |
| allow_config_override | boolean | true | Allow users to override these options from the request body |
| wasm_tools / tool_functions | array | [] | List of Wasm function IDs for function calling |
| mcp_connectors | array | [] | List of MCP connector IDs |
| mcp_include_functions | array | [] | Whitelist of MCP function names to expose |
| mcp_exclude_functions | array | [] | Blacklist of MCP function names to hide |
| max_function_calls | integer | 10 | Maximum number of function call iterations |

Config override

When allow_config_override is true (the default), users can override options like model, temperature, max_tokens, etc. directly from the request body. This allows a single provider entity to serve different use cases.

When set to false, the options defined on the provider entity are always used, regardless of what the user sends in the request body.
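
For example, with override allowed, a caller could send per-request options in the chat completion payload (field names follow the OpenAI chat completion format the options mirror; the values here are arbitrary):

```json
{
  "model": "gpt-4o-mini",
  "temperature": 0.2,
  "max_tokens": 512,
  "messages": [
    { "role": "user", "content": "Summarize this support ticket." }
  ]
}
```

With allow_config_override set to false, the model, temperature, and max_tokens above would be ignored in favor of the provider's configured options.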

Context

The context object lets you define system prompts that will be prepended to every request.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| default | string | | Default system prompt applied to all requests |
| contexts | array | [] | List of named context strings that users can select from |

```json
{
  "context": {
    "default": "You are a helpful assistant specialized in customer support.",
    "contexts": [
      "You are a technical expert.",
      "You are a creative writer."
    ]
  }
}
```

When contexts are defined, users can select a context by including a context_id (index) in their request.
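
For example, assuming the two named contexts above, a request selecting the technical-expert context could include the index alongside the usual chat payload (whether indices are zero-based is an assumption here):

```json
{
  "context_id": 0,
  "messages": [
    { "role": "user", "content": "My deployment fails with a 502 error." }
  ]
}
```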

Model restrictions

The models object restricts which models can be used with this provider, using regex patterns.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| include | array | [] | Regex patterns for allowed models. If empty, all models are allowed |
| exclude | array | [] | Regex patterns for denied models |

```json
{
  "models": {
    "include": ["gpt-4.*", "gpt-3.5-turbo"],
    "exclude": [".*preview.*"]
  }
}
```

A model is allowed if it matches at least one include pattern (or include is empty) AND does not match any exclude pattern.
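
This rule can be sketched as follows (an illustration of the semantics above, not Otoroshi's code; whether patterns are anchored to the full model name is an assumption here):

```python
# Illustrative check: a model passes if it matches at least one include
# pattern (or the include list is empty) and matches no exclude pattern.
import re

def model_allowed(model, include, exclude):
    included = not include or any(re.fullmatch(p, model) for p in include)
    excluded = any(re.fullmatch(p, model) for p in exclude)
    return included and not excluded

restrictions = {"include": ["gpt-4.*", "gpt-3.5-turbo"], "exclude": [".*preview.*"]}
print(model_allowed("gpt-4o", **restrictions))          # True
print(model_allowed("gpt-4o-preview", **restrictions))  # False: matches an exclude pattern
print(model_allowed("claude-3", **restrictions))        # False: matches no include pattern
```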

Per API key / per user restrictions

In addition to the provider-level restrictions, model access can be further restricted per API key or per user through their metadata. This allows fine-grained control: for example, a single provider exposing multiple models can restrict each consumer to only the models they are entitled to use.

The following metadata keys are supported on both API keys and users (Otoroshi private app users):

| Metadata key | Description |
| --- | --- |
| ai_models_include | Comma-separated list of regex patterns for allowed models |
| ai_models_exclude | Comma-separated list of regex patterns for denied models |

For example, to restrict an API key to only GPT-4 models:

```json
{
  "metadata": {
    "ai_models_include": "gpt-4.*"
  }
}
```

To restrict a user to GPT-4 while excluding preview models:

```json
{
  "metadata": {
    "ai_models_include": "gpt-4.*",
    "ai_models_exclude": ".*preview.*"
  }
}
```

All three levels of restrictions are combined: a model is allowed only if it passes the provider restrictions, AND the API key restrictions (if any), AND the user restrictions (if any). If a model is denied at any level, the request is rejected with {"error": "you can't use this model"}.
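
A sketch of how the levels combine (illustrative only; the per-level patterns would be parsed from the comma-separated metadata strings shown above):

```python
# Illustrative: a model must pass provider, API key, and user restrictions.
import re

def passes(model, include, exclude):
    ok_include = not include or any(re.fullmatch(p, model) for p in include)
    return ok_include and not any(re.fullmatch(p, model) for p in exclude)

def allowed(model, *levels):
    # Each level is a dict like {"include": [...], "exclude": [...]};
    # a level with no restrictions allows everything.
    return all(passes(model, lvl.get("include", []), lvl.get("exclude", []))
               for lvl in levels)

provider = {"include": ["gpt-4.*"]}
api_key = {"exclude": [".*preview.*"]}   # from ai_models_exclude metadata
user = {}                                # no user-level restriction
print(allowed("gpt-4o", provider, api_key, user))         # True
print(allowed("gpt-4-preview", provider, api_key, user))  # False
```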

This mechanism applies to all model types: LLM providers, embedding models, audio models, image models, video models, and moderation models.

Cache

The cache object configures response caching to reduce costs and latency.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| strategy | string | "none" | Cache strategy: "none", "simple", or "semantic" |
| ttl | number | 86400000 | Cache TTL in milliseconds (default: 24 hours) |
| score | number | 0.8 | Minimum similarity score for semantic cache hits (0.0 to 1.0) |

```json
{
  "cache": {
    "strategy": "simple",
    "ttl": 3600000,
    "score": 0.8
  }
}
```
  • none: No caching
  • simple: Exact match caching — identical prompts return cached responses
  • semantic: Similarity-based caching — semantically similar prompts can return cached responses. Uses the score threshold. See Semantic Cache for details.
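
The semantic strategy's score threshold can be pictured as a cosine similarity check between prompt embeddings (a toy sketch with made-up vectors; real embeddings come from an embedding model):

```python
# Toy illustration of a semantic cache hit decision.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

cached = [0.9, 0.1, 0.4]       # embedding of a previously cached prompt (made up)
incoming = [0.85, 0.15, 0.45]  # embedding of the new prompt (made up)

score = cosine(cached, incoming)
hit = score >= 0.8  # the "score" threshold from the cache config
print(hit)  # True: the prompts are similar enough to reuse the cached response
```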

Guardrails

The guardrails array configures content validation rules that are applied before and/or after LLM calls. See the Guardrails documentation for all available guardrails.

```json
{
  "guardrails": [
    {
      "enabled": true,
      "before": true,
      "after": false,
      "id": "regex",
      "config": {
        "allow": [],
        "deny": ["credit.card.\\d+"]
      }
    }
  ],
  "guardrails_fail_on_deny": false
}
```

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| guardrails | array | [] | List of guardrail items |
| guardrails_fail_on_deny | boolean | false | When true, denied requests return an error. When false, the guardrail denial message is returned as the assistant response |

Each guardrail item has:

| Field | Type | Description |
| --- | --- | --- |
| enabled | boolean | Whether this guardrail is active |
| before | boolean | Apply before the LLM call (on user input) |
| after | boolean | Apply after the LLM call (on model output) |
| id | string | Guardrail type identifier |
| config | object | Guardrail-specific configuration |
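
The regex guardrail from the example above could behave roughly like this (illustrative semantics, not the plugin's source; whether deny takes precedence over allow is an assumption here):

```python
# Illustrative regex guardrail: deny patterns reject the text; if allow
# patterns are configured, at least one must match.
import re

def regex_guardrail(text, allow, deny):
    if any(re.search(p, text) for p in deny):
        return False
    if allow and not any(re.search(p, text) for p in allow):
        return False
    return True

config = {"allow": [], "deny": [r"credit.card.\d+"]}
print(regex_guardrail("hello there", **config))                # True
print(regex_guardrail("my credit card 4242424242", **config))  # False
```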

Fallback

The provider_fallback field references another provider entity ID to use as a fallback when this provider fails.

```json
{
  "provider_fallback": "provider_backup_xxx"
}
```

See the Fallback documentation for details.

Memory

The memory field references an LLM Memory entity ID to enable conversation memory for this provider.

```json
{
  "memory": "memory_entity_id"
}
```

When configured, the provider will maintain conversation history across requests using the referenced memory store.

Special metadata

The provider metadata object supports special keys that affect behavior:

| Key | Description |
| --- | --- |
| endpoint_name | Override the provider name used in API responses |
| costs-tracking-provider | Override the provider used for cost tracking computation |
| costs-tracking-model | Override the model used for cost tracking computation |
| eco-impacts-provider | Override the provider used for ecological impact computation |
| eco-impacts-model | Override the model used for ecological impact computation |
| eco-impacts-electricity-mix-zone | Override the electricity mix zone for this provider |

These metadata overrides are useful when using OpenAI-compatible providers that host models from other providers (e.g., a Mistral model hosted on Scaleway).
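
For example, for an OpenAI-compatible provider serving a Mistral model hosted elsewhere, the metadata could look like this (the model name is a hypothetical placeholder):

```json
{
  "metadata": {
    "costs-tracking-provider": "mistral",
    "costs-tracking-model": "mistral-large-latest",
    "eco-impacts-provider": "mistral",
    "eco-impacts-model": "mistral-large-latest"
  }
}
```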

Full example

```json
{
  "id": "provider_openai_prod",
  "name": "OpenAI Production",
  "description": "Production OpenAI provider with guardrails and caching",
  "tags": ["production", "openai"],
  "metadata": {
    "costs-tracking-provider": "openai",
    "costs-tracking-model": "gpt-4o"
  },
  "provider": "openai",
  "connection": {
    "base_url": "https://api.openai.com/v1",
    "token": "sk-xxx",
    "timeout": 180000
  },
  "options": {
    "model": "gpt-4o",
    "temperature": 0.7,
    "max_tokens": 4096,
    "allow_config_override": true
  },
  "provider_fallback": "provider_mistral_backup",
  "context": {
    "default": "You are a helpful assistant."
  },
  "models": {
    "include": ["gpt-4o.*"],
    "exclude": []
  },
  "cache": {
    "strategy": "simple",
    "ttl": 3600000
  },
  "guardrails": [
    {
      "enabled": true,
      "before": true,
      "after": false,
      "id": "regex",
      "config": {
        "deny": ["\\b\\d{16}\\b"]
      }
    }
  ],
  "guardrails_fail_on_deny": true,
  "memory": null
}
```