Overview
The Otoroshi LLM extension provides a unified, OpenAI-compatible API for content moderation across multiple providers. Moderation models analyze text and classify it against predefined content policy categories (hate, violence, self-harm, sexual content, etc.).
Features
- OpenAI-compatible API — standard `/v1/moderations` endpoint
- Multiple providers — OpenAI and Mistral supported natively
- Model routing — route to different providers using `provider/model` syntax
- Model constraints — restrict which models consumers can use via include/exclude regex patterns, enforceable per API key or per user
- Cost tracking — per-request cost tracking for moderation calls
- Guardrail integration — use moderation models as guardrails on LLM providers
API endpoint
| Endpoint | Method | Description |
|---|---|---|
| `/v1/moderations` | POST | Moderate text content against policy categories |
Request
```bash
curl --request POST \
  --url http://myroute.oto.tools:8080/v1/moderations \
  --header 'content-type: application/json' \
  --data '{
    "input": "Some text to moderate",
    "model": "omni-moderation-latest"
  }'
```
Request parameters
| Parameter | Type | Description |
|---|---|---|
| `input` | string | The text to moderate |
| `model` | string | Model name. Can include a provider prefix for model routing |
Response
```json
{
  "results": [
    {
      "flagged": false,
      "categories": {
        "hate": false,
        "hate/threatening": false,
        "self-harm": false,
        "sexual": false,
        "sexual/minors": false,
        "violence": false,
        "violence/graphic": false
      },
      "category_scores": {
        "hate": 0.0001,
        "hate/threatening": 0.00001,
        "self-harm": 0.00002,
        "sexual": 0.00003,
        "sexual/minors": 0.000001,
        "violence": 0.00005,
        "violence/graphic": 0.00001
      }
    }
  ],
  "model": "omni-moderation-latest",
  "usage": {
    "total_tokens": 5,
    "input_tokens": 5,
    "output_tokens": 0
  }
}
```
Response fields
| Field | Type | Description |
|---|---|---|
| `results` | array | Array of moderation results |
| `results[].flagged` | boolean | Whether the content was flagged by any category |
| `results[].categories` | object | Boolean flags for each category |
| `results[].category_scores` | object | Confidence scores for each category (0.0 to 1.0) |
| `model` | string | The model used for moderation |
| `usage` | object | Token usage information |
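A client will typically inspect `flagged`, `categories`, and `category_scores` together. The helpers below are a minimal sketch of consuming the response shape shown above; the 0.5 threshold is an arbitrary example, not a value defined by the API:

```python
def flagged_categories(result: dict, threshold: float = 0.5) -> list[str]:
    """Return category names whose boolean flag is set, or whose
    confidence score meets the (illustrative) threshold."""
    cats = {name for name, hit in result.get("categories", {}).items() if hit}
    cats |= {name for name, score in result.get("category_scores", {}).items()
             if score >= threshold}
    return sorted(cats)

def is_flagged(response: dict) -> bool:
    """True when any result in the moderation response was flagged."""
    return any(r.get("flagged", False) for r in response.get("results", []))
```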
Use cases
- Content filtering — automatically flag inappropriate content before it reaches users
- Guardrails — use moderation as an input/output guardrail on LLM providers to validate prompts and responses
- Compliance — enforce content policies across all LLM interactions
- Workflows — integrate moderation calls in workflows for custom processing pipelines
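The content-filtering use case can be sketched as a gate in front of an LLM call. This is an illustrative client-side pattern, not extension code: `MODERATIONS_URL` reuses the example route from the Request section, and `moderate`/`guarded_prompt` are hypothetical helper names:

```python
import json
from urllib.request import Request, urlopen

# Example route from the Request section above.
MODERATIONS_URL = "http://myroute.oto.tools:8080/v1/moderations"

def moderate(text: str, model: str = "omni-moderation-latest") -> dict:
    """POST the text to the /v1/moderations endpoint and return the parsed response."""
    body = json.dumps({"input": text, "model": model}).encode()
    req = Request(MODERATIONS_URL, data=body,
                  headers={"content-type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)

def guarded_prompt(prompt: str, llm_call, moderate_fn=moderate):
    """Content-filtering gate: moderate the prompt first and only
    forward it to the LLM when no result flags it."""
    response = moderate_fn(prompt)
    if any(r.get("flagged") for r in response.get("results", [])):
        raise ValueError("prompt rejected by content moderation")
    return llm_call(prompt)
```

For input/output guardrails on LLM providers, the extension can enforce this check itself; the sketch above only shows the equivalent flow when calling the moderation endpoint directly from application code.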