Overview

The Otoroshi LLM extension provides a unified, OpenAI-compatible API for content moderation across multiple providers. Moderation models analyze text and classify it against predefined content policy categories (hate, violence, self-harm, sexual content, etc.).

Features

  • OpenAI-compatible API — standard /v1/moderations endpoint
  • Multiple providers — OpenAI and Mistral supported natively
  • Model routing — route to different providers using provider/model syntax
  • Model constraints — restrict which models consumers can use via include/exclude regex patterns, enforceable per API key or per user
  • Cost tracking — moderation calls are included in the extension's per-request cost tracking
  • Guardrail integration — use moderation models as guardrails on LLM providers

API endpoint

Endpoint          Method   Description
/v1/moderations   POST     Moderate text content against policy categories

Request

curl --request POST \
  --url http://myroute.oto.tools:8080/v1/moderations \
  --header 'content-type: application/json' \
  --data '{
    "input": "Some text to moderate",
    "model": "omni-moderation-latest"
  }'

Request parameters

Parameter   Type     Description
input       string   The text to moderate
model       string   Model name; may include a provider prefix (provider/model) for model routing

Response

{
  "results": [
    {
      "flagged": false,
      "categories": {
        "hate": false,
        "hate/threatening": false,
        "self-harm": false,
        "sexual": false,
        "sexual/minors": false,
        "violence": false,
        "violence/graphic": false
      },
      "category_scores": {
        "hate": 0.0001,
        "hate/threatening": 0.00001,
        "self-harm": 0.00002,
        "sexual": 0.00003,
        "sexual/minors": 0.000001,
        "violence": 0.00005,
        "violence/graphic": 0.00001
      }
    }
  ],
  "model": "omni-moderation-latest",
  "usage": {
    "total_tokens": 5,
    "input_tokens": 5,
    "output_tokens": 0
  }
}
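
Consuming a result with this shape can be sketched as follows. The helper and the 0.5 threshold are assumptions for illustration: providers return the boolean `flagged` and per-category flags, but you may want to apply your own cutoffs on `category_scores`.

```python
def blocked_categories(result: dict, threshold: float = 0.5) -> list[str]:
    """Return the categories considered problematic: those the provider
    flagged as true, plus those whose score meets the given threshold."""
    flagged = [c for c, hit in result.get("categories", {}).items() if hit]
    scored = [c for c, s in result.get("category_scores", {}).items() if s >= threshold]
    return sorted(set(flagged) | set(scored))
```

With the example response above, every flag is false and every score is tiny, so the helper returns an empty list.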

Response fields

Field                      Type      Description
results                    array     Array of moderation results
results[].flagged          boolean   Whether the content was flagged by any category
results[].categories       object    Boolean flags for each category
results[].category_scores  object    Confidence scores for each category (0.0 to 1.0)
model                      string    The model used for moderation
usage                      object    Token usage information

Use cases

  • Content filtering — automatically flag inappropriate content before it reaches users
  • Guardrails — use moderation as an input/output guardrail on LLM providers to validate prompts and responses
  • Compliance — enforce content policies across all LLM interactions
  • Workflows — integrate moderation calls in workflows for custom processing pipelines
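
The guardrail use case boils down to checking a prompt against the moderation endpoint before forwarding it to the LLM. A minimal control-flow sketch, where `moderate` stands in for a real call to /v1/moderations and `complete` for the LLM call (both are placeholders, not part of the extension's API):

```python
from typing import Callable

def guarded_completion(prompt: str,
                       moderate: Callable[[str], dict],
                       complete: Callable[[str], str]) -> str:
    """Reject flagged prompts before they reach the model."""
    verdict = moderate(prompt)
    if any(r["flagged"] for r in verdict["results"]):
        raise ValueError("prompt rejected by moderation guardrail")
    return complete(prompt)
```

The same shape applies to output guardrails: moderate the model's response before returning it to the caller.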