Overview
The Otoroshi LLM extension provides a unified, OpenAI-compatible API for content moderation across multiple providers. Moderation models analyze text and classify it against predefined content policy categories (hate, violence, self-harm, sexual content, etc.).
Features
- OpenAI-compatible API — standard `/v1/moderations` endpoint
- Multiple providers — OpenAI and Mistral supported natively
- Model routing — route to different providers using `provider/model` syntax
- Model constraints — restrict which models consumers can use via include/exclude regex patterns, enforceable per API key or per user
- Cost tracking — per-request cost tracking for moderation calls
- Guardrail integration — use moderation models as guardrails on LLM providers
API endpoint
| Endpoint | Method | Description |
|---|---|---|
| `/v1/moderations` | POST | Moderate text content against policy categories |
Request
```bash
curl --request POST \
  --url http://myroute.oto.tools:8080/v1/moderations \
  --header 'content-type: application/json' \
  --data '{
    "input": "Some text to moderate",
    "model": "omni-moderation-latest"
  }'
```
Request parameters
| Parameter | Type | Description |
|---|---|---|
| `input` | string | The text to moderate |
| `model` | string | Model name. Can include a provider prefix for model routing |
Response
```json
{
  "results": [
    {
      "flagged": false,
      "categories": {
        "hate": false,
        "hate/threatening": false,
        "self-harm": false,
        "sexual": false,
        "sexual/minors": false,
        "violence": false,
        "violence/graphic": false
      },
      "category_scores": {
        "hate": 0.0001,
        "hate/threatening": 0.00001,
        "self-harm": 0.00002,
        "sexual": 0.00003,
        "sexual/minors": 0.000001,
        "violence": 0.00005,
        "violence/graphic": 0.00001
      }
    }
  ],
  "model": "omni-moderation-latest",
  "usage": {
    "total_tokens": 5,
    "input_tokens": 5,
    "output_tokens": 0
  }
}
```
Response fields
| Field | Type | Description |
|---|---|---|
| `results` | array | Array of moderation results |
| `results[].flagged` | boolean | Whether the content was flagged by any category |
| `results[].categories` | object | Boolean flags for each category |
| `results[].category_scores` | object | Confidence scores for each category (0.0 to 1.0) |
| `model` | string | The model used for moderation |
| `usage` | object | Token usage information |
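A client will typically inspect `flagged`, `categories`, and `category_scores` together. The helpers below are a minimal sketch of consuming the response shape shown above; the 0.5 threshold is an arbitrary example, not a value defined by the API:

```python
def flagged_categories(result: dict, threshold: float = 0.5) -> list[str]:
    """Return category names whose boolean flag is set, or whose
    confidence score meets the (illustrative) threshold."""
    cats = {name for name, hit in result.get("categories", {}).items() if hit}
    cats |= {name for name, score in result.get("category_scores", {}).items()
             if score >= threshold}
    return sorted(cats)

def is_flagged(response: dict) -> bool:
    """True when any result in the moderation response was flagged."""
    return any(r.get("flagged", False) for r in response.get("results", []))
```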
Use cases
- Content filtering — automatically flag inappropriate content before it reaches users
- Guardrails — use moderation as an input/output guardrail on LLM providers to validate prompts and responses
- Compliance — enforce content policies across all LLM interactions
- Workflows — integrate moderation calls in workflows for custom processing pipelines
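The content-filtering use case can be sketched as a gate in front of an LLM call. This is an illustrative client-side pattern, not extension code: `MODERATIONS_URL` reuses the example route from the Request section, and `moderate`/`guarded_prompt` are hypothetical helper names:

```python
import json
from urllib.request import Request, urlopen

# Example route from the Request section above.
MODERATIONS_URL = "http://myroute.oto.tools:8080/v1/moderations"

def moderate(text: str, model: str = "omni-moderation-latest") -> dict:
    """POST the text to the /v1/moderations endpoint and return the parsed response."""
    body = json.dumps({"input": text, "model": model}).encode()
    req = Request(MODERATIONS_URL, data=body,
                  headers={"content-type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)

def guarded_prompt(prompt: str, llm_call, moderate_fn=moderate):
    """Content-filtering gate: moderate the prompt first and only
    forward it to the LLM when no result flags it."""
    response = moderate_fn(prompt)
    if any(r.get("flagged") for r in response.get("results", [])):
        raise ValueError("prompt rejected by content moderation")
    return llm_call(prompt)
```

For input/output guardrails on LLM providers, the extension can enforce this check itself; the sketch above only shows the equivalent flow when calling the moderation endpoint directly from application code.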