📄️ Overview
🚧 Enforcing Usage Limits
📄️ Auto Secrets Leakage
This guardrail is a security measure that prevents the LLM from exposing sensitive information such as passwords, API keys, or confidential credentials, reducing the risk of data leaks. Unlike the Secrets Leakage guardrail which requires you to select specific secret categories, this guardrail uses a comprehensive hardcoded prompt to automatically detect all types of IT secrets.
📄️ Characters count validation
The Characters count guardrail validates that the message content falls within a specified character count range. This is useful to reject messages that are too short (potentially meaningless) or too long (potentially abusive or costly).
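As a rough sketch (function name and parameters are illustrative, not Otoroshi's actual configuration keys), the check amounts to a simple range test on the message length:

```python
def check_char_count(message: str, min_chars: int, max_chars: int) -> bool:
    """Hypothetical sketch: accept only messages whose character count
    falls within [min_chars, max_chars]."""
    return min_chars <= len(message) <= max_chars
```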
📄️ Prompt contains guardrail
The Text contains guardrail checks whether the message content contains (or does not contain) specific text values. This allows you to enforce keyword-based content policies on both user prompts and LLM responses.
📄️ Faithfulness (Hallucination detection)
The Faithfulness guardrail detects hallucinations by evaluating whether LLM responses are faithful to a provided reference context. It is particularly useful in RAG (Retrieval-Augmented Generation) pipelines where the LLM should only answer based on retrieved documents.
📄️ Prompt contains gender bias guardrail
This guardrail detects and blocks gender-biased language in user prompts, promoting fairness and inclusivity in AI-generated content. It uses a dedicated LLM provider with a hardcoded prompt to detect gender bias.
📄️ Prompt contains gibberish guardrail
This guardrail detects and blocks inputs that are nonsensical, random, or meaningless, preventing the AI from generating irrelevant or low-quality responses. It uses a dedicated LLM provider with a hardcoded prompt to detect gibberish content.
📄️ LLM guardrails
The LLM guardrail uses a separate LLM provider with a custom prompt to validate messages. This is the most flexible guardrail, as you can define any validation logic through natural language instructions in a prompt template.
📄️ Language moderation
The Language moderation guardrail uses a dedicated LLM provider with a hardcoded prompt to detect content that falls into standard moderation categories. You select which categories to enforce from a predefined list.
📄️ Moderation Model
The Moderation Model guardrail delegates content moderation to a dedicated moderation model (such as OpenAI's moderation endpoint) rather than using an LLM prompt.
📄️ Personal Health information
This guardrail ensures the LLM does not process, store, or share personal health-related details, protecting user privacy and supporting compliance with health-data regulations.
📄️ Personal information
This guardrail detects and blocks personally identifiable information (PII) in prompts and responses.
📄️ Prompt injection
The Prompt injection guardrail detects and blocks prompt injection and jailbreak attempts. It uses a dedicated LLM provider to analyze user input and score it for potential injection attacks.
📄️ QuickJS
The QuickJS guardrail lets you write custom validation logic in JavaScript. The script is executed locally within Otoroshi using the QuickJS engine (via WASM), providing a lightweight way to implement custom guardrails without compiling a full WASM plugin.
📄️ Racial Bias
The Racial bias guardrail detects and blocks messages containing racial bias. It uses a dedicated LLM provider with a hardcoded prompt to identify stereotyping, discriminatory language, microaggressions, and other forms of racial prejudice.
📄️ Regex
The Regex guardrail validates message content against regular expression patterns. You can define allow-list and deny-list patterns to control what content is permitted or blocked.
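The allow/deny logic can be sketched as follows (the patterns and function name are invented for illustration and do not reflect Otoroshi's actual schema):

```python
import re

# Hypothetical patterns: deny card-like 16-digit numbers, allow printable ASCII only.
DENY = [re.compile(r"\b\d{16}\b")]
ALLOW = [re.compile(r"^[\x20-\x7E\s]+$")]

def regex_guardrail(message: str) -> bool:
    """Reject if any deny-list pattern matches; otherwise require
    every allow-list pattern to match."""
    if any(p.search(message) for p in DENY):
        return False
    return all(p.search(message) for p in ALLOW)
```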
📄️ Secrets Leakage
The Secrets leakage guardrail is a security mechanism that prevents the LLM from exposing sensitive or confidential information, including API keys, passwords, proprietary business data, and personally identifiable information (PII). You select the specific secret categories to enforce.
📄️ Semantic contains
The Semantic contains guardrail checks whether the message content is semantically similar to specified values, using embedding-based similarity rather than exact text matching. This is more flexible than the Text contains guardrail because it can detect meaning even when the exact words differ.
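Embedding-based matching typically compares vectors by cosine similarity against a threshold. A minimal sketch of that comparison (the embedding model, function names, and threshold are assumptions, not Otoroshi's API):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantically_contains(msg_vec: list[float], target_vec: list[float],
                          threshold: float = 0.8) -> bool:
    # Flag the message if its embedding is close enough to the target phrase's embedding.
    return cosine_similarity(msg_vec, target_vec) >= threshold
```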
📄️ Sentences Count
The Sentences count guardrail validates that the message content falls within a specified sentence count range. This is useful to ensure that prompts or responses contain a minimum amount of structured content, or to prevent overly verbose outputs.
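A naive version of sentence counting splits on terminal punctuation; real implementations may use a proper sentence tokenizer (this sketch is illustrative only):

```python
import re

def check_sentence_count(message: str, min_sentences: int, max_sentences: int) -> bool:
    """Naively split on sentence-ending punctuation and check the count
    falls within [min_sentences, max_sentences]."""
    sentences = [s for s in re.split(r"[.!?]+", message) if s.strip()]
    return min_sentences <= len(sentences) <= max_sentences
```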
📄️ Toxic Language
The Toxic language guardrail detects and blocks messages containing toxic or harmful language. It uses a dedicated LLM provider with a hardcoded prompt to identify hate speech, insults, threats, harassment, and other forms of toxic content.
📄️ WASM
The WASM guardrail lets you write custom validation logic in any language that compiles to WebAssembly (Rust, Go, AssemblyScript, etc.). The WASM plugin is executed locally within Otoroshi, providing high performance without external network calls.
📄️ Webhook
The Webhook guardrail delegates content validation to an external HTTP service. This allows you to implement custom validation logic in any language or framework by exposing a simple HTTP endpoint.
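The core of such an endpoint is a handler that receives the message and returns a verdict. The payload and response shapes below are illustrative assumptions, not Otoroshi's exact webhook schema:

```python
import json

def handle_validation_request(body: bytes) -> dict:
    """Hypothetical webhook handler: parse the incoming message and
    return a pass/fail verdict as a JSON-serializable dict."""
    payload = json.loads(body)
    content = payload.get("content", "")
    banned = ["password", "secret"]
    ok = not any(word in content.lower() for word in banned)
    return {"pass": ok, "reason": None if ok else "contains banned keyword"}
```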
📄️ Words Count
The Words count guardrail validates that the message content falls within a specified word count range. This is useful to enforce minimum prompt lengths or prevent excessively long inputs.
📄️ Workflow
The Workflow guardrail lets you use an Otoroshi workflow to validate messages. This is similar to the WASM and QuickJS guardrails, but instead of writing code in a specific language, you define the validation logic as an Otoroshi workflow — a visual, composable pipeline of steps.