OCR Models
Otoroshi LLM Extension provides support for OCR (Optical Character Recognition) models, enabling text extraction from images and PDF documents through a unified, Mistral-inspired API.
OCR is exposed as a dedicated, first-class entity type — the OCR Model — just like Audio Models, Image Models, or Embedding Models. Each OCR Model wraps a provider and its configuration, and can be called through a dedicated plugin, the unified OpenAI-compatible API, or a workflow function.
Supported providers
| Provider | Models | Input |
|---|---|---|
| AlphaEdge 🇫🇷 🇪🇺 | alpha-digit-max, alpha-digit-medium | image, pdf |
| Mistral | mistral-ocr-latest, mistral-ocr-2505 | image, pdf |
Features
- Unified API: All providers are exposed through a single, consistent OCR endpoint regardless of the underlying provider
- Multiple input formats: Pass the document as a remote URL, a base64 data-uri, an inline base64 string, or a raw multipart file upload
- Two transports: JSON body (Mistral-style) or multipart/form-data file upload
- Vault integration: API tokens support Otoroshi vault references (e.g. ${vault://local/my-token})
- Model constraints: Restrict which models can be used via allow/block lists
- Workflow integration: OCR is available as a workflow function (ocr_call) for use in agentic pipelines
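To make the base64 data-uri input format concrete, here is a minimal Python sketch that turns raw file bytes into a data URI. The helper name `to_data_uri` is hypothetical, and which MIME types a given provider accepts is not specified on this page, so treat the fallback type as an assumption.

```python
import base64
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw bytes as a base64 data URI (hypothetical helper)."""
    # Guess the MIME type from the file extension; fall back to a generic
    # binary type when the extension is unknown (assumption, not a documented rule).
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        mime = "application/octet-stream"
    # Base64 output is pure ASCII, so decoding to str is safe.
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The resulting string can be used wherever a document is passed as a data URI; for the inline base64 form, the part after the comma would be sent on its own.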
How to use OCR
There are three ways to call an OCR Model:
| Method | Description |
|---|---|
| Dedicated plugin | The Cloud APIM - OCR backend plugin exposes a single POST /ocr route. See Plugins. |
| Unified API | The OpenAI Compatible API plugin exposes POST /ocr alongside chat, audio, image and embedding endpoints. See OpenAI Compatible API. |
| Workflow function | The ocr_call function calls an OCR Model from within a workflow. |
OCR can also be performed through a regular text/LLM provider instead of an OCR Model entity, by sending a chat completion that contains an image or PDF content part. See OCR through text models.
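As an illustration of the JSON transport, the sketch below assembles a POST /ocr request in Python without sending it. The body shape (`model` plus a `document` object of type `document_url`) is modelled on Mistral's public OCR API, which this endpoint is described as being inspired by; the exact fields accepted by the Otoroshi route, the gateway host, and the bearer-token authentication are all assumptions that depend on how the route is configured. The helper name `build_ocr_request` is hypothetical.

```python
import json
import urllib.request


def build_ocr_request(gateway_url: str, api_key: str, document_url: str,
                      model: str = "mistral-ocr-latest") -> urllib.request.Request:
    """Build (but do not send) a JSON OCR request (hypothetical helper)."""
    # Body modelled on Mistral's OCR API; the field names are an assumption
    # for the Otoroshi endpoint and should be checked against the Plugins page.
    body = {
        "model": model,
        "document": {"type": "document_url", "document_url": document_url},
    }
    return urllib.request.Request(
        url=gateway_url.rstrip("/") + "/ocr",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Bearer auth is an assumption; the route may be protected differently.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; keeping construction separate makes the payload easy to inspect before any network call.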
OCR model entity
An OCR Model entity wraps a provider connection and its options:
{
  "id": "ocr-model_xxxxxxxxx",
  "name": "My OCR Model",
  "description": "OCR model backed by AlphaEdge",
  "provider": "alphaedge",
  "config": {
    "connection": {
      "base_url": "https://api-endpoints.alphaedge-ai.com",
      "token": "${vault://local/ALPHAEDGE_API_KEY}",
      "timeout": 180000
    },
    "options": {
      "model": "alpha-digit-max"
    }
  },
  "kind": "ai-gateway.extensions.cloud-apim.com/OcrModel"
}
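Because the token field accepts vault references, a small pre-flight check can flag entities that carry a plaintext secret before they are pushed to Otoroshi. The Python sketch below is only an illustration: the helper name `token_uses_vault` is hypothetical, and the pattern matches just the `${vault://...}` form shown on this page, not a full grammar of Otoroshi's vault syntax.

```python
import re

# Matches the ${vault://...} form shown in this page's examples only;
# this is an assumption, not Otoroshi's complete vault-reference grammar.
VAULT_REF = re.compile(r"^\$\{vault://[^}]+\}$")


def token_uses_vault(entity: dict) -> bool:
    """Return True when the entity's connection token is a vault reference."""
    # Walk config -> connection -> token, tolerating missing keys.
    token = entity.get("config", {}).get("connection", {}).get("token", "")
    return bool(VAULT_REF.match(token))
```

For the entity above, the check returns True; an entity whose token is a literal API key would return False and could be rejected by a deployment script.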
See Providers for the per-provider configuration, and Plugins for the full API usage.