OCR API

OCR is exposed as an HTTP API through the Cloud APIM - OCR backend plugin. The same handler is also available through the unified OpenAI Compatible API plugin on the POST /ocr path.

Plugin setup

Add the Cloud APIM - OCR backend plugin to your route:

{
  "enabled": true,
  "plugin": "cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.aigateway.plugins.OpenAICompatOcr",
  "config": {
    "refs": ["ocr-model_xxxxxxxxx"],
    "max_size_upload": 104857600
  }
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| refs | array of strings | | References to OCR model entities |
| max_size_upload | number | 104857600 (100 MB) | Maximum upload file size in bytes (multipart requests) |

When several OCR models are referenced, the first one is used by default. To target a specific model entity, set the provider field to its id in the request body.
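
As a minimal sketch of that selection rule, the Python snippet below builds a request body with and without the provider field. The helper name build_ocr_body is hypothetical, and the entity id is the placeholder from the configuration above, not a real id.

```python
import json


def build_ocr_body(document_url, provider=None, model=None):
    """Build a JSON body for POST /ocr (hypothetical helper, not part of the plugin)."""
    body = {"document": {"type": "document_url", "document_url": document_url}}
    if model is not None:
        body["model"] = model
    if provider is not None:
        # `provider` carries the id of the OCR model entity to target.
        body["provider"] = provider
    return body


# Without `provider`, the first entity listed in `refs` is used.
default_body = build_ocr_body("https://example.com/scan.pdf")

# With `provider`, the request targets one specific entity.
targeted_body = build_ocr_body(
    "https://example.com/scan.pdf", provider="ocr-model_xxxxxxxxx"
)
print(json.dumps(targeted_body, indent=2))
```

Leaving the field out entirely, rather than sending an empty value, keeps the default behaviour described above.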

Request

The endpoint accepts the document through two transports: a JSON body or a multipart file upload.

JSON body (Mistral-style)

curl https://my-ocr-endpoint.example.com/ocr \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OTOROSHI_API_KEY" \
  -d '{
    "model": "alpha-digit-max",
    "document": {
      "type": "document_url",
      "document_url": "https://example.com/scan.pdf"
    }
  }'

The document can be provided in any of the following ways:

| Form | Example |
| --- | --- |
| Remote URL (Mistral-style object) | { "document": { "type": "document_url", "document_url": "https://..." } } |
| Image URL (Mistral-style object) | { "document": { "type": "image_url", "image_url": { "url": "https://..." } } } |
| Flat URL field | { "document_url": "https://..." } or { "image_url": "https://..." } |
| Base64 string | { "image_base64": "JVBERi0..." } or { "document_base64": "..." } |
| Base64 data URI | { "document_url": "data:application/pdf;base64,JVBERi0..." } |
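
For local files, the two base64 forms can be produced with the Python standard library. This is a sketch only: the helper names are hypothetical, and the sample bytes stand in for a real PDF.

```python
import base64


def to_base64_field(data: bytes) -> dict:
    """Return the plain base64 form: { "document_base64": "..." }."""
    return {"document_base64": base64.b64encode(data).decode("ascii")}


def to_data_uri_field(data: bytes, content_type: str = "application/pdf") -> dict:
    """Return the data URI form, carried in the `document_url` field."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"document_url": f"data:{content_type};base64,{encoded}"}


# Stand-in content; in practice read the bytes with open(path, "rb").read().
pdf_bytes = b"%PDF-1.4 example"

plain = to_base64_field(pdf_bytes)
data_uri = to_data_uri_field(pdf_bytes)
print(plain["document_base64"][:7])
print(data_uri["document_url"][:40])
```

The data URI form carries the content type inline, whereas the plain base64 form can be paired with the separate content_type parameter.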

Request parameters (JSON)

| Parameter | Type | Description |
| --- | --- | --- |
| model | string | The OCR model to use. Defaults to the model configured on the entity. |
| document | object | The document reference (type + document_url / image_url) |
| document_url / image_url | string | A remote URL or base64 data URI (alternative to document) |
| image_base64 / document_base64 | string | The document content, base64 encoded |
| content_type | string | The document content type (e.g. application/pdf, image/png); helps route image vs. document requests with Mistral |
| pages | array of numbers | Optional list of page indices to process (provider dependent) |
| pdf_password | string | Optional password for protected PDFs (AlphaEdge) |
| provider | string | Optional OCR model entity id to target when several refs are configured |
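
The parameters above can be combined into a single request. The sketch below uses only the Python standard library and assumes the endpoint URL and Bearer authentication from the curl example; make_ocr_request is a hypothetical helper, and the request is built but not sent.

```python
import json
import urllib.request


def make_ocr_request(endpoint, api_key, document_url, **options):
    """Build (but do not send) a POST /ocr request with optional parameters."""
    body = {"document": {"type": "document_url", "document_url": document_url}}
    # Drop options left as None so they are simply absent from the body.
    body.update({key: value for key, value in options.items() if value is not None})
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = make_ocr_request(
    "https://my-ocr-endpoint.example.com/ocr",
    "my-api-key",  # placeholder credential
    "https://example.com/scan.pdf",
    model="alpha-digit-max",
    pages=[0, 1],  # how indices are interpreted is provider dependent
    pdf_password=None,  # omitted from the body because it is None
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```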

Multipart file upload

The dedicated plugin also accepts a raw multipart/form-data file upload. Send the file in a field named image (file and document are also accepted):

curl https://my-ocr-endpoint.example.com/ocr \
  -H "Authorization: Bearer $OTOROSHI_API_KEY" \
  -F "image=@scan.pdf" \
  -F "model=alpha-digit-max"

Any extra form fields (e.g. model, pdf_password, provider) are read alongside the file.
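
For clients without a multipart helper, the body can be assembled by hand. The sketch below is standard-library Python under stated assumptions: encode_multipart is a hypothetical helper, the size check is a client-side convenience based on the default max_size_upload, and the gateway's own enforcement may differ.

```python
import mimetypes
import uuid

# Default limit from the plugin configuration; adjust if your route overrides it.
MAX_SIZE_UPLOAD = 104857600


def encode_multipart(file_bytes, filename, fields=None, file_field="image"):
    """Return (body, content_type_header) for a multipart/form-data upload."""
    if len(file_bytes) > MAX_SIZE_UPLOAD:
        raise ValueError("file exceeds max_size_upload")
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    # Extra form fields (model, pdf_password, provider, ...) go alongside the file.
    for name, value in (fields or {}).items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {file_type}\r\n\r\n'.encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


body, header = encode_multipart(
    b"%PDF-1.4 example", "scan.pdf", {"model": "alpha-digit-max"}
)
print(header)
```

The returned header value goes in the Content-Type header of the POST, next to the Authorization header used in the curl example.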

Response

The response follows a simplified, Mistral-inspired shape:

{
  "model": "alpha-digit-max",
  "text": "The full extracted text...",
  "pages": [
    { "index": 0, "markdown": "The full extracted text..." }
  ],
  "usage_info": {
    "pages_processed": 1
  }
}

| Field | Type | Description |
| --- | --- | --- |
| model | string | The model that produced the result |
| text | string | The full extracted text (all pages concatenated) |
| pages | array | Per-page results, each with index and markdown |
| usage_info.pages_processed | number | Number of pages processed |
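
A client might map this shape onto typed objects, as in the Python sketch below. The class and function names are hypothetical, and the sample payload is the example response from this page rather than real output.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class OcrPage:
    index: int
    markdown: str


@dataclass
class OcrResult:
    model: str
    text: str
    pages: List[OcrPage]
    pages_processed: int


def parse_ocr_response(raw: str) -> OcrResult:
    """Parse the JSON response body into an OcrResult."""
    data = json.loads(raw)
    pages = [OcrPage(p["index"], p["markdown"]) for p in data.get("pages", [])]
    usage = data.get("usage_info", {})
    return OcrResult(
        model=data["model"],
        text=data["text"],
        pages=pages,
        # Fall back to the page count if usage_info is absent (an assumption).
        pages_processed=usage.get("pages_processed", len(pages)),
    )


sample = """
{
  "model": "alpha-digit-max",
  "text": "The full extracted text...",
  "pages": [{ "index": 0, "markdown": "The full extracted text..." }],
  "usage_info": { "pages_processed": 1 }
}
"""
result = parse_ocr_response(sample)
print(result.model, result.pages_processed)
```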

Unified API

When using the unified OpenAI Compatible API plugin, add your OCR model entities to ocr_model_refs and call the POST /ocr path. The request and response formats are identical to the dedicated plugin.

{
  "plugin": "cp:otoroshi_plugins.com.cloud.apim.otoroshi.extensions.aigateway.plugins.OpenAiCompatApi",
  "config": {
    "ocr_model_refs": ["ocr-model_xxxxxxxxx"],
    "max_size_upload": 104857600
  }
}