Overview
The Otoroshi LLM extension provides a unified, OpenAI-compatible API for computing text embeddings across multiple providers. Embeddings are vector representations of text that capture semantic meaning, enabling similarity search, clustering, and RAG (Retrieval-Augmented Generation) pipelines.
Features
- 16+ embedding providers including cloud APIs and a local ONNX model
- OpenAI-compatible API — standard `/v1/embeddings` endpoint
- Batch embedding — embed multiple texts in a single request
- Encoding formats — `float` (JSON array) or `base64` (compact binary)
- Model routing — route to different providers using `provider/model` syntax
- Model constraints — restrict which models consumers can use via include/exclude regex patterns
- Budget enforcement — embedding costs are tracked and budgets are enforced
- Cost tracking — embedding costs are reported per request through the extension's cost-tracking feature
- Embedding stores — in-memory and external vector stores for similarity search
- Token round-robin — distribute load across multiple API tokens
API endpoint
| Endpoint | Method | Description |
|---|---|---|
| `/v1/embeddings` | POST | Compute embeddings for one or more text inputs |
Request
```sh
curl --request POST \
  --url http://myroute.oto.tools:8080/v1/embeddings \
  --header 'content-type: application/json' \
  --data '{
    "input": ["Hello world", "How are you?"],
    "model": "text-embedding-3-small",
    "encoding_format": "float"
  }'
```
Request parameters
| Parameter | Type | Description |
|---|---|---|
| `input` | string or array | The text(s) to embed. Can be a single string or an array of strings for batch embedding |
| `model` | string | Model name. Can include a provider prefix for model routing |
| `dimensions` | integer | Requested embedding dimensions (supported by some models like `text-embedding-3-small`) |
| `encoding_format` | string | Output format: `"float"` (default, JSON array of numbers) or `"base64"` (compact binary encoding) |
| `user` | string | End-user identifier for tracking |
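For example, a request body that pins a specific provider via the `provider/model` prefix and asks for truncated dimensions could look like the sketch below (the `openai` provider name is an assumption; use the name of a provider configured in your Otoroshi instance):

```json
{
  "input": "Hello world",
  "model": "openai/text-embedding-3-small",
  "dimensions": 256,
  "encoding_format": "base64"
}
```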
Response
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023064255, -0.009327292, ...]
    },
    {
      "object": "embedding",
      "index": 1,
      "embedding": [-0.0015486241, 0.0073928963, ...]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
```
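Once a client has the vectors, comparing them is plain arithmetic. This snippet is not part of the extension; it just illustrates the cosine similarity computation that similarity search is built on:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

v1 = [0.0023064255, -0.009327292, 0.5]
v2 = [-0.0015486241, 0.0073928963, 0.5]
print(round(cosine_similarity(v1, v1), 6))  # a vector compared with itself scores 1.0
```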
With "encoding_format": "base64", each embedding is returned as a base64-encoded string of little-endian float bytes instead of a JSON array.
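As a sketch of how a client might recover the floats, assuming 4-byte IEEE 754 single-precision values as described above:

```python
import base64
import struct

def decode_embedding(b64: str) -> list[float]:
    """Decode a base64 string of little-endian float32 values."""
    raw = base64.b64decode(b64)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))

# round-trip with values that are exactly representable in float32
vec = [0.25, -0.5, 1.0]
encoded = base64.b64encode(struct.pack("<3f", *vec)).decode("ascii")
print(decode_embedding(encoded))  # → [0.25, -0.5, 1.0]
```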
Embedding stores
The extension provides embedding stores for storing and searching embedding vectors. Multiple backends are supported, from a local in-memory store to production-grade vector databases.
Embedding stores are used internally by the semantic cache and can be used in workflows via dedicated functions:
- `vector_store_add` — add a document with its embedding to a store
- `vector_store_remove` — remove a document by ID
- `vector_store_search` — search by embedding vector similarity
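To make the semantics of these three functions concrete, here is a toy Python store (purely illustrative; the actual stores are the backends listed below) with the same add/remove/search shape and the default `max_results`/`min_score` options:

```python
import math
from dataclasses import dataclass, field

@dataclass
class InMemoryStore:
    """Toy in-memory vector store, for illustration only."""
    max_results: int = 3
    min_score: float = 0.7
    docs: dict = field(default_factory=dict)  # id -> (text, embedding)

    def add(self, doc_id: str, text: str, embedding: list[float]) -> None:
        self.docs[doc_id] = (text, embedding)      # ~ vector_store_add

    def remove(self, doc_id: str) -> None:
        self.docs.pop(doc_id, None)                # ~ vector_store_remove

    def search(self, query: list[float]) -> list[tuple[float, str]]:
        # ~ vector_store_search: rank by cosine similarity, filter, truncate
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))
        scored = [(cos(query, emb), text) for text, emb in self.docs.values()]
        hits = [(s, t) for s, t in scored if s >= self.min_score]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return hits[: self.max_results]

store = InMemoryStore()
store.add("a", "hello", [1.0, 0.0])
store.add("b", "unrelated", [0.0, 1.0])
print(store.search([1.0, 0.0]))  # only "hello" clears the 0.7 min_score
```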
Supported providers
| Provider | Value | Description |
|---|---|---|
| Local (in-memory) | local | LangChain4j `InMemoryEmbeddingStore`, no external dependency |
| ChromaDB | chromadb | Open-source vector database via HTTP API |
| Elasticsearch | elasticsearch | kNN vector search on dense_vector fields |
| OpenSearch | opensearch | kNN vector search with HNSW |
| Qdrant | qdrant | Purpose-built vector database via REST API |
| Weaviate | weaviate | Vector database with GraphQL search |
| Pinecone | pinecone | Cloud-native vector database |
| Redis Stack | redis | RediSearch module with vector similarity (requires Redis Stack) |
| PostgreSQL | postgresql | pgvector extension for cosine similarity search |
Local store configuration
```json
{
  "provider": "local",
  "config": {
    "connection": {
      "name": "my-store",
      "session_id": "optional-session-id",
      "init_content": "https://example.com/initial-data.json"
    },
    "options": {
      "max_results": 3,
      "min_score": 0.7
    }
  }
}
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `connection.name` | string | — | Store name |
| `connection.session_id` | string | — | Optional session ID for per-session isolation |
| `connection.init_content` | string | — | URL to initial content (HTTP/HTTPS, `file://`, or `s3://`) |
| `options.max_results` | integer | 3 | Maximum number of results returned by search |
| `options.min_score` | number | 0.7 | Minimum cosine similarity score for search matches |
ChromaDB configuration
```json
{
  "provider": "chromadb",
  "config": {
    "connection": {
      "url": "http://localhost:8000",
      "collection": "my-collection",
      "api_key": "optional-api-key",
      "tenant": "default_tenant",
      "database": "default_database"
    }
  }
}
```
Elasticsearch / OpenSearch configuration
```json
{
  "provider": "elasticsearch",
  "config": {
    "connection": {
      "url": "http://localhost:9200",
      "index": "embeddings",
      "username": "elastic",
      "password": "changeme",
      "dims": 384,
      "similarity": "cosine"
    }
  }
}
```
Authentication supports `username`/`password` (Basic), `api_key` (ApiKey header), or no auth. The index and mapping are created automatically on first use.
For OpenSearch, use `"provider": "opensearch"` with the additional options `engine` (default `lucene`) and `space_type` (default `cosinesimil`).
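Assembled from the fields above, a full OpenSearch store configuration might look like this sketch, assuming the Elasticsearch connection fields carry over unchanged (the `admin` credentials are placeholders):

```json
{
  "provider": "opensearch",
  "config": {
    "connection": {
      "url": "http://localhost:9200",
      "index": "embeddings",
      "username": "admin",
      "password": "admin",
      "dims": 384,
      "engine": "lucene",
      "space_type": "cosinesimil"
    }
  }
}
```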
Qdrant configuration
```json
{
  "provider": "qdrant",
  "config": {
    "connection": {
      "url": "http://localhost:6333",
      "collection": "my-collection",
      "api_key": "optional-api-key",
      "dims": 384,
      "distance": "Cosine"
    }
  }
}
```
Weaviate configuration
```json
{
  "provider": "weaviate",
  "config": {
    "connection": {
      "url": "http://localhost:8080",
      "class_name": "Embedding",
      "api_key": "optional-api-key"
    }
  }
}
```
Pinecone configuration
```json
{
  "provider": "pinecone",
  "config": {
    "connection": {
      "url": "https://index-xxxxx.svc.environment.pinecone.io",
      "api_key": "your-api-key",
      "namespace": ""
    }
  }
}
```
Redis Stack configuration
Requires Redis with the Search module (redis-stack). Embeddings are stored as binary vectors in HASH keys and indexed with RediSearch for KNN similarity search.
```json
{
  "provider": "redis",
  "config": {
    "connection": {
      "url": "redis://localhost:6379",
      "prefix": "otoroshi:ai:emb",
      "dims": 384,
      "distance_metric": "COSINE"
    }
  }
}
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | string | `redis://localhost:6379` | Redis connection URI |
| `prefix` | string | `otoroshi:ai:emb` | Key prefix for hash keys and index |
| `dims` | integer | 384 | Vector dimensions |
| `distance_metric` | string | COSINE | Distance metric (`COSINE`, `L2`, `IP`) |
PostgreSQL (pgvector) configuration
Requires PostgreSQL with the pgvector extension. The extension and table are created automatically on first use.
```json
{
  "provider": "postgresql",
  "config": {
    "connection": {
      "uri": "postgresql://user:password@localhost:5432/mydb",
      "table": "otoroshi_ai_embeddings",
      "dims": 384
    }
  }
}
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `uri` | string | `postgresql://otoroshi:otoroshi@localhost:5432/otoroshi` | PostgreSQL connection URI |
| `table` | string | `otoroshi_ai_embeddings` | Table name |
| `dims` | integer | 384 | Vector dimensions |