Overview

The Otoroshi LLM extension provides a unified, OpenAI-compatible API for computing text embeddings across multiple providers. Embeddings are vector representations of text that capture semantic meaning, enabling similarity search, clustering, and RAG (Retrieval-Augmented Generation) pipelines.

Features

  • 16+ embedding providers including cloud APIs and a local ONNX model
  • OpenAI-compatible API — standard /v1/embeddings endpoint
  • Batch embedding — embed multiple texts in a single request
  • Encoding formats — float (JSON array) or base64 (compact binary)
  • Model routing — route to different providers using provider/model syntax
  • Model constraints — restrict which models consumers can use via include/exclude regex patterns
  • Budget enforcement — embedding costs are tracked and budgets are enforced
  • Cost tracking — per-request costs are recorded through the extension's cost tracking feature
  • Embedding stores — vector stores for similarity search, from a local in-memory store to external vector databases
  • Token round-robin — distribute load across multiple API tokens
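
As a rough illustration of model routing and model constraints, the sketch below splits a provider/model name and checks a model against include/exclude regex lists. The function names, the split on the first slash, the use of full-string matching, and the rule that exclude patterns win over include patterns are all assumptions made for this example; the extension's actual matching rules may differ.

```python
import re


def split_routed_model(model):
    """Split "provider/model" into (provider, model); provider is None without a prefix.

    Assumption: the prefix ends at the first "/" in the name.
    """
    provider, sep, name = model.partition("/")
    if sep and provider and name:
        return provider, name
    return None, model


def is_model_allowed(model, include=(), exclude=()):
    """Return True if the model passes the include/exclude regex lists.

    Assumptions: patterns must match the whole model name, an exclude
    match always rejects, and an empty include list allows everything.
    """
    if any(re.fullmatch(pattern, model) for pattern in exclude):
        return False
    if include and not any(re.fullmatch(pattern, model) for pattern in include):
        return False
    return True
```

Here "my-provider/text-embedding-3-small" would split into the hypothetical provider "my-provider" and the model "text-embedding-3-small".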

API endpoint

| Endpoint | Method | Description |
| --- | --- | --- |
| /v1/embeddings | POST | Compute embeddings for one or more text inputs |

Request

curl --request POST \
  --url http://myroute.oto.tools:8080/v1/embeddings \
  --header 'content-type: application/json' \
  --data '{
    "input": ["Hello world", "How are you?"],
    "model": "text-embedding-3-small",
    "encoding_format": "float"
  }'

Request parameters

| Parameter | Type | Description |
| --- | --- | --- |
| input | string or array | The text(s) to embed. Can be a single string or an array of strings for batch embedding |
| model | string | Model name. Can include a provider prefix for model routing |
| dimensions | integer | Requested embedding dimensions (supported by some models like text-embedding-3-small) |
| encoding_format | string | Output format: "float" (default, JSON array of numbers) or "base64" (compact binary encoding) |
| user | string | End-user identifier for tracking |
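
The same request can be sketched in Python using only the standard library. The helper names are hypothetical, the URL is the example route from the curl command, and the sketch sends no credentials; a route that requires authentication would need extra headers.

```python
import json
import urllib.request

# Example route from the curl command; replace with your own route.
EMBEDDINGS_URL = "http://myroute.oto.tools:8080/v1/embeddings"


def build_embeddings_payload(texts, model, encoding_format="float",
                             dimensions=None, user=None):
    """Build the JSON body for POST /v1/embeddings from the documented parameters."""
    payload = {"input": texts, "model": model, "encoding_format": encoding_format}
    # Optional parameters are only sent when explicitly set.
    if dimensions is not None:
        payload["dimensions"] = dimensions
    if user is not None:
        payload["user"] = user
    return payload


def post_embeddings(payload, url=EMBEDDINGS_URL, timeout=30):
    """Send the payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def vectors_in_input_order(response):
    """Return the embeddings sorted by their "index" field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

A batch call would then look like `post_embeddings(build_embeddings_payload(["Hello world", "How are you?"], "text-embedding-3-small"))`.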

Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023064255, -0.009327292, ...]
    },
    {
      "object": "embedding",
      "index": 1,
      "embedding": [-0.0015486241, 0.0073928963, ...]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

With "encoding_format": "base64", each embedding is returned as a base64-encoded string of little-endian float bytes instead of a JSON array.
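
A client can turn such a string back into a list of numbers. The minimal sketch below assumes each value is a 32-bit float (4 bytes), which is the convention used by the OpenAI API but is not stated explicitly here; the function name is hypothetical.

```python
import base64
import struct


def decode_base64_embedding(encoded):
    """Decode a base64 string of little-endian floats into a list of Python floats.

    Assumption: each value is a 32-bit float (4 bytes).
    """
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("byte length is not a multiple of 4")
    count = len(raw) // 4
    # "<" selects little-endian byte order, "f" a 32-bit float.
    return list(struct.unpack(f"<{count}f", raw))
```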

Embedding stores

The extension provides embedding stores for storing and searching embedding vectors. Multiple backends are supported, from a local in-memory store to production-grade vector databases.

Embedding stores are used internally by the semantic cache and can be used in workflows via dedicated functions:

  • vector_store_add — add a document with its embedding to a store
  • vector_store_remove — remove a document by ID
  • vector_store_search — search by embedding vector similarity

Supported providers

| Provider | Value | Description |
| --- | --- | --- |
| Local (in-memory) | local | LangChain4j InMemoryEmbeddingStore, no external dependency |
| ChromaDB | chromadb | Open-source vector database via HTTP API |
| Elasticsearch | elasticsearch | kNN vector search on dense_vector fields |
| OpenSearch | opensearch | kNN vector search with HNSW |
| Qdrant | qdrant | Purpose-built vector database via REST API |
| Weaviate | weaviate | Vector database with GraphQL search |
| Pinecone | pinecone | Cloud-native vector database |
| Redis Stack | redis | RediSearch module with vector similarity (requires Redis Stack) |
| PostgreSQL | postgresql | pgvector extension for cosine similarity search |

Local store configuration

{
  "provider": "local",
  "config": {
    "connection": {
      "name": "my-store",
      "session_id": "optional-session-id",
      "init_content": "https://example.com/initial-data.json"
    },
    "options": {
      "max_results": 3,
      "min_score": 0.7
    }
  }
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| connection.name | string | | Store name |
| connection.session_id | string | | Optional session ID for per-session isolation |
| connection.init_content | string | | URL to initial content (HTTP/HTTPS, file://, or s3://) |
| options.max_results | integer | 3 | Maximum number of results returned by search |
| options.min_score | number | 0.7 | Minimum cosine similarity score for search matches |
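
To make the effect of max_results and min_score concrete, here is a toy in-memory store in Python with add, remove, and search operations. It is an illustration only, not the LangChain4j implementation used by the extension: the class and method names are hypothetical, and the real store may scale or normalize scores differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store mirroring the documented options."""

    def __init__(self, max_results=3, min_score=0.7):
        self.max_results = max_results
        self.min_score = min_score
        self._docs = {}  # doc_id -> (embedding, text)

    def add(self, doc_id, embedding, text):
        self._docs[doc_id] = (embedding, text)

    def remove(self, doc_id):
        self._docs.pop(doc_id, None)

    def search(self, query_embedding):
        """Return up to max_results (score, doc_id, text) tuples, best first."""
        scored = [
            (cosine_similarity(query_embedding, embedding), doc_id, text)
            for doc_id, (embedding, text) in self._docs.items()
        ]
        # Drop weak matches first, then keep only the top results.
        matches = [match for match in scored if match[0] >= self.min_score]
        matches.sort(key=lambda match: match[0], reverse=True)
        return matches[: self.max_results]
```

With the defaults, a query returns at most three documents, and any document scoring below 0.7 is never returned, even if fewer than three remain.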

ChromaDB configuration

{
  "provider": "chromadb",
  "config": {
    "connection": {
      "url": "http://localhost:8000",
      "collection": "my-collection",
      "api_key": "optional-api-key",
      "tenant": "default_tenant",
      "database": "default_database"
    }
  }
}

Elasticsearch / OpenSearch configuration

{
  "provider": "elasticsearch",
  "config": {
    "connection": {
      "url": "http://localhost:9200",
      "index": "embeddings",
      "username": "elastic",
      "password": "changeme",
      "dims": 384,
      "similarity": "cosine"
    }
  }
}

Authentication supports username/password (Basic), api_key (ApiKey header), or no auth. The index and mapping are created automatically on first use.
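
The three authentication modes map onto HTTP headers roughly as in the Python sketch below. The function name is hypothetical, and the precedence (api_key before username/password) is an assumption for this example; the extension may resolve a configuration that sets both differently.

```python
import base64


def elasticsearch_auth_headers(connection):
    """Build the Authorization header for a connection config dict.

    Assumption: api_key takes precedence when both modes are configured.
    """
    api_key = connection.get("api_key")
    username = connection.get("username")
    password = connection.get("password")
    if api_key:
        return {"Authorization": f"ApiKey {api_key}"}
    if username and password:
        # HTTP Basic auth: base64 of "username:password".
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        return {"Authorization": f"Basic {token}"}
    # No credentials configured: send no Authorization header.
    return {}
```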

For OpenSearch, use "provider": "opensearch" with additional options engine (default lucene) and space_type (default cosinesimil).

Qdrant configuration

{
  "provider": "qdrant",
  "config": {
    "connection": {
      "url": "http://localhost:6333",
      "collection": "my-collection",
      "api_key": "optional-api-key",
      "dims": 384,
      "distance": "Cosine"
    }
  }
}

Weaviate configuration

{
  "provider": "weaviate",
  "config": {
    "connection": {
      "url": "http://localhost:8080",
      "class_name": "Embedding",
      "api_key": "optional-api-key"
    }
  }
}

Pinecone configuration

{
  "provider": "pinecone",
  "config": {
    "connection": {
      "url": "https://index-xxxxx.svc.environment.pinecone.io",
      "api_key": "your-api-key",
      "namespace": ""
    }
  }
}

Redis Stack configuration

Requires Redis with the Search module (redis-stack). Embeddings are stored as binary vectors in HASH keys and indexed with RediSearch for KNN similarity search.

{
  "provider": "redis",
  "config": {
    "connection": {
      "url": "redis://localhost:6379",
      "prefix": "otoroshi:ai:emb",
      "dims": 384,
      "distance_metric": "COSINE"
    }
  }
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | redis://localhost:6379 | Redis connection URI |
| prefix | string | otoroshi:ai:emb | Key prefix for hash keys and index |
| dims | integer | 384 | Vector dimensions |
| distance_metric | string | COSINE | Distance metric (COSINE, L2, IP) |

PostgreSQL (pgvector) configuration

Requires PostgreSQL with the pgvector extension. The extension and table are created automatically on first use.

{
  "provider": "postgresql",
  "config": {
    "connection": {
      "uri": "postgresql://user:password@localhost:5432/mydb",
      "table": "otoroshi_ai_embeddings",
      "dims": 384
    }
  }
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| uri | string | postgresql://otoroshi:otoroshi@localhost:5432/otoroshi | PostgreSQL connection URI |
| table | string | otoroshi_ai_embeddings | Table name |
| dims | integer | 384 | Vector dimensions |
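
For readers inspecting the table directly, a cosine similarity query with pgvector might look like the Python sketch below. The column names id and embedding are assumptions, since the schema created by the extension is not documented here, and the placeholders follow the named style used by drivers such as psycopg. In pgvector, the <=> operator returns cosine distance, so the similarity score is 1 minus that distance.

```python
import re

# Only plain SQL identifiers are accepted as table names, to avoid injection.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def to_vector_literal(embedding):
    """Format floats as a pgvector text literal such as "[0.1,0.2]"."""
    return "[" + ",".join(repr(float(value)) for value in embedding) + "]"


def build_similarity_query(table="otoroshi_ai_embeddings"):
    """Build a top-k cosine similarity query.

    Assumption: the table has columns named "id" and "embedding".
    """
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"invalid table name: {table!r}")
    return (
        "SELECT id, 1 - (embedding <=> %(query)s::vector) AS score "
        f"FROM {table} "
        "ORDER BY embedding <=> %(query)s::vector "
        "LIMIT %(limit)s"
    )
```

The query would be executed with parameters such as `{"query": to_vector_literal(vector), "limit": 3}`.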