Load Balancing

Load balancing distributes LLM requests across multiple providers, optimizing performance, ensuring availability, and preventing any single provider from being overloaded.

How it works

The load balancer is a virtual provider of type `loadbalancer`. Instead of connecting directly to an LLM API, it wraps multiple real providers and routes each request to one of them according to a configurable strategy.

It supports all LLM operations: chat completion (blocking and streaming) and text completion (blocking and streaming).

Creating a load balancer provider

Create a new LLM Provider entity with `provider` set to `loadbalancer`. The `connection` object is not used; all configuration goes into the `options` object.

{
  "id": "provider_lb_1",
  "name": "My Load Balancer",
  "provider": "loadbalancer",
  "connection": {},
  "options": {
    "refs": [
      "provider_openai_1",
      "provider_anthropic_1",
      "provider_mistral_1"
    ],
    "loadbalancing": "round_robin"
  }
}

Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| `refs` | array | `[]` | List of provider references (see below) |
| `loadbalancing` | string | `round_robin` | Load balancing strategy: `round_robin`, `random`, or `best_response_time` |
| `selector_expr` | string | none | Expression to filter providers based on request content (see below) |

Provider references

The `refs` array can contain either simple provider IDs (strings) or objects with a `ref` field plus `weight` and `selector_expected` fields:

Simple format — each provider has equal weight:

{
  "refs": [
    "provider_openai_1",
    "provider_anthropic_1"
  ]
}

Weighted format — control traffic distribution with weights:

{
  "refs": [
    { "ref": "provider_openai_1", "weight": 3 },
    { "ref": "provider_anthropic_1", "weight": 1 }
  ]
}

In this example, OpenAI would receive approximately 75% of requests and Anthropic 25%.
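The percentages follow from dividing each weight by the sum of all weights. The short sketch below only illustrates that arithmetic; `expected_shares` is a hypothetical helper written for this page, not part of the product.

```python
def expected_shares(refs):
    """Return each provider's expected fraction of traffic: weight / total weight."""
    total = sum(r["weight"] for r in refs)
    return {r["ref"]: r["weight"] / total for r in refs}


refs = [
    {"ref": "provider_openai_1", "weight": 3},
    {"ref": "provider_anthropic_1", "weight": 1},
]
shares = expected_shares(refs)
print(shares)  # {'provider_openai_1': 0.75, 'provider_anthropic_1': 0.25}
```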

With selector — filter providers based on request content:

{
  "refs": [
    { "ref": "provider_openai_1", "weight": 1, "selector_expected": "openai" },
    { "ref": "provider_anthropic_1", "weight": 1, "selector_expected": "anthropic" }
  ],
  "selector_expr": "model"
}

Load balancing strategies

Round Robin

Distributes requests sequentially across all providers in order. Each provider receives a share of traffic proportional to its weight, which is an equal share when all weights are the same.

{
  "loadbalancing": "round_robin"
}
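One way to picture weighted round robin is a repeating rotation in which each provider appears as many times as its weight. The sketch below is an illustration under that assumption; the actual interleaving used by the load balancer is not documented here, and `weighted_round_robin` is a hypothetical name.

```python
from itertools import cycle


def weighted_round_robin(refs):
    """Return an endless iterator over provider IDs.

    Assumption: each provider appears `weight` times per rotation
    (weight defaults to 1 when absent).
    """
    expanded = [r["ref"] for r in refs for _ in range(r.get("weight", 1))]
    return cycle(expanded)


rotation = weighted_round_robin([
    {"ref": "provider_openai_1", "weight": 3},
    {"ref": "provider_anthropic_1", "weight": 1},
])
# Two full rotations: OpenAI three times, Anthropic once, then the same again.
first_eight = [next(rotation) for _ in range(8)]
```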

Random

Selects a random provider for each request. Over many requests, traffic is distributed in proportion to the weights, which means evenly when all weights are the same.

{
  "loadbalancing": "random"
}
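Weighted random selection can be sketched with Python's standard `random.choices`. This is an illustration of the idea rather than the load balancer's own code; `pick_random` and the sample weights are assumptions made for the example.

```python
import random
from collections import Counter


def pick_random(refs, rng=random):
    """Pick one provider ID at random, with probability proportional to its weight."""
    ids = [r["ref"] for r in refs]
    weights = [r.get("weight", 1) for r in refs]
    return rng.choices(ids, weights=weights, k=1)[0]


refs = [
    {"ref": "provider_openai_1", "weight": 3},
    {"ref": "provider_anthropic_1", "weight": 1},
]
rng = random.Random(42)  # seeded only to make the demonstration repeatable
counts = Counter(pick_random(refs, rng) for _ in range(10_000))
openai_share = counts["provider_openai_1"] / 10_000
```

Any single request can land on either provider; only the long-run share approaches 75% and 25%.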

Best Response Time

Tracks the average response time of each provider and routes requests to the provider with the lowest average. New providers (with no history) are tried first.

{
  "loadbalancing": "best_response_time"
}
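The selection rule described above can be sketched as follows. The sketch assumes a plain arithmetic mean over all recorded requests; the averaging window the load balancer really uses is not specified here, and `ResponseTimeTracker` is a hypothetical class.

```python
from collections import defaultdict


class ResponseTimeTracker:
    """Track per-provider response times and pick the fastest provider on average."""

    def __init__(self, provider_ids):
        self.provider_ids = list(provider_ids)
        self.total = defaultdict(float)  # summed response time per provider
        self.count = defaultdict(int)    # number of recorded responses per provider

    def record(self, provider_id, seconds):
        self.total[provider_id] += seconds
        self.count[provider_id] += 1

    def pick(self):
        # Providers with no history are tried first.
        for pid in self.provider_ids:
            if self.count[pid] == 0:
                return pid
        # Otherwise choose the provider with the lowest average response time.
        return min(self.provider_ids, key=lambda pid: self.total[pid] / self.count[pid])


tracker = ResponseTimeTracker(["provider_openai_prod", "provider_mistral_prod"])
first = tracker.pick()    # no history yet, so the first listed provider
tracker.record("provider_openai_prod", 0.8)
second = tracker.pick()   # Mistral still has no history
tracker.record("provider_mistral_prod", 0.5)
third = tracker.pick()    # Mistral now has the lowest average (0.5 s vs 0.8 s)
```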

Request-based routing with `selector_expr`

The `selector_expr` option routes requests to specific providers based on the content of the request body, for example the requested model or another request parameter.

Field name

Use a simple field name to read that field from the request body merged with the provider options. The value is matched against each reference's `selector_expected`:

{
  "selector_expr": "model",
  "refs": [
    { "ref": "provider_openai_1", "weight": 1, "selector_expected": "gpt-4o" },
    { "ref": "provider_anthropic_1", "weight": 1, "selector_expected": "claude-sonnet-4-20250514" }
  ]
}

A request with `"model": "gpt-4o"` is routed to the OpenAI provider, while `"model": "claude-sonnet-4-20250514"` goes to Anthropic.
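The filtering step can be sketched as below. Several details are assumptions made for illustration: that the request body takes precedence over provider options when the two are merged, that matching is exact equality, and that a missing `selector_expr` behaves like `*`. `candidates` is a hypothetical helper.

```python
def candidates(refs, selector_expr, request_body, provider_options=None):
    """Return the provider IDs eligible for a request."""
    # Assumption: no expression, or "*", means no filtering.
    if selector_expr is None or selector_expr == "*":
        return [r["ref"] for r in refs]
    # Assumption: request body values override provider options on conflict.
    merged = {**(provider_options or {}), **request_body}
    value = merged.get(selector_expr)
    # Assumption: exact string equality against selector_expected.
    return [r["ref"] for r in refs if r.get("selector_expected") == value]


refs = [
    {"ref": "provider_openai_1", "weight": 1, "selector_expected": "gpt-4o"},
    {"ref": "provider_anthropic_1", "weight": 1, "selector_expected": "claude-sonnet-4-20250514"},
]
openai_only = candidates(refs, "model", {"model": "gpt-4o"})
everyone = candidates(refs, "*", {"model": "gpt-4o"})
```

The configured strategy (`round_robin`, `random`, or `best_response_time`) would then choose among the remaining candidates.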

JsonPointer

Use `JsonPointer(...)` for nested fields:

{
  "selector_expr": "JsonPointer(/metadata/provider)"
}
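For readers unfamiliar with JSON Pointer (RFC 6901), the sketch below shows how a pointer such as `/metadata/provider` resolves against a request body. It is a minimal standalone resolver written for illustration; the request body shown is made up, and how the load balancer handles a missing path is not covered here.

```python
def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer against parsed JSON (dicts and lists)."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape in the order required by the RFC: "~1" first, then "~0".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


body = {"model": "gpt-4o", "metadata": {"provider": "openai"}}
selected = resolve_pointer(body, "/metadata/provider")
print(selected)  # openai
```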

JsonPath

Use `JsonPath(...)` for complex path expressions:

{
  "selector_expr": "JsonPath($.metadata.provider)"
}

Wildcard

Use `*` to disable filtering (select from all providers):

{
  "selector_expr": "*"
}

Full example

A load balancer with weighted providers, the `best_response_time` strategy, and a fallback provider:

{
  "id": "provider_lb_production",
  "name": "Production LB",
  "description": "Load balancer across OpenAI and Mistral with failover",
  "provider": "loadbalancer",
  "connection": {},
  "options": {
    "refs": [
      { "ref": "provider_openai_prod", "weight": 2 },
      { "ref": "provider_mistral_prod", "weight": 1 }
    ],
    "loadbalancing": "best_response_time"
  },
  "provider_fallback": "provider_anthropic_backup"
}

This configuration:

- Routes requests to OpenAI (2/3) and Mistral (1/3) based on best response time
- Falls back to Anthropic if the selected provider fails (see Fallback)
