
Resilience

Resilience ensures that LLM interactions remain highly available and fault-tolerant, even when providers experience outages or failures.

Key Features

  • Load Balancing: Distribute requests across multiple providers to optimize performance.
  • Fallback Mechanism: Automatically switch to alternative LLMs in case of failures.
  • Automatic Retries: Retry failed requests based on configurable rules to maximize successful responses.
  • Rate Limiting & Quotas: Prevent overloading a single provider by distributing usage effectively.

Provider fallback

Requests are first sent to a primary LLM provider. If a request fails, the system retries it or falls back to a secondary provider.

[Figure: fallback]
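
As a rough illustration of the retry-then-fallback behavior described above, here is a minimal Python sketch. It is not the product's actual implementation: `call_with_fallback`, `AllProvidersFailed`, `max_retries`, and the two stand-in provider functions are hypothetical names chosen for this example, and a real deployment would likely add backoff delays and catch narrower error types.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every provider has exhausted its retries (hypothetical name)."""


def call_with_fallback(
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    prompt: str,
    max_retries: int = 2,
) -> Tuple[str, str]:
    """Try providers in order; retry each up to max_retries times before falling back."""
    errors: List[str] = []
    for name, call in providers:
        for attempt in range(1, max_retries + 1):
            try:
                # On success, report which provider answered along with its reply.
                return name, call(prompt)
            except Exception as exc:  # assumption: any exception counts as a failure
                errors.append(f"{name} attempt {attempt}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for real provider clients.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary unavailable")


def healthy_secondary(prompt: str) -> str:
    return f"echo: {prompt}"


provider_used, reply = call_with_fallback(
    [("primary", flaky_primary), ("secondary", healthy_secondary)],
    "hello",
)
print(provider_used, reply)
```

In this sketch the primary is attempted `max_retries` times before the secondary is tried, so the order of the provider list doubles as the priority order.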

Provider load balancing

Load balancing spreads requests evenly across the configured providers so that no single provider is overloaded.

[Figure: load balancing]
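
One simple way to get an even distribution is round-robin selection. The sketch below assumes that strategy purely for illustration; the source does not say which balancing algorithm is used, and `RoundRobinBalancer`, `next_provider`, and the provider names are hypothetical.

```python
import itertools
from collections import Counter
from typing import Iterable


class RoundRobinBalancer:
    """Hands out providers in a fixed rotating order (hypothetical helper)."""

    def __init__(self, providers: Iterable[str]) -> None:
        provider_list = list(providers)
        if not provider_list:
            raise ValueError("at least one provider is required")
        # itertools.cycle repeats the list endlessly in its original order.
        self._cycle = itertools.cycle(provider_list)

    def next_provider(self) -> str:
        """Return the provider that should receive the next request."""
        return next(self._cycle)


balancer = RoundRobinBalancer(["provider-a", "provider-b", "provider-c"])

# Route nine requests and count how many each provider received.
counts = Counter(balancer.next_provider() for _ in range(9))
print(counts)
```

Round-robin treats all providers as equal. If providers differ in capacity or quota, a weighted scheme would be a natural variation on the same idea.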