Model fallback · Morvion Glossary

Model fallback is the production pattern of routing a request to a secondary model when the primary fails, refuses unexpectedly, breaches a token budget, or returns malformed structured output that does not recover on retry. For the configuration effort involved, it is often among the most cost-effective availability investments a production AI system can make.

Trigger conditions.

Provider error. 5xx, rate-limit, timeout, or connection failure on the primary endpoint.
Structured-output failure. The primary returned output that fails schema validation on the first attempt and again on retry.
Refusal. The primary refused unexpectedly on an input the workflow expects to handle.
Budget breach. The primary would exceed the workflow's per-request token budget.
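
The four conditions above can be read as one classification step. The following is a minimal sketch, not any gateway's actual API: the `Attempt` record, its field names, and the `fallback_trigger` function are hypothetical, and it assumes that a refusal detector and a token estimator already exist upstream.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Trigger(Enum):
    PROVIDER_ERROR = "provider_error"
    STRUCTURED_OUTPUT_FAILURE = "structured_output_failure"
    REFUSAL = "refusal"
    BUDGET_BREACH = "budget_breach"


@dataclass
class Attempt:
    """Hypothetical summary of what happened on the primary model."""
    status_code: Optional[int] = None  # HTTP status; None if no response arrived
    timed_out: bool = False
    connection_failed: bool = False
    schema_failures: int = 0           # consecutive schema-validation failures
    refused: bool = False              # set by a refusal detector (assumed, not shown)
    input_in_scope: bool = True        # the workflow expects to handle this input
    estimated_tokens: int = 0          # set by a token estimator (assumed, not shown)
    token_budget: int = 0              # 0 means no budget is configured


def fallback_trigger(a: Attempt) -> Optional[Trigger]:
    """Return the trigger that fired, or None if the primary's answer stands."""
    if a.timed_out or a.connection_failed:
        return Trigger.PROVIDER_ERROR
    if a.status_code is not None and (a.status_code == 429 or 500 <= a.status_code <= 599):
        return Trigger.PROVIDER_ERROR  # rate limit (429) or any 5xx
    if a.token_budget and a.estimated_tokens > a.token_budget:
        return Trigger.BUDGET_BREACH
    if a.schema_failures >= 2:         # failed once, then failed again on retry
        return Trigger.STRUCTURED_OUTPUT_FAILURE
    if a.refused and a.input_in_scope:
        return Trigger.REFUSAL
    return None
```

A single schema failure or a refusal on an out-of-scope input returns `None` here, which matches the conditions as stated: only a repeated schema failure or an unexpected refusal moves the request to the fallback.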

How to wire it.

At the model gateway, not in application code. The Vercel AI Gateway, OpenRouter, and most production gateways ship fallback rules as configuration. Define the cascade there (e.g. Claude → GPT → Gemini), the trigger conditions, and the per-fallback budget. The application calls the gateway and stays unaware of which model actually answered.
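
To make the cascade concrete, here is a small in-memory sketch of what a gateway does with such a configuration. It does not reproduce the configuration format of Vercel AI Gateway, OpenRouter, or any other product. The `Step` record, the `CASCADE` list, the `route` function, the token budgets, and the model labels are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Step:
    model: str       # placeholder label, not a real model identifier
    max_tokens: int  # per-fallback token budget (illustrative numbers)


# Hypothetical cascade; a real gateway would hold this as configuration.
CASCADE: List[Step] = [
    Step("claude", max_tokens=4000),
    Step("gpt", max_tokens=3000),
    Step("gemini", max_tokens=2000),
]


class ModelFailure(Exception):
    """Raised by a model call when any trigger condition fires."""


def route(
    prompt: str,
    call: Callable[[str, str, int], str],
    cascade: List[Step] = CASCADE,
) -> Tuple[str, str]:
    """Try each step in order; return (model, answer) from the first success."""
    errors = []
    for step in cascade:
        try:
            return step.model, call(step.model, prompt, step.max_tokens)
        except ModelFailure as exc:
            errors.append(f"{step.model}: {exc}")
    raise RuntimeError("all models in the cascade failed: " + "; ".join(errors))


def fake_call(model: str, prompt: str, max_tokens: int) -> str:
    """Stand-in for a provider call: the primary is forced to fail."""
    if model == "claude":
        raise ModelFailure("503 from provider")
    return f"{model} answered"


model, answer = route("Summarize this ticket.", fake_call)
```

In this run the first step fails and the second answers, so `model` is `"gpt"`. In a real deployment the application would only see the answer; recording which model served it is a job for the gateway's logs.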

Eval the fallback path separately.

A fallback path that nobody evaluates is a liability. Add fixtures that force the fallback (mock-failed primaries, schema-breaking primaries) and score the fallback model against the same rubric. The release gate fails if the fallback path quality drops past tolerance.
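
A forced-fallback eval can be sketched as follows. Everything in it is an assumption made for illustration: the JSON schema with a single `answer` field, the exact-match `score` rubric, the fixture contents, and the `fallback_gate` thresholds. For brevity the sketch falls back on the first schema failure rather than after a retry.

```python
import json
from typing import Callable, Dict, List, Tuple


def parse_answer(raw: str) -> str:
    """Validate a hypothetical schema: a JSON object with a string 'answer'."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict) or not isinstance(data.get("answer"), str):
        raise ValueError("schema violation")
    return data["answer"]


def run_with_fallback(prompt: str, primary: Callable[[str], str],
                      fallback: Callable[[str], str]) -> Tuple[str, str]:
    """Return (path, answer); use the fallback on timeout or schema failure."""
    try:
        return "primary", parse_answer(primary(prompt))
    except (TimeoutError, ValueError):
        return "fallback", parse_answer(fallback(prompt))


def failed_primary(prompt: str) -> str:
    raise TimeoutError("forced failure for the fixture")


def schema_breaking_primary(prompt: str) -> str:
    return "not json"  # fails schema validation on purpose


def stub_fallback(prompt: str) -> str:
    """Stand-in for the real fallback model."""
    return json.dumps({"answer": prompt.upper()})


def score(output: str, expected: str) -> float:
    """Toy rubric: exact match. A real rubric would be the one used for the primary."""
    return 1.0 if output == expected else 0.0


FIXTURES: List[Dict[str, str]] = [
    {"prompt": "refund policy", "expected": "REFUND POLICY"},
    {"prompt": "shipping time", "expected": "SHIPPING TIME"},
]


def fallback_gate(fixtures: List[Dict[str, str]], fallback: Callable[[str], str],
                  baseline: float, tolerance: float) -> bool:
    """Pass only if the mean fallback score stays within tolerance of the baseline."""
    scores = []
    for forced_primary in (failed_primary, schema_breaking_primary):
        for fx in fixtures:
            path, output = run_with_fallback(fx["prompt"], forced_primary, fallback)
            if path != "fallback":
                raise RuntimeError("fixture did not force the fallback path")
            scores.append(score(output, fx["expected"]))
    return sum(scores) / len(scores) >= baseline - tolerance


gate_passed = fallback_gate(FIXTURES, stub_fallback, baseline=1.0, tolerance=0.1)
```

Running the same fixtures through both a failing primary and a schema-breaking primary keeps the gate honest: the score reflects only what the fallback model produced, and the explicit path check catches a fixture that silently stopped forcing the fallback.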

Frequently asked.

What is model fallback?

Model fallback is the production pattern of routing to a secondary model when the primary fails, refuses unexpectedly, breaches a budget, or returns malformed structured output. It is wired at the model gateway as configuration, not in application code, so a single provider outage no longer takes the workflow down.

What triggers a fallback?

Four common conditions: a provider error (5xx, rate-limit, timeout), structured output that fails validation on the first attempt and again on retry, an unexpected refusal on an in-scope input, and a budget breach where the primary would exceed the per-request token cap. The gateway evaluates the conditions; the application stays unaware of which model answered.

Does the fallback model need its own eval?

Yes. A fallback path that nobody evaluates is a liability. Add fixtures that force the fallback and score it against the same rubric as the primary. The release gate fails if the fallback path quality regresses. The fallback path is easy to under-evaluate, and it runs precisely when something has already gone wrong.

Where does fallback live, in the app or the gateway?

The gateway. Vercel AI Gateway, OpenRouter, and similar production-grade gateways carry fallback rules as configuration. Application code calls the gateway and relies on the failover happening transparently. Fallback rules in application code spread the policy across the codebase; in the gateway they sit in one place and are auditable.
