Model fallback is the production pattern of routing a request to a secondary model when the primary fails, refuses unexpectedly, breaches a token budget, or returns malformed structured output that does not recover on retry. For the engineering effort involved, it is often one of the most cost-effective availability investments a production AI system can make.
Trigger conditions.
- Provider error. 5xx, rate-limit, timeout, or connection failure on the primary endpoint.
- Structured-output failure. The primary returned output that failed schema validation twice, so a retry did not recover it.
- Refusal. The primary refused unexpectedly on an input the workflow expects to handle.
- Budget breach. The primary would exceed the workflow's per-request token budget.
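The four triggers above can be expressed as a single classification step. The following is a minimal sketch, not any gateway's actual API: the `Attempt` record, its field names, and the `fallback_trigger` function are hypothetical, and it assumes that refusal detection and token estimation happen elsewhere and are passed in as plain values.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Trigger(Enum):
    PROVIDER_ERROR = "provider_error"
    STRUCTURED_OUTPUT_FAILURE = "structured_output_failure"
    REFUSAL = "refusal"
    BUDGET_BREACH = "budget_breach"


@dataclass
class Attempt:
    """Hypothetical summary of one request's attempts against the primary."""
    status_code: Optional[int] = None  # HTTP status; None if no response arrived
    timed_out: bool = False
    connection_failed: bool = False
    schema_failures: int = 0           # validation failures so far, retry included
    refused: bool = False              # set by whatever refusal detector is in use
    in_scope: bool = True              # the workflow expects to handle this input
    estimated_tokens: int = 0          # projected prompt + completion tokens


def fallback_trigger(attempt: Attempt, token_budget: int) -> Optional[Trigger]:
    """Return the trigger that fired, or None if the primary's answer stands."""
    # Budget is checked first because it can be known before the primary is called.
    if attempt.estimated_tokens > token_budget:
        return Trigger.BUDGET_BREACH
    if attempt.timed_out or attempt.connection_failed:
        return Trigger.PROVIDER_ERROR
    code = attempt.status_code
    if code is not None and (code == 429 or 500 <= code <= 599):
        return Trigger.PROVIDER_ERROR  # rate limit or server-side error
    if attempt.schema_failures >= 2:
        return Trigger.STRUCTURED_OUTPUT_FAILURE
    if attempt.refused and attempt.in_scope:
        return Trigger.REFUSAL         # out-of-scope refusals are expected, not a trigger
    return None
```

Returning the trigger rather than a bare boolean lets the gateway log why each fallback happened, which helps when tuning the conditions later.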
How to wire it.
Wire fallback at the model gateway, not in application code. The Vercel AI Gateway, OpenRouter, and most production gateways ship fallback rules as configuration. Define the cascade there (e.g. Claude → GPT → Gemini), the trigger conditions, and the per-fallback budget. The application calls the gateway and stays unaware of which model actually answered.
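To make the cascade concrete, here is a rough sketch of what a gateway does with such a configuration. It is not the configuration format of Vercel AI Gateway or OpenRouter; `Step`, `CASCADE`, `ModelFailure`, and `route` are hypothetical names, the model labels are placeholders, and the token budgets are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


class ModelFailure(Exception):
    """Hypothetical error a model call raises when a fallback trigger fires."""


@dataclass(frozen=True)
class Step:
    model: str       # placeholder label, not a real model identifier
    max_tokens: int  # per-fallback token budget (illustrative value)


# Cascade mirroring the example in the text: Claude -> GPT -> Gemini.
CASCADE: List[Step] = [
    Step("claude", 4000),
    Step("gpt", 4000),
    Step("gemini", 2000),
]


def route(
    prompt: str,
    cascade: List[Step],
    call_model: Callable[[str, str, int], str],
) -> Tuple[str, str]:
    """Try each step in order and return (model_that_answered, output)."""
    errors = []
    for step in cascade:
        try:
            return step.model, call_model(step.model, prompt, step.max_tokens)
        except ModelFailure as exc:
            # Record the failure and move on to the next model in the cascade.
            errors.append(f"{step.model}: {exc}")
    raise RuntimeError("all models in the cascade failed: " + "; ".join(errors))
```

The application would see only the output. The name of the answering model is returned here so the gateway can log it, which is also what makes the fallback path measurable.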
Eval the fallback path separately.
A fallback path that nobody evaluates is a liability. Add fixtures that force the fallback (primaries mocked to fail, primaries that return schema-breaking output) and score the fallback model against the same rubric as the primary. The release gate fails if fallback-path quality drops below tolerance.
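A forced-fallback eval can be small. The sketch below assumes a mocked primary that always times out, a stubbed fallback model, and a toy exact-match rubric; every name in it (`failing_primary`, `stub_fallback`, `rubric_score`, `fallback_gate`) is hypothetical, and a real suite would call the gateway and reuse the rubric already applied to the primary path.

```python
from typing import Callable, Dict, List


def failing_primary(prompt: str) -> str:
    """Fixture: a primary that always fails, forcing the fallback path."""
    raise TimeoutError("simulated primary timeout")


def stub_fallback(prompt: str) -> str:
    """Stand-in for the real fallback model; replace with a gateway call."""
    return "Paris" if "France" in prompt else "unknown"


def answer(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> Dict[str, str]:
    """Call the primary and fall back on any failure, recording who answered."""
    try:
        return {"model": "primary", "output": primary(prompt)}
    except Exception:
        return {"model": "fallback", "output": fallback(prompt)}


def rubric_score(output: str, expected: str) -> float:
    """Toy rubric: exact match after trimming whitespace."""
    return 1.0 if output.strip() == expected else 0.0


def fallback_gate(cases: List[Dict[str, str]], baseline: float, tolerance: float) -> bool:
    """Pass only if mean fallback quality stays within tolerance of the baseline."""
    scores = []
    for case in cases:
        result = answer(case["prompt"], failing_primary, stub_fallback)
        if result["model"] != "fallback":
            raise RuntimeError("fixture did not force the fallback path")
        scores.append(rubric_score(result["output"], case["expected"]))
    return sum(scores) / len(scores) >= baseline - tolerance
```

The explicit check that the fallback really answered guards against a fixture that silently stops forcing the failure, in which case the suite would be scoring the primary instead.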
Frequently asked.
- What is model fallback?
- Model fallback is the production pattern of routing to a secondary model when the primary fails, refuses unexpectedly, breaches a budget, or returns malformed structured output. It is wired at the model gateway as configuration, not in application code. A single provider outage no longer takes the workflow down.
- What triggers a fallback?
- Four common conditions: provider error (5xx, rate-limit, timeout, or connection failure), structured output that fails schema validation twice, unexpected refusal on an in-scope input, and budget breach where the primary would exceed the per-request token cap. The gateway evaluates the conditions; the application stays unaware of which model answered.
- Does the fallback model need its own eval?
- Yes. A fallback path that nobody evaluates is a liability. Add fixtures that force the fallback and score it against the same rubric. The release gate fails if the fallback path quality regresses. Teams often under-evaluate the fallback, even though it is the path that runs when the primary is already failing.
- Where does fallback live, in the app or the gateway?
- The gateway. Vercel AI Gateway, OpenRouter, and similar production-grade gateways carry fallback rules as configuration. Application code calls the gateway and trusts that the failover happens transparently. Fallback rules in application code spread the policy across the codebase; in the gateway they sit in one place and are auditable.