A model router is a small classifier that inspects each incoming query and dispatches it to the right model or workflow. Easy queries route to a small, fast model; complex queries route to a large model. The router itself runs in milliseconds and adds very little cost per call, yet it can cut overall AI cost by 60–80% on workloads with a mix of difficulties.

Routing signals.

  • Query length and complexity — short factual queries often go to a small model; long multi-step reasoning to a large one.
  • Required tools — queries that need code execution, web access, or structured planning route to an agentic workflow with tool use; queries that need none route to a flat call.
  • Confidence-based escalation — the small model answers first; if its self-reported confidence (or a fast judge score) is below threshold, escalate to the large model.
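
The three signals above can be sketched in a few lines of Python. This is a minimal illustration, not a production router: the keyword list, the 30-word cutoff, the 0.7 threshold, and the names pick_route, answer_with_escalation, and Answer are all assumptions made up for the example. A real router would replace the heuristics with a trained classifier.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical keyword list; a real router would use a trained classifier.
TOOL_HINTS = ("run this code", "search the web", "execute", "browse")


@dataclass
class Answer:
    text: str
    confidence: float  # 0.0 to 1.0, self-reported or from a fast judge


# A model is anything that maps a query to an Answer (assumed interface).
Model = Callable[[str], Answer]


def pick_route(query: str, max_small_words: int = 30) -> str:
    """First-pass routing on tool needs and query length (illustrative thresholds)."""
    lowered = query.lower()
    if any(hint in lowered for hint in TOOL_HINTS):
        return "agentic"  # needs tools: send to the tool-using workflow
    if len(query.split()) > max_small_words:
        return "large"  # long, likely multi-step: send to the large model
    return "small"  # short query: try the small model


def answer_with_escalation(
    query: str, small: Model, large: Model, threshold: float = 0.7
) -> Tuple[str, Answer]:
    """Small model answers first; escalate when confidence is below threshold."""
    first = small(query)
    if first.confidence >= threshold:
        return "small", first
    return "large", large(query)
```

Note that escalation pays for both calls on the escalated queries, so the threshold trades quality against cost and is worth tuning on real traffic.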

Anatomy.

The router is typically a small classifier — Haiku-class for LLM-based routing, or a fine-tuned cross-encoder for cheaper dispatch. Inputs: the query plus any session metadata. Output: one of N route labels, with confidence. Routes map to specific workflows in the orchestration layer.
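
As a rough sketch of that anatomy, the Python below wires a classifier to a table of workflows. The Router and RouteDecision names, the classifier signature, and the rule of falling back to the large route when confidence is low or the label is unknown are assumptions for illustration; the text above does not prescribe a fallback policy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Assumed interfaces: the classifier returns (route_label, confidence),
# and each workflow takes the query and returns a response string.
Classifier = Callable[[str, dict], Tuple[str, float]]
Workflow = Callable[[str], str]


@dataclass
class RouteDecision:
    label: str
    confidence: float


@dataclass
class Router:
    classify: Classifier
    workflows: Dict[str, Workflow]
    fallback: str = "large"       # assumed policy: when unsure, use the large route
    min_confidence: float = 0.6   # illustrative threshold

    def __post_init__(self) -> None:
        if self.fallback not in self.workflows:
            raise ValueError(f"fallback route {self.fallback!r} has no workflow")

    def decide(self, query: str, metadata: Optional[dict] = None) -> RouteDecision:
        """Classify the query; fall back on unknown labels or low confidence."""
        label, confidence = self.classify(query, metadata or {})
        if label not in self.workflows or confidence < self.min_confidence:
            return RouteDecision(self.fallback, confidence)
        return RouteDecision(label, confidence)

    def dispatch(self, query: str, metadata: Optional[dict] = None) -> str:
        """Run the workflow mapped to the chosen route."""
        decision = self.decide(query, metadata)
        return self.workflows[decision.label](query)
```

Keeping decide separate from dispatch makes it possible to log and evaluate routing decisions without running the downstream workflows.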

Why this pattern wins.

Production AI workloads are often bimodal: easy queries dominate volume, and only the hard ones need large-model spend. Without routing, every easy query pays large-model price for small-model work. With routing, the cost distribution tracks the difficulty distribution. On most workloads we've audited, a router pays back its implementation cost inside the first month.
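
The cost argument is simple arithmetic, sketched below in Python. All numbers in the usage note are illustrative assumptions, not measurements, and the model ignores misrouting and escalation overhead, both of which reduce the saving in practice.

```python
def blended_cost(
    easy_share: float, small_cost: float, large_cost: float, router_cost: float = 0.0
) -> float:
    """Average cost per query when easy queries go to the small model.

    easy_share is the fraction of queries routed to the small model;
    the costs are per-query averages in any consistent unit.
    """
    if not 0.0 <= easy_share <= 1.0:
        raise ValueError("easy_share must be between 0 and 1")
    return easy_share * small_cost + (1.0 - easy_share) * large_cost + router_cost


def savings(
    easy_share: float, small_cost: float, large_cost: float, router_cost: float = 0.0
) -> float:
    """Fractional saving versus sending every query to the large model."""
    routed = blended_cost(easy_share, small_cost, large_cost, router_cost)
    return 1.0 - routed / large_cost
```

For example, if 80% of queries are easy, the small model costs 5% of the large one, and the router adds 1% of a large-model call, then savings(0.8, 0.05, 1.0, 0.01) comes out at roughly 0.75, which sits inside the 60–80% range quoted above. With only 50% easy queries the same prices give a saving of about 0.47, so the benefit depends heavily on the difficulty mix.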

Frequently asked.

What is a model router?
A model router is a small classifier that inspects each incoming query and dispatches it to the right model or workflow: a small, fast model for easy queries, and a large, expensive model only when the task requires it. It can cut cost by 60–80% on workloads with mixed query difficulty.
How accurate does the router need to be?
Aim for above 95% routing accuracy on a representative fixture set. Below that, misrouting starts producing visible quality regressions on the small-model path. The eval harness should measure both routing accuracy and downstream quality on each route, and the regression gate should fail on either.
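
A regression gate of this kind might look like the Python sketch below. The function names, the per-route quality scores, and the quality floors are hypothetical; how downstream quality is scored (human labels, a judge model, task metrics) is left open here.

```python
from typing import Dict, List, Tuple


def routing_accuracy(predicted: List[str], expected: List[str]) -> float:
    """Fraction of fixture queries sent to the expected route."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("predicted and expected must be non-empty and equal length")
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


def regression_gate(
    predicted: List[str],
    expected: List[str],
    quality_by_route: Dict[str, float],  # measured downstream quality per route
    quality_floor: Dict[str, float],     # assumed minimum acceptable quality per route
    min_accuracy: float = 0.95,
) -> Tuple[bool, List[str]]:
    """Fail if routing accuracy or any route's downstream quality regresses."""
    failures: List[str] = []
    accuracy = routing_accuracy(predicted, expected)
    if accuracy < min_accuracy:
        failures.append(f"routing accuracy {accuracy:.3f} is below {min_accuracy:.3f}")
    for route, floor in quality_floor.items():
        score = quality_by_route.get(route, 0.0)  # a missing score counts as failing
        if score < floor:
            failures.append(f"quality on route {route!r} is {score:.3f}, below {floor:.3f}")
    return (not failures, failures)
```

Checking the two conditions separately matters because a router can hit its accuracy target while the few misrouted queries are exactly the ones that hurt quality on the small-model path.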
Should I build the router or use a hosted one?
Hosted or off-the-shelf routers (Martian, Vercel AI Gateway, or the open-source RouteLLM framework) are a good starting point. They cover the common patterns. Build your own when you have workflow-specific routes (e.g. dispatch to internal agents) or strict cost targets that need custom optimization. The build-or-buy decision is a regular Morvion engagement question.
What's the difference between a model router and an agent dispatcher?
Same idea, different scope. A model router picks between models for a single LLM call. An agent dispatcher picks between agents (or workflows) — each of which may make many model calls. The model router is one layer of a broader orchestration; both can coexist.