Model router · Morvion Glossary

A model router is a small classifier that inspects each incoming query and dispatches it to the right model or workflow. Easy queries route to a small, fast model; complex queries route to a large model. The router itself runs in milliseconds and adds negligible cost per call, yet it can cut overall AI cost by 60–80% on workloads with a mix of difficulties.

Routing signals.

Query length and complexity — short factual queries often go to a small model; long multi-step reasoning to a large one.
Required tools — queries that need code execution, web access, or structured planning route to an agentic workflow with tool use; queries that need none route to a flat call.
Confidence-based escalation — the small model answers first; if its self-reported confidence (or a fast judge score) is below threshold, escalate to the large model.
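
The three signals above can be sketched in a few lines of Python. This is a minimal illustration, not a production router: the route labels, the keyword hints, the word-count cutoff, and the 0.7 confidence threshold are all hypothetical values chosen for the example, and the two model callables are stand-ins that return an (answer, confidence) pair.

```python
from typing import Callable, Tuple

# Hypothetical route labels and thresholds, chosen only for illustration.
SMALL, LARGE, AGENTIC = "small_model", "large_model", "agentic_workflow"
TOOL_HINTS = ("run this code", "search the web", "step-by-step plan")
LONG_QUERY_WORDS = 60
CONFIDENCE_THRESHOLD = 0.7

# A model stand-in: takes a query, returns (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def pick_route(query: str) -> str:
    """Signals 1 and 2: required tools first, then query length."""
    text = query.lower()
    if any(hint in text for hint in TOOL_HINTS):
        return AGENTIC  # needs tool use, so send to an agentic workflow
    if len(text.split()) > LONG_QUERY_WORDS:
        return LARGE  # long, likely multi-step query
    return SMALL  # short query, flat call to the small model


def answer_with_escalation(
    query: str,
    small_model: Model,
    large_model: Model,
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[str, str]:
    """Signal 3: the small model answers first; escalate on low confidence."""
    answer, confidence = small_model(query)
    if confidence < threshold:
        answer, _ = large_model(query)  # discard the low-confidence answer
        return answer, LARGE
    return answer, SMALL
```

In practice the keyword check would be replaced by the classifier itself; the point of the sketch is only the order of the checks and the escalation path.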

Anatomy.

The router is typically a small classifier — Haiku-class for LLM-based routing, or a fine-tuned cross-encoder for cheaper dispatch. Inputs: the query plus any session metadata. Output: one of N route labels, with confidence. Routes map to specific workflows in the orchestration layer.
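
As a rough sketch of that shape, the Python below models the input (query plus session metadata), the output (a route label with confidence), and the mapping from labels to workflows. Every name here is an assumption made for the example: `RouterInput`, `RouteDecision`, `stub_classifier`, the `needs_tools` metadata key, and the `WORKFLOWS` table are hypothetical, and the stub stands in for whatever small LLM or fine-tuned classifier actually makes the decision.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class RouterInput:
    query: str
    session: Dict[str, str] = field(default_factory=dict)  # session metadata


@dataclass
class RouteDecision:
    label: str         # one of N route labels
    confidence: float  # classifier confidence in [0, 1]


def stub_classifier(inp: RouterInput) -> RouteDecision:
    """Stand-in for the real classifier; the rules here are illustrative only."""
    if inp.session.get("needs_tools") == "true":
        return RouteDecision("agentic_workflow", 0.90)
    if len(inp.query.split()) > 40:
        return RouteDecision("large_model", 0.80)
    return RouteDecision("small_model", 0.95)


# Each route label maps to a workflow in the orchestration layer.
# The lambdas are placeholders for real model or agent calls.
WORKFLOWS: Dict[str, Callable[[str], str]] = {
    "small_model": lambda q: f"[small] {q}",
    "large_model": lambda q: f"[large] {q}",
    "agentic_workflow": lambda q: f"[agent] {q}",
}


def dispatch(
    inp: RouterInput,
    classifier: Callable[[RouterInput], RouteDecision] = stub_classifier,
    workflows: Dict[str, Callable[[str], str]] = WORKFLOWS,
) -> Tuple[RouteDecision, str]:
    """Classify the input, then run the workflow registered for its label."""
    decision = classifier(inp)
    return decision, workflows[decision.label](inp.query)
```

Keeping the classifier and the route table as separate arguments means either can be swapped (a new model, a new workflow) without touching the dispatch logic.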

Why this pattern wins.

Production AI workloads are bimodal: easy queries dominate volume, hard queries dominate cost. Without routing, every query pays large-model price for small-model work. With routing, the cost distribution matches the difficulty distribution. On most workloads we've audited, a router pays back its implementation cost inside the first month.
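
A quick back-of-the-envelope calculation shows how the savings arise. The numbers below are hypothetical, not measurements: an 80/20 split of easy to hard queries, $0.001 per small-model call, and $0.02 per large-model call. The sketch also assumes perfect routing, so real savings would be somewhat lower.

```python
def routed_savings(
    easy_share: float,
    small_cost: float,
    large_cost: float,
    router_cost: float = 0.0,
) -> float:
    """Fraction of per-query cost saved versus sending everything to the large model.

    Assumes every easy query reaches the small model and every hard query
    reaches the large one (no misrouting, no escalation).
    """
    if not 0.0 <= easy_share <= 1.0:
        raise ValueError("easy_share must be between 0 and 1")
    baseline = large_cost  # without routing, every query pays large-model price
    routed = (
        easy_share * small_cost
        + (1.0 - easy_share) * large_cost
        + router_cost
    )
    return 1.0 - routed / baseline


# Hypothetical workload: 80% easy queries, 20x price gap between models.
savings = routed_savings(easy_share=0.8, small_cost=0.001, large_cost=0.02)
print(f"{savings:.0%}")  # roughly 76% under these assumed prices
```

Under these assumptions the hard 20% of queries still account for most of the remaining spend, which is the sense in which the cost distribution follows the difficulty distribution.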

Frequently asked.

What is a model router?

A model router is a small classifier that inspects each incoming query and dispatches it to the right model or workflow: a small, fast model for easy queries, and a large, expensive model only when the task requires it. It can cut cost by 60–80% on workloads with mixed query difficulty.

How accurate does the router need to be?

Above 95% routing accuracy on a representative fixture set. Below that, misrouting starts producing visible quality regressions on the small-model path. The eval harness measures both routing accuracy and the downstream quality on each route, and the regression gate fails if either falls short.

Should I build the router or use a hosted one?

Hosted and off-the-shelf routers (Martian, the open-source RouteLLM, Vercel AI Gateway) are a good starting point because they cover the common patterns. Build your own when you have workflow-specific routes (e.g. dispatch to internal agents) or strict cost targets that need custom optimization. The build-or-buy decision is a regular Morvion engagement question.

What's the difference between a model router and an agent dispatcher?

Same idea, different scope. A model router picks between models for a single LLM call. An agent dispatcher picks between agents (or workflows), each of which may make many model calls. The model router is one layer of a broader orchestration; both can coexist.
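
The two-part regression gate from the accuracy question can be sketched as follows. This is one plausible shape under stated assumptions, not a description of any particular harness: the `FixtureResult` record, the 0.85 quality floor, and the idea of a judge-produced quality score in [0, 1] are hypothetical. Only the 95% routing-accuracy bar comes from the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

MIN_ROUTING_ACCURACY = 0.95  # bar stated in the text
MIN_ROUTE_QUALITY = 0.85     # hypothetical per-route quality floor


@dataclass
class FixtureResult:
    expected_route: str   # label assigned when the fixture was written
    predicted_route: str  # label the router actually chose
    quality: float        # downstream answer quality in [0, 1] (assumed judge score)


def routing_accuracy(results: List[FixtureResult]) -> float:
    if not results:
        raise ValueError("fixture set is empty")
    hits = sum(r.expected_route == r.predicted_route for r in results)
    return hits / len(results)


def quality_by_route(results: List[FixtureResult]) -> Dict[str, float]:
    """Mean downstream quality, grouped by the route each query actually took."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for r in results:
        scores[r.predicted_route].append(r.quality)
    return {route: mean(vals) for route, vals in scores.items()}


def gate_failures(results: List[FixtureResult]) -> List[str]:
    """Return the reasons the gate fails; an empty list means it passes."""
    failures = []
    accuracy = routing_accuracy(results)
    if accuracy < MIN_ROUTING_ACCURACY:
        failures.append(f"routing accuracy {accuracy:.1%} is below the bar")
    for route, quality in sorted(quality_by_route(results).items()):
        if quality < MIN_ROUTE_QUALITY:
            failures.append(f"quality on route '{route}' is {quality:.2f}, below the floor")
    return failures
```

Checking the two conditions independently matters: a router can hit its accuracy bar while one route still degrades, and the gate should surface both.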
