Fine-grained routing · Morvion Glossary

Fine-grained routing is the production pattern where each step in a workflow picks its own model. A retrieval-summarisation step that just compresses 5k tokens of context runs on a small fast model. The reasoning step that decides the next action runs on a large model. The classification step that triages an input runs on a Haiku-class model. Where model routing picks a model per request, fine-grained routing picks one per step.
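As a minimal sketch of what "one model per step" can look like in code, the routing table below assigns a model tier to each step of a single workflow. The step names and the `SMALL` / `LARGE` identifiers are hypothetical placeholders, not real model IDs or any particular vendor's API.

```python
from dataclasses import dataclass

# Hypothetical model tiers; placeholders, not real model identifiers.
SMALL = "small-fast-model"
LARGE = "large-reasoning-model"


@dataclass(frozen=True)
class Step:
    name: str
    model: str


# One workflow, one model choice per step (rather than one per request).
WORKFLOW = [
    Step("triage_input", SMALL),
    Step("summarise_retrieved_context", SMALL),
    Step("decide_next_action", LARGE),
]


def model_for(step_name: str) -> str:
    """Return the model assigned to a step; raise KeyError if the step is unknown."""
    for step in WORKFLOW:
        if step.name == step_name:
            return step.model
    raise KeyError(step_name)
```

Keeping the table as plain data makes it easy to change one step's model after an eval run without touching the workflow code.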

Why per-step matters.

A typical agentic workflow runs 4–10 model calls per request. The cheapest call and the most expensive call often differ by 30× in cost. Routing each step to the smallest model that still passes its rubric cuts overall spend substantially, without lowering the quality of the step that actually needs the big model.
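The arithmetic can be sketched with illustrative numbers. The example below assumes ten calls per request (nine cheap, one hard), a 30× price gap, equal token counts on every call, and that an escalated cheap step pays for both the small and the large call. All of these are simplifying assumptions for illustration, not measurements.

```python
def all_large_cost(n_cheap: int, n_hard: int, large_price: float) -> float:
    """Cost of running every step on the large model."""
    return (n_cheap + n_hard) * large_price


def routed_cost(
    n_cheap: int,
    n_hard: int,
    small_price: float,
    large_price: float,
    escalation_rate: float = 0.0,
) -> float:
    """Cost with per-step routing.

    Each cheap step pays the small-model price, plus the large-model price
    for the fraction of calls that escalate.
    """
    cheap = n_cheap * (small_price + escalation_rate * large_price)
    hard = n_hard * large_price
    return cheap + hard


def savings(
    n_cheap: int,
    n_hard: int,
    small_price: float,
    large_price: float,
    escalation_rate: float = 0.0,
) -> float:
    """Fraction of spend saved versus the all-large baseline."""
    baseline = all_large_cost(n_cheap, n_hard, large_price)
    routed = routed_cost(n_cheap, n_hard, small_price, large_price, escalation_rate)
    return 1.0 - routed / baseline


# Prices are in arbitrary units; only the 30x ratio matters here.
ideal = savings(9, 1, small_price=1.0, large_price=30.0)           # about 0.87
with_retries = savings(9, 1, 1.0, 30.0, escalation_rate=0.2)       # about 0.69
```

Under these assumptions, a 20% escalation rate on the cheap steps already pulls the saving from roughly 87% down to roughly 69%, which is why the escalation rate is worth tracking.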

How to size each step.

Per-step eval. The fixture set scores each step independently. The smallest model that hits the per-step rubric wins.
Confidence escalation. Cheap steps run a fast model and self-report confidence; below threshold, escalate to a larger model. The eval keeps the escalation rate honest.
Latency budget per step. Long-context steps and reasoning steps have different latency budgets. Model choice respects both the quality and the budget.
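The sizing rules above can be sketched as a small selection function. It assumes each candidate model has already been scored on the step's fixture set, giving a pass rate and a p95 latency, and it treats "smallest" as "cheapest per call". The `Candidate` fields and the thresholds are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    model: str
    cost_per_call: float   # arbitrary cost units
    pass_rate: float       # fraction of fixtures passing the step rubric
    p95_latency_s: float   # measured on the same fixture set


def pick_model(
    candidates: Sequence[Candidate],
    min_pass_rate: float,
    latency_budget_s: float,
) -> Optional[Candidate]:
    """Return the cheapest candidate that meets both the rubric and the latency budget.

    Returns None when no candidate qualifies, so the caller can decide
    whether to relax the budget or rework the step.
    """
    eligible = [
        c for c in candidates
        if c.pass_rate >= min_pass_rate and c.p95_latency_s <= latency_budget_s
    ]
    return min(eligible, key=lambda c: c.cost_per_call, default=None)


# Illustrative scores, not real benchmark results.
CANDIDATES = [
    Candidate("small", cost_per_call=1.0, pass_rate=0.82, p95_latency_s=0.4),
    Candidate("medium", cost_per_call=5.0, pass_rate=0.95, p95_latency_s=0.9),
    Candidate("large", cost_per_call=30.0, pass_rate=0.99, p95_latency_s=2.5),
]
```

With a 0.90 pass-rate bar and a 2-second budget, this picks `medium`; raising the bar to 0.98 under the same budget returns `None`, because the only model that passes is too slow.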

Pitfalls.

The biggest mistake is treating every step the same and paying large-model price for small-model work. The second is escalating too aggressively (every cheap step retries on the large model, so the savings disappear). The eval harness, with per-step quality and cost both scored, is what keeps both errors from compounding silently.
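One way to keep over-escalation visible is to count it at the point where it happens. The sketch below wraps a step with confidence escalation and records how often the large model was needed. `ModelFn` is a hypothetical callable that returns an output and a self-reported confidence; real model clients would need an adapter, and self-reported confidence is assumed to be meaningful here.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical model interface: prompt in, (output, self-reported confidence) out.
ModelFn = Callable[[str], Tuple[str, float]]


@dataclass
class EscalatingStep:
    small: ModelFn
    large: ModelFn
    threshold: float       # escalate when small-model confidence is below this
    calls: int = 0
    escalations: int = 0

    def run(self, prompt: str) -> str:
        """Try the small model first; fall back to the large model on low confidence."""
        self.calls += 1
        output, confidence = self.small(prompt)
        if confidence >= self.threshold:
            return output
        self.escalations += 1
        output, _ = self.large(prompt)
        return output

    @property
    def escalation_rate(self) -> float:
        """Fraction of calls that needed the large model (0.0 before any call)."""
        return self.escalations / self.calls if self.calls else 0.0
```

An escalation rate that creeps toward 1.0 on a step is a signal that the step is paying for both models on most calls and should probably be assigned to the larger model outright, or have its threshold re-tuned against the fixture set.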

Frequently asked.

What is fine-grained routing?

Fine-grained routing is the production pattern of picking a different model for each step in a workflow rather than a single model per request. Cheap steps run on small fast models; expensive steps run on large models. The cost distribution matches the difficulty distribution of the workflow.

How is fine-grained routing different from a model router?

A model router picks one model per request based on query difficulty. Fine-grained routing picks a model per step within a single workflow. Both layers can coexist; many production systems use the router at the request level and fine-grained routing inside multi-step workflows.

What cost savings are realistic?

On workflows with mixed step difficulty (most agentic workflows), fine-grained routing cuts model spend by 40–70% versus running every step on a large model. The savings depend on the difficulty distribution; workflows with one hard step and nine cheap steps benefit most.

How do we keep step quality high under fine-grained routing?

Per-step eval. Score each step against its own rubric on a labelled fixture set. The smallest model that still passes the rubric wins. The eval harness keeps the escalation rate visible; without it, fine-grained routing degrades silently and the savings vanish into retry cost.
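The coexistence of the two layers can be sketched as follows. This is only one possible composition, assumed for illustration: the request-level router picks a default model from a difficulty score, and per-step assignments override that default inside the workflow. The difficulty score, the 0.5 cut-off, and the step names are all hypothetical.

```python
from typing import Dict, List

# Per-step assignments that hold regardless of request difficulty (hypothetical).
STEP_OVERRIDES: Dict[str, str] = {
    "triage": "small",
    "summarise": "small",
}


def route_request(difficulty: float) -> str:
    """Request-level router: one default model for the whole request."""
    return "large" if difficulty >= 0.5 else "small"


def plan(difficulty: float, steps: List[str]) -> Dict[str, str]:
    """Combine both layers: per-step overrides first, request default otherwise."""
    request_default = route_request(difficulty)
    return {step: STEP_OVERRIDES.get(step, request_default) for step in steps}
```

In this composition a hard request still runs its triage and summarisation steps on the small model, and only the steps without an override inherit the request-level choice.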
