Fine-grained routing is the production pattern where each step in a workflow picks its own model. A retrieval- summarisation step that just compresses 5k tokens of context runs on a small fast model. The reasoning step that decides the next action runs on a large model. The classification step that triages an input runs on a Haiku-class model. Where model routing picks a model per request, fine-grained routing picks one per step.

Why per-step matters.

A typical agentic workflow runs 4–10 model calls per request. The cheapest call and the most expensive call often differ by 30× in cost. Routing each step to the smallest model that still passes its rubric cuts overall spend dramatically — without lowering the quality of the step that actually needs the big model.

How to size each step.

  • Per-step eval. The fixture set scores each step independently. The smallest model that hits the per-step rubric wins.
  • Confidence escalation. Cheap steps run a fast model and self-report confidence; below threshold, escalate to a larger model. The eval keeps the escalation rate honest.
  • Latency budget per step. Long-context steps and reasoning steps have different latency budgets. Model choice respects both the quality and the budget.

Pitfalls.

The biggest mistake is treating every step the same and paying large-model price for small-model work. The second is escalating too aggressively (every cheap step retries on the large model, so the savings disappear). The eval harness, with per-step quality and cost both scored, is what keeps both errors from compounding silently.

Frequently asked.

What is fine-grained routing?
Fine-grained routing is the production pattern of picking a different model for each step in a workflow rather than a single model per request. Cheap steps run on small fast models; expensive steps run on large models. The cost distribution matches the difficulty distribution of the workflow.
How is fine-grained routing different from a model router?
A model router picks one model per request based on query difficulty. Fine-grained routing picks a model per step within a single workflow. Both layers can coexist; many production systems use the router at the request level and fine-grained routing inside multi-step workflows.
What cost savings are realistic?
On workflows with mixed step difficulty (most agentic workflows), fine-grained routing cuts model spend by 40–70% versus running every step on a large model. The savings depend on the difficulty distribution; workflows with one hard step and nine cheap steps benefit most.
How do we keep step quality high under fine-grained routing?
Per-step eval. Score each step against its own rubric on a labelled fixture set. The smallest model that still passes the rubric wins. The eval harness keeps the escalation rate visible; without it, fine-grained routing degrades silently and the savings vanish into retry cost.