An agent handoff is the structured transfer of work from one AI agent to another. In a multi-agent workflow, the handoff between roles is where most of the failure modes live. Treating handoffs as typed contracts, not prose instructions, removes the ambiguity behind many of those failures.

The handoff contract.

Every handoff has three explicit parts:

  • Input shape. The typed object the receiving agent expects. Defined as a TypeScript interface or JSON Schema, not as a paragraph in a prompt.
  • Expected output. What the receiving agent is supposed to produce. Also typed, also schema-enforced.
  • Success criterion. The rubric the next agent in the chain (or the eval harness) uses to score whether this handoff actually produced what was needed.
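As a rough illustration, the three parts could be written down like this for a planner-to-executor handoff. This is a minimal sketch: the names (PlanStep, ExecutorInput, ExecutorOutput, HandoffContract, planToExecute) and the scoring rule are assumptions made for the example, not part of any particular framework.

```typescript
// Hypothetical planner -> executor handoff. Every name below is illustrative.

// Input shape: what the executor expects to receive.
interface PlanStep {
  id: string;
  action: "search" | "summarize" | "write";
}

interface ExecutorInput {
  goal: string;
  steps: PlanStep[];
}

// Expected output: what the executor must produce.
interface ExecutorOutput {
  completedStepIds: string[];
  artifacts: Record<string, string>;
}

// The contract ties both shapes to a success criterion scored in [0, 1].
interface HandoffContract<I, O> {
  name: string;
  score: (input: I, output: O) => number;
}

const planToExecute: HandoffContract<ExecutorInput, ExecutorOutput> = {
  name: "planner->executor",
  // Assumed rubric: the fraction of planned steps the executor completed.
  score: (input, output) => {
    if (input.steps.length === 0) return 1;
    const done = new Set(output.completedStepIds);
    return input.steps.filter((s) => done.has(s.id)).length / input.steps.length;
  },
};
```

Because the contract is generic over its input and output types, the scoring function cannot be written against fields that neither shape declares.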

Why typed contracts beat prose.

Prose handoffs ("the planner should give the executor enough context to act") sound reasonable until you debug one. Each agent interprets the prose slightly differently and the system silently produces wrong outputs. Typed contracts turn that ambiguity into an explicit error: a compile-time error in the code that wires the agents together, and a schema-validation failure at runtime when an agent emits a malformed payload. They also make handoff quality an eval rubric the harness can score automatically.
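Model output arrives as untyped text, so the contract also has to be checked when the payload is received. The sketch below shows one way to do that with a hand-written check. The ExecutorInput shape and the function names are hypothetical, and a schema library could replace the manual checks.

```typescript
// Hypothetical input shape for an executor agent.
interface ExecutorInput {
  goal: string;
  steps: { id: string; action: string }[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Collects every contract violation instead of stopping at the first one.
function checkExecutorInput(value: unknown): string[] {
  if (!isRecord(value)) return ["payload must be an object"];
  const errors: string[] = [];
  if (typeof value["goal"] !== "string") errors.push("goal must be a string");
  const steps = value["steps"];
  if (!Array.isArray(steps)) {
    errors.push("steps must be an array");
  } else {
    steps.forEach((step: unknown, i: number) => {
      if (
        !isRecord(step) ||
        typeof step["id"] !== "string" ||
        typeof step["action"] !== "string"
      ) {
        errors.push(`steps[${i}] must have a string id and action`);
      }
    });
  }
  return errors;
}

// Parses raw model output and refuses to hand off anything malformed.
function parseExecutorInput(raw: string): ExecutorInput {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    throw new Error("handoff rejected: payload is not valid JSON");
  }
  const errors = checkExecutorInput(value);
  if (errors.length > 0) {
    throw new Error(`handoff rejected: ${errors.join("; ")}`);
  }
  return value as ExecutorInput;
}
```

A payload such as {"goal": "ship", "steps": "do the thing"} is rejected at the boundary with a named error rather than reaching the executor and being interpreted loosely.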

When not to use multi-agent handoffs.

If the workflow is linear and every step has the same role (transform, filter, output), use a single agent with deterministic function calls, not a multi-agent handoff. Adding handoffs to a linear pipeline introduces latency, cost, and three new places for failure without solving anything. The handoff pattern earns its complexity only when roles genuinely differ; see the multi-agent workflows article for the split-or-not decision tree.
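For contrast, a linear transform, filter, output workflow might look like the sketch below. The Ticket type and the three step functions are invented for the example, and the model call is omitted: a single agent would invoke these functions as ordinary tools, with no second agent and no handoff contract involved.

```typescript
// Hypothetical record type and steps; all names are illustrative only.
interface Ticket {
  id: number;
  text: string;
  priority: "low" | "high";
}

// Step 1: normalize the text of each ticket.
const transform = (tickets: Ticket[]): Ticket[] =>
  tickets.map((t) => ({ ...t, text: t.text.trim().toLowerCase() }));

// Step 2: keep only high-priority tickets.
const filter = (tickets: Ticket[]): Ticket[] =>
  tickets.filter((t) => t.priority === "high");

// Step 3: render the remaining tickets as one line each.
const output = (tickets: Ticket[]): string =>
  tickets.map((t) => `#${t.id}: ${t.text}`).join("\n");

// The whole workflow is plain function composition, so each step is
// deterministic and can be unit-tested without any model in the loop.
function runPipeline(tickets: Ticket[]): string {
  return output(filter(transform(tickets)));
}
```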

Frequently asked.

What is an agent handoff?
An agent handoff is the structured transfer of work from one AI agent to another — for example a planner passing a plan to an executor, or a router dispatching to a specialist. The handoff is a typed contract with explicit input shape, expected output, and success criterion, rather than a prose instruction the receiving agent has to interpret.
Why are handoffs the failure point in multi-agent workflows?
Because most teams write handoffs as prose. Prose handoffs sound reasonable until you debug one: each agent interprets the words differently and the system silently produces wrong outputs. Typed contracts fix this by turning ambiguity into an explicit error (at compile time in the orchestration code, at schema validation when an agent emits a malformed payload) and by making handoff quality something the eval harness can score.
How do I score handoff quality in an eval harness?
Define the handoff contract's success criterion as a rubric: deterministic when the expected output is structural, LLM-graded when the judgment is subjective. The harness scores each agent's output against the contract before it is passed downstream. Bad handoffs then show up as regressions in the upstream agent's metrics, not as mysterious downstream failures.
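A minimal sketch of such a gate follows, assuming a hypothetical Summary output type. The names scoreStructure, gradedRubric, and gateHandoff are illustrative, and the grader is passed in as a function because the actual model call depends on your provider.

```typescript
// Hypothetical output type for the upstream agent.
interface Summary {
  title: string;
  bullets: string[];
}

// A rubric returns a score in [0, 1], synchronously or asynchronously.
type Rubric<O> = (output: O) => number | Promise<number>;

// Deterministic rubric: structural checks only, no model involved.
function scoreStructure(out: Summary): number {
  const checks = [
    out.title.length > 0,
    out.bullets.length >= 1 && out.bullets.length <= 5,
    out.bullets.every((b) => b.trim().length > 0),
  ];
  return checks.filter(Boolean).length / checks.length;
}

// LLM-graded rubric: the grader is injected so a real model call can be
// swapped in. Its prompt and return convention are assumptions of this sketch.
type Grader = (prompt: string) => Promise<number>;

const gradedRubric =
  (grade: Grader): Rubric<Summary> =>
  async (out) => {
    const raw = await grade(`Rate clarity from 0 to 1:\n${out.bullets.join("\n")}`);
    return Math.min(1, Math.max(0, raw)); // clamp in case the grader misbehaves
  };

// Gate: score the output against every rubric before it goes downstream.
async function gateHandoff<O>(
  output: O,
  rubrics: Rubric<O>[],
  threshold: number,
): Promise<{ pass: boolean; scores: number[] }> {
  const scores = await Promise.all(rubrics.map((r) => r(output)));
  return { pass: scores.every((s) => s >= threshold), scores };
}
```

Calling gateHandoff(summary, [scoreStructure, gradedRubric(myGrader)], 0.8), with myGrader supplied by the caller, yields per-rubric scores that can be logged against the upstream agent, which is what makes a bad handoff visible as that agent's regression.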
Should every multi-agent system use typed handoffs?
Yes. The cost of writing a TypeScript interface for the handoff is small; the cost of debugging a prose-only handoff in production is enormous. Every Morvion multi-agent engagement defines handoffs as typed contracts and scores them in the eval harness.