The build-or-no-build verdict: validating an AI idea in 10 days.

Almost every company now uses AI, and almost none of them can point to the money. That single gap, between adoption and value, is where six-month builds go to disappear. The fix is not a better model or a bigger budget. It is a validation discipline that produces a written verdict in ten working days, before the build is commissioned rather than after it stalls. The verdict is the deliverable. The build is what happens only if the verdict earns it.

The gap is not awareness.

The adoption argument is over. 88% of companies now use AI regularly in at least one function, yet only 6% qualify as high performers where AI contributes more than 5% of EBIT, on McKinsey's 2025 State of AI figures. The awareness curve is flat at the top. What separates that 6% from everyone else is not that they found AI later or spent more on it. It is that they validated before they scaled.

The rest are busy. Proofs of concept, pilots, a copilot bolted onto one team, a model humming behind a demo that impresses in the room and never reaches the shift. Activity is not the scarce thing. A written definition of what would count as proof, agreed before the work starts, is the scarce thing, and its absence is why so much motion produces so little EBIT.

“Most AI projects do not fail because the model was wrong. They fail because nobody wrote down what would count as proof.”

Why pilots die: the pilot loop.

Nearly two-thirds of AI-adopting organisations stay stuck in experiment or pilot mode without ever scaling, the pattern one analysis of the McKinsey data calls the pilot loop. A pilot shows a localised success, a clean demo on a clean slice of data, and then dies at the integration boundary where real inputs, real volume, and real exceptions live.

The loop is seductive because each turn feels like progress. A new pilot, a new model, a new slice of data, a new demo that lands. What never happens is the crossing: the moment the system meets the messy production workflow it was supposed to change. A pilot that is never asked to cross that boundary can run forever without ever being wrong, and without ever being worth anything. Breaking the loop means testing the crossing early, on purpose, with real data and a measurable rubric, not deferring it until the celebration is over.

The model works, the workflow cannot absorb it.

Only 21% of organisations using generative AI have redesigned any of their workflows; the rest layer AI on top of processes shaped for humans, on McKinsey's 2025 figures. This is the quiet failure mode behind a working model that still delivers nothing: the output is fine, and there is nowhere for it to go.

A support summariser that drafts a perfect escalation into a queue nobody reads has not saved anyone time. A lead-scoring model that ranks accounts a sales team has no capacity to work has produced a better-sorted backlog, not more revenue. The model was never the risk. The workflow's ability to absorb the model's output was, and it is the cheapest thing in the world to test first and the most expensive thing to discover last.

Field rule

If the pilot produces an output no existing workflow can act on, the model was never the risk. Validate the workflow's ability to absorb the output before you validate the model.
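The field rule can be put to a number before any model is tuned. What follows is a minimal sketch, not a prescribed method: the function name `absorbed_share` and the weekly figures are hypothetical, chosen only to mirror the lead-scoring example above.

```python
def absorbed_share(outputs_per_week: int, actionable_per_week: int) -> float:
    """Share of model outputs the existing workflow can actually act on.

    Both inputs are assumptions you would measure with the team that
    receives the output, not values taken from the model.
    """
    if outputs_per_week <= 0:
        raise ValueError("outputs_per_week must be positive")
    if actionable_per_week < 0:
        raise ValueError("actionable_per_week cannot be negative")
    # Capacity beyond the number of outputs adds nothing, so cap it.
    return min(actionable_per_week, outputs_per_week) / outputs_per_week


# Hypothetical figures: the model ranks 200 accounts a week,
# the sales team has capacity to work 40 of them.
share = absorbed_share(outputs_per_week=200, actionable_per_week=40)
print(f"Workflow absorbs {share:.0%} of the model's output")
```

A low share does not say the model is bad. It says the first thing to validate is the queue, the capacity, or the handoff, because improving the ranking will not change what the team can act on.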

From product-market fit to proof of demand.

Modern validation has moved from product-market fit to proof of demand: behaviour-based commitment from real users, paid in time, money, or reputation, rather than slide-deck enthusiasm. The shift matters because enthusiasm is free and commitment is not, and only one of them predicts what happens after launch.

For an AI idea the equivalent is concrete: real operator data in, a measurable rubric on the output, a written verdict at the end. The rubric is what turns a demo into evidence. Without it you are grading outputs by adjective, which is the same as not grading them; with it, the same disagreement that would have taken six weeks of debate resolves against a number the team agreed on in advance. This is the spine of eval-driven AI, and it is why the sprint writes the rubric before it writes the agent.
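A rubric in this sense can be very small. The sketch below is one way to write it down, under stated assumptions: the `Criterion` type, the two checks, the 0.8 bar, and the sample outputs are all hypothetical stand-ins for what a team would agree on before the sprint starts.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    name: str
    check: Callable[[str], bool]  # True when an output meets this criterion


def pass_rate(outputs: list[str], rubric: list[Criterion]) -> float:
    """Fraction of outputs that satisfy every criterion in the rubric."""
    if not outputs:
        raise ValueError("no outputs to score")
    passed = sum(all(c.check(o) for c in rubric) for o in outputs)
    return passed / len(outputs)


# Hypothetical rubric for a support summariser, written before the agent is.
RUBRIC = [
    Criterion("names the customer issue", lambda o: "issue:" in o.lower()),
    Criterion("fits the queue preview", lambda o: len(o) <= 280),
]
AGREED_BAR = 0.8  # assumption: the pass rate the team agreed counts as proof

# Hypothetical outputs produced from real operator data.
outputs = [
    "Issue: refund not received. Next step: escalate to billing.",
    "Customer wrote in again.",
    "Issue: login loop on mobile. Next step: reset session.",
]

rate = pass_rate(outputs, RUBRIC)
clears_bar = rate >= AGREED_BAR
print(f"pass rate {rate:.2f}, clears bar: {clears_bar}")
```

The checks here are deliberately crude string tests. In practice a criterion might be a human grade or a model-assisted grade; what matters for the argument is that the bar is fixed before the outputs exist.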

The ten-day verdict.

The Morvion read is a fixed shape. A ten-day discovery sprint scopes one validation question, ships working software against real data, scores the output against a measurable rubric, and returns one of three written verdicts.

Build. The evidence clears the bar, and the verdict comes with a costed plan for the real thing. The sprint has de-risked the investment, not just approved it.
Do not build. The evidence is in, and it says no. This is a result, not a failure: it prevents the six-month build that would have reached the same answer with far more sunk cost.
Build differently. The original question was the wrong one, and the sprint returns the reframe: a sharper problem the evidence says is worth solving, and the shape of the thing that would solve it.
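The three verdicts can be read as a small decision rule over the sprint's evidence. This is an illustrative sketch only: the `decide` function, its parameters, and the choice to let a reframed question take precedence over the score are assumptions, not a description of how any particular sprint is run.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    BUILD = "build"
    DO_NOT_BUILD = "do not build"
    BUILD_DIFFERENTLY = "build differently"


def decide(
    pass_rate: float,
    agreed_bar: float,
    reframed_question: Optional[str] = None,
) -> Verdict:
    """Map sprint evidence to one of the three written verdicts."""
    # Assumption: if the evidence shows the original question was the
    # wrong one, the reframe wins regardless of the score.
    if reframed_question:
        return Verdict.BUILD_DIFFERENTLY
    # The bar is the one agreed before the work started.
    if pass_rate >= agreed_bar:
        return Verdict.BUILD
    return Verdict.DO_NOT_BUILD


print(decide(0.86, 0.80).value)
print(decide(0.41, 0.80).value)
print(decide(0.41, 0.80, "Route escalations, do not summarise them").value)
```

The written verdict carries more than the enum value: a build comes with a costed plan, and a reframe comes with the sharper problem statement. The code only shows that the outcome is decided against evidence and a bar fixed in advance, not against the mood in the room.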

All three are worth the engagement, and none of them takes six months to learn. The same shape runs whether the question is a customer-facing copilot, a workflow redesign, or a CRM intelligence layer; the structure is identical, only the fixtures change. Our Labs and Prototyping engagements exist to produce exactly this, and they start from a validation question, not a build brief.

The reference

The three verdicts are not equal in comfort, but they are equal in value. The most expensive outcome is the fourth one nobody schedules: a six-month build that ends at the same answer a ten-day sprint would have produced, with a great deal more sunk cost.

Common questions.

What separates a discovery sprint from a hackathon?
A sprint produces a written verdict, the evidence behind it, and a costed next phase. A hackathon produces a demo. Different artifacts, different commitments: one is built to end a debate, the other to start one.

What if the verdict is “do not build”?
That is the most valuable verdict the sprint can produce. It prevents the six-month build that ends in the same answer with far more sunk cost, and it frees the budget for the bet that will actually pay. A cheap no is worth more than an expensive maybe.

Who should commission a discovery sprint?
Anyone weighing an AI investment they cannot easily reverse: a new product surface, a workflow redesign, a CRM intelligence layer, a customer-facing copilot. If the decision is small and reversible, skip the sprint and just try it. If it is large and hard to undo, ten days of evidence is the cheapest insurance available.

If you have an AI idea you cannot easily reverse, tell us the one question that would settle it and we will scope the sprint that answers it; the two-week proof shape is the same structure, written up in full.

Edward Salvatierra

We design and ship the systems we write about.
