Small operating teams have a CRM problem and don't know it. They have HubSpot or Pipedrive, but no one updates it, no one trusts it, and no one acts on it. The CRM stopped being a system years ago and became a graveyard for once- promising leads. The cure is not another tool, it is a layer of role-scoped agents, Scout, Scribe, Sentinel, that read the same data the operator does and surface the next action with a draft already attached. This report is the reference for how that layer is built.
The three-agent model.
Every working CRM intelligence system Morvion has shipped converges on the same three-agent architecture. Names vary across engagements; functions do not. The shape is a multi-agent workflow with typed handoffs between roles.
Agent 01 · Scout. Enrichment.
Scout reads every CRM record and builds a context block: company signals, recent activity, mutual connections, what the prospect shipped last quarter, who else they're talking to. Output is structured (a typed object the next agent can consume), not prose. Refreshed weekly per account, or on event triggers (a deal stage change, a fresh signal from the firmographic API).
The signal Scout is broken: the operator opens a record and immediately opens LinkedIn. If the context block didn't replace the LinkedIn tab, Scout is failing its job description.
Agent 02 · Scribe. Drafting.
Scribe writes the first version of every outreach, follow-up, and proposal. Not generic; context-aware, built on top of Scout's enrichment plus the operator's voice profile plus the deal's history. Every draft is run through structured output constraints (schema-validated subject + body + CTA) and graded against the rubric before the operator sees it.
The signal Scribe is broken: the operator rewrites more than 40% of the draft before sending. If they're starting from scratch most of the time, the draft adds no value over a blank canvas.
Agent 03 · Sentinel. Monitoring.
Sentinel watches the pipeline for behavioural shifts and surfaces the next-action shortlist each morning. Deals that just went quiet. Deals about to surprise-close. Deals leaking a signal nobody saw. Sentinel never moves cards; it produces a ranked list with drafts already attached, the operator approves or redirects.
The signal Sentinel is broken: a deal goes quiet and the operator hears about it from the customer instead of from the dashboard.
“Your CRM isn't broken. It's just alone.”
The retrieval layer underneath.
All three agents read from the same retrieval layer: CRM records, prior conversations, transcripts, internal docs, and a curated set of external signals. The architecture is retrieval-augmented generation with a two-stage retrieve-then-rerank pattern: a vector database returns the top-50 relevant chunks, rerank compresses to the top-10, the agent generates against those. Retrieval quality is measured separately from generation faithfulness, different fixtures, different rubrics, different regression gates.
Eval rubric per agent.
Each agent ships with its own eval harness. The fixture set is per-agent (50–200 real examples, labelled). The rubric is per-agent. The regression gate fails the release if any agent regresses past tolerance.
Scout │ enrichment-accuracy ≥ 0.92 · staleness-cap 7 days
│ structured-output validity 100% · cost ≤ $0.03/record
Scribe │ draft-quality ≥ 4.2/5 (LLM-graded) · tone-match ≥ 0.90
│ schema validity 100% · faithfulness ≥ 0.95
Sentinel │ signal-precision ≥ 0.80 · signal-recall ≥ 0.75
│ next-action relevance ≥ 0.85 · time-to-surface ≤ 4hThe numbers above are starting targets from Morvion engagements; the actual thresholds are tuned per workload during the Discovery Sprint and re-baselined after the first month of live data.
Reference architectures · three scales.
Reference 01 · Solo founder + small assistant team.
The smallest viable stack. One operator, one CRM, three agents. Built so the founder runs their entire pipeline from one next-action dashboard, no second tab open.
Scout ─── enrichment on every new lead + weekly refresh on actives
Scribe ─── drafts every outbound + every follow-up, one-click send
Sentinel ─── morning shortlist (5–7 items), with drafts attached
Surface ──── single web dashboard, mobile-readable
Eval ─────── per-agent harness, ~80 fixtures each, weekly regression check
Cost band ── €30–60k build, €0.5–1.5k/month run
Build time ─ 6–10 weeksReference 02 · 3–15 person sales team.
Multiple operators, shared pipeline, per-rep voice profiles for Scribe. Sentinel surfaces by rep AND by account, so the manager sees the team view and each rep sees their own. Memory boundaries are explicit: one rep's notes never leak into another's draft.
Scout ─── shared enrichment + per-rep relationship history
Scribe ─── per-rep voice profile + drafts route to the right operator
Sentinel ─── team view + per-rep view, with manager-only surfaces
Surface ──── web + Slack integration for surfacing high-priority items
Eval ─────── per-agent harness + per-rep voice eval (style drift)
Cost band ── €50–110k build, €1.5–4k/month run
Build time ─ 8–12 weeks
Pattern ──── one rep ships first, the rest onboard in week-long passesReference 03 · Multi-tenant / multi-region sales org.
Multiple teams, shared infrastructure, per-tenant boundaries. Each team has its own retrieval index, its own voice profile, its own dashboard. The eval harness and observability layer are shared infrastructure. Regulatory boundaries (data residency, consent rules) are honoured at the retrieval layer.
Scout ─── per-tenant retrieval index + global enrichment APIs
Scribe ─── per-tenant voice profiles + per-jurisdiction templates
Sentinel ─── per-tenant alerts + cross-tenant aggregate (anonymised)
Surface ──── web dashboards per team, admin console for the org
Eval ─────── shared harness infra, per-tenant fixtures + thresholds
Cost band ── €90–250k build, €4–12k/month run
Build time ─ 12–20 weeks
Pattern ──── pilot one tenant end-to-end, then onboard in 2-week passesMorvion has shipped reference 01 and reference 02 in live engagements as of 2026-05. Reference 03 is the forward-projection of the same shape; the architecture is identical, the boundary work compounds with the number of tenants.
The dashboard surface.
The output of the three-agent system is not the pipeline view. It is the next-action view: five to seven items, ranked by impact, each with a draft attached, each with a one-click approve and a one-line revise. The operator opens the dashboard in the morning, scans, approves three, redirects two, done in twelve minutes.
The dashboard is also where AI observability surfaces: every agent decision is replayable, every draft has a version trail, every Sentinel signal links to the retrieved context that triggered it. The operator never wonders why the system did what it did.
Cost bands · what to expect.
Public ranges from Morvion engagements in 2026, all-in (audit + design + engineering + launch), excluding subscription tools the team continues to pay for (CRM license, model API spend, third-party enrichment APIs).
Discovery Sprint ───── €18–25k · 2 weeks · validates scope + eval design
Solo / small team ─── €30–60k · 6–10 weeks
3–15 person team ──── €50–110k · 8–12 weeks
Multi-tenant org ──── €90–250k · 12–20 weeks
Retainer (post-launch) €4–18k/month · ongoing iteration + eval refreshThe wider end of each band is driven by integrations: CRMs with non-standard APIs, legacy enrichment vendors, custom voice profiles per rep, multi-language drafting, and any compliance layer (financial services, healthcare, regulated regions) that demands extra audit work.
The 12-question self-audit scorecard.
For an operator to assess their current CRM. One point per affirmative answer; max 12. Below 6 = a Discovery Sprint will surface faster wins than another tool subscription. Below 4 = the operator is the CRM and is the bottleneck.
- Does every CRM record have a fresh enrichment block (under 14 days old)?
- Can the operator open a record and not need to open LinkedIn afterward?
- Does the operator receive a ranked next-action list each morning, not a pipeline view?
- Are outbound drafts pre-written, or does the operator start from a blank canvas?
- Does the team know about a deal going quiet within four hours, not four weeks?
- Is there a written eval rubric per agent, with a baseline score per release?
- Can the operator replay why the system surfaced any specific item?
- Are per-rep voice profiles honoured (drafts read like the rep, not like the model)?
- Does the system respect data boundaries (one rep's notes don't leak into another's draft)?
- Is the cost per record + cost per draft tracked alongside the engagement metrics?
- Does the regression gate block releases that lower any agent rubric past tolerance?
- Does the CRM stay the source of truth (no data migration required)?
What not to build.
The mistakes we see most often. Skip these even when the vendor demo is compelling.
- A chatbot in the CRM corner. Single-turn chat with the model is a side-quest, not the operating layer. It costs trust and adds workflow steps.
- Static weekly lead scores. A batch job that writes a 0–100 number once a week is not intelligence; it's a number the operator stops checking after week three.
- One agent for everything. A single prompt trying to enrich + draft + monitor regresses across model updates and can't be evaluated. Three role-scoped agents are easier to maintain than one that tries to do all three.
- Drafting without faithfulness checks. Scribe without faithfulness grading regresses to fluent-but-wrong. The brand cost is invisible until the customer notices.
- Replacing the CRM. The intelligence layer reads and writes; it doesn't migrate. Any vendor that wants you to switch CRMs to use their AI is selling lock-in, not intelligence.
Worked example · the three-engagement repeat.
The same three-agent shape has shipped across three distinct engagements between 2025-09 and 2026-05. Different industries, different CRMs (HubSpot, Pipedrive, custom), different team sizes. The architecture, the eval pattern, and the operator dashboard were consistent across all three; only the agent prompts, the retrieval indexes, and the integration glue were rewritten per engagement.
- Engagement A: solo founder + 2 contractors. HubSpot. Cost € 38k. 7 weeks. Beat target faithfulness in week 5; replaced ~60% of cold-outreach drafting work by week 8.
- Engagement B: 8-person services team. Pipedrive. Cost € 72k. 10 weeks. Sentinel surfaced 14 deals about to go quiet in the first month; 11 were recovered with the attached drafts.
- Engagement C: 12-person multi-region sales org. Custom CRM. Cost € 105k. 12 weeks. Per-rep voice profiles passed the 0.90 tone-match bar by week 9; Scout enrichment cut LinkedIn-tab usage by ~70%.
Reference engagement metrics above are aggregated and de-identified; specific account names are released only with client permission. The pattern is the asset; the individual numbers vary by workload.
Regional notes · Switzerland, DACH, EU.
The architecture applies broadly. A few region-specific considerations from CH and DACH engagements:
- Data residency: Swiss FADP (revised 2023) and EU GDPR/DSGVO both apply to CRM data and any AI processing of it. Retrieval indexes are hosted in CH/EU for regulated clients; the model gateway routes to jurisdictionally-appropriate inference endpoints.
- Multilingual drafting: Zürich operates DE + EN; Vaud + Geneva operate FR + EN. Scribe maintains per-language voice profiles, not a single multilingual profile (the tone drifts otherwise).
- Consent boundaries: Enrichment APIs that scrape public sources are usually fine; APIs that infer sensitive attributes (income, health, political) are blocked at the retrieval layer for any EU-resident contact.
- AI Act readiness: The EU AI Act phases in through 2026–2027. The eval harness, the observability trace, and the per-agent rubric are the primary artefacts auditors ask for. Build them from day one; retrofitting under an audit clock is expensive.
A CRM intelligence stack is not finished when the agents pass the rubric. It is finished when the operator stops opening LinkedIn and stops drafting cold emails from blank.
Where this fits in Morvion engagements.
The stack is built inside Intelligent Systems & AI Infrastructure, with the operator dashboard living in the Digital Products & Platforms practice. Most engagements start with a two-week Discovery Sprint that audits the real sales motion, identifies where deals leak today, and locks the three-agent specification before any production build.
The methodology behind the eval harness is open-sourced as The Morvion Eval Spec. For the one-paragraph definitions of the underlying terms, see the glossary entries on CRM intelligence, AI agent, multi-agent workflow, and eval-driven AI. For the long-form discussion of the three-agent shape in narrative form, see CRM intelligence is the new operating layer for small teams.
Versioning of this report.
This document is versioned. Substantial revisions (new reference architecture, new cost band, new rubric dimension) bump the major version. Minor refinements are silent. Current version: 1.0.0, published 2026-05-19.