What is a regression gate in AI deployment?

A regression gate is an automated CI check that runs the eval harness against the candidate release, computes per-metric mean scores, and fails the PR if any metric drops past its tolerance versus the saved baseline. It is the only artefact that reliably prevents silent quality regressions when prompts, models, or retrieval logic change.

How do I choose the tolerance for a regression gate?

Run the eval harness five times against unchanged code and measure the standard deviation of each metric. Set the tolerance at no less than twice that. Structural metrics (schema validity, banned tokens) typically take a 0.00 tolerance; LLM-graded metrics typically need 0.05 to 0.10 because grader variance is real.

When should I update the baseline?

Only after a release has been reviewed, is intended for production, and has a new score that is explainable. Baseline updates are decisions, not chores. Auto-updating the baseline on every green run defeats the purpose by erasing the historical anchor against which regressions are measured.

What happens when the gate fires?

Read the per-fixture report, identify the failing fixtures, and classify the cause as a bug, a traffic shift, or an acceptable trade-off. Then act on that classification: fix the bug, refresh the fixture set to reflect the traffic shift, or widen the tolerance to accept the trade-off. The last two each need a PR-level note. Never bypass the gate without writing down why.

Regression gate · Morvion Glossary

A regression gate is the artefact that prevents Thursday-night rollbacks from becoming routine. It compares every release's per-metric eval scores against the last-blessed baseline and fails the PR check when any metric drops past its declared tolerance.

Anatomy of a gate.

Baseline. Per-metric mean scores from the last release that passed review. Stored alongside the eval config.
Tolerance. Per-metric maximum acceptable drop. Tight (0.00) for structural metrics, looser (0.05–0.10) for LLM-graded metrics that carry more score variance.
Comparator. The CI step that runs the eval harness, computes current per-metric means, subtracts each from its baseline value to get the drop, and exits non-zero when any drop exceeds the tolerance.
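The three parts above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the metric names, the baseline and tolerance values, and the function names `find_regressions` and `gate` are all assumptions made for the example, and a real setup would load the baseline from wherever the eval config lives.

```python
# Hypothetical baseline and tolerance values, for illustration only.
BASELINE = {"schema_validity": 1.00, "faithfulness": 0.86}
TOLERANCE = {"schema_validity": 0.00, "faithfulness": 0.05}


def find_regressions(current, baseline, tolerance, eps=1e-9):
    """Return {metric: drop} for every metric whose drop exceeds its tolerance."""
    regressions = {}
    for metric, base_score in baseline.items():
        # Assumption: a metric missing from the current run counts as a score of 0.
        drop = base_score - current.get(metric, 0.0)
        # eps absorbs floating-point rounding when a drop sits right on the tolerance.
        if drop > tolerance[metric] + eps:
            regressions[metric] = drop
    return regressions


def gate(current, baseline=BASELINE, tolerance=TOLERANCE):
    """Return a process exit code: 0 when the gate passes, 1 when it fires."""
    regressions = find_regressions(current, baseline, tolerance)
    for metric, drop in sorted(regressions.items()):
        print(f"FAIL {metric}: dropped {drop:.3f} (tolerance {tolerance[metric]:.2f})")
    return 1 if regressions else 0
```

In a CI job, a small wrapper would pass the harness output to `gate` and hand the result to `sys.exit`, so that a non-zero code fails the PR check. Improvements never trip the gate here, because a negative drop cannot exceed a non-negative tolerance.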

How to set a tolerance.

Run the eval harness five times against the same code, measure the standard deviation of each metric, and set the tolerance at no less than twice that. A tolerance below the noise floor makes the gate fire on randomness; at twice the noise floor or above, the gate catches real drops with few false positives.
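As a rough sketch of that procedure, the snippet below derives a tolerance floor from repeated runs. The function name `suggest_tolerances`, the metric names, and the five sets of scores are invented for illustration; only the "twice the standard deviation" rule comes from the text above.

```python
from statistics import stdev


def suggest_tolerances(runs, multiplier=2.0):
    """Map each metric to multiplier x its sample standard deviation across runs.

    `runs` is a list of {metric: mean_score} dicts, one per harness run on
    unchanged code. At least two runs are needed for a standard deviation.
    """
    metrics = runs[0].keys()
    return {m: multiplier * stdev([run[m] for run in runs]) for m in metrics}


# Hypothetical scores from five runs against the same code.
runs = [
    {"schema_validity": 1.0, "faithfulness": 0.84},
    {"schema_validity": 1.0, "faithfulness": 0.86},
    {"schema_validity": 1.0, "faithfulness": 0.88},
    {"schema_validity": 1.0, "faithfulness": 0.85},
    {"schema_validity": 1.0, "faithfulness": 0.87},
]

floors = suggest_tolerances(runs)
for metric, floor in floors.items():
    print(f"{metric}: tolerance should be at least {floor:.3f}")
```

The result is a lower bound rather than the final setting. A deterministic metric such as schema validity comes out at zero, which matches the 0.00 tolerance used for structural metrics, while a noisy graded metric gets a floor that can then be rounded up.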

Bypass is the failure mode.

The first time a team merges a regression with "we'll fix it next sprint", the gate becomes theatre. The discipline is to either fix the regression or update the baseline with an explicit PR-level explanation of why the new score is acceptable. Both actions are reversible. Bypass without explanation is not.
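One way to make the explanation hard to skip is to have the baseline update itself refuse to run without one. The sketch below assumes a baseline stored as a dict with a `scores` key; the record layout and the function name `update_baseline` are hypothetical, not a description of any particular tool.

```python
from datetime import date


def update_baseline(baseline, new_scores, reason, today=None):
    """Return a new baseline record, refusing when no explanation is given."""
    if not reason or not reason.strip():
        raise ValueError("baseline updates need a written explanation")
    return {
        "scores": dict(new_scores),
        # Keeping the previous scores is what makes the update reversible.
        "previous_scores": dict(baseline["scores"]),
        "reason": reason.strip(),
        "updated_on": (today or date.today()).isoformat(),
    }
```

Because the old scores travel with the new record, reverting is a matter of swapping them back, and the reason field keeps the decision visible in review.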
