Retrieval-augmented generation (RAG) · Morvion Glossary

Retrieval-augmented generation (RAG) is the AI pattern that grounds an LLM's response in a company's own data by retrieving relevant context at query time. Instead of hoping the model knows the company's policies, products, or history, the system retrieves the right documents first and asks the model to answer from them.

The four pieces of a working RAG pipeline.

Ingestion. Source documents flow into the system, chunked into retrievable units. Chunking strategy matters as much as the model choice.
Indexing. Chunks are embedded and stored in a vector index, often alongside a keyword index for hybrid search.
Retrieval. At query time, the system pulls the most relevant chunks via vector search, keyword match, or both. Reranking is the unglamorous step that makes RAG actually work.
Generation. The retrieved context is passed to the LLM with instructions to answer from the context, cite sources, and refuse if the answer isn't there.
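The four pieces above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not Morvion's implementation: the "embedding" is a plain term-count vector standing in for a real embedding model, the reranker is simple keyword overlap standing in for a trained reranker, and every name here (chunk, embed, retrieve, build_prompt, the sample documents) is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(doc_id, text, size=12, overlap=4):
    """Ingestion: split a document into overlapping word windows."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + size]
        chunks.append({"id": f"{doc_id}#{len(chunks)}", "text": " ".join(window)})
        if start + size >= len(words):
            break
    return chunks


def embed(text):
    """Indexing: toy term-count vector (assumption: a real system calls an embedding model)."""
    return Counter(tokenize(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query terms that appear verbatim in the chunk."""
    q = set(tokenize(query))
    return len(q & set(tokenize(text))) / len(q) if q else 0.0


def retrieve(query, index, candidates=4, k=2):
    """Retrieval: vector search for candidates, then a second-pass rerank."""
    q_vec = embed(query)
    first_pass = sorted(index, key=lambda e: cosine(q_vec, e["vector"]), reverse=True)[:candidates]
    reranked = sorted(first_pass, key=lambda e: keyword_score(query, e["text"]), reverse=True)
    # Drop chunks with no keyword overlap so an unrelated query returns nothing.
    return [e for e in reranked[:k] if keyword_score(query, e["text"]) > 0]


def build_prompt(query, hits):
    """Generation: context plus instructions to cite and to refuse."""
    context = "\n".join(f"[{h['id']}] {h['text']}" for h in hits)
    return (
        "Answer using only the context below. Cite chunk ids in brackets. "
        "If the context does not contain the answer, reply exactly: "
        "I don't know based on the provided documents.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical sample documents.
docs = {
    "refund-policy": "Refunds are issued within 14 days of purchase when the item "
                     "is unused and returned in its original packaging.",
    "shipping": "Standard shipping takes five business days. "
                "Express shipping takes two business days.",
}
index = []
for doc_id, text in docs.items():
    for c in chunk(doc_id, text):
        index.append({**c, "vector": embed(c["text"])})

hits = retrieve("How many days do refunds take?", index)
prompt = build_prompt("How many days do refunds take?", hits)
print(prompt)
```

The prompt string is what would be sent to the LLM. Swapping the toy embed and keyword_score functions for a real embedding model and reranker leaves the overall shape unchanged.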

What RAG is not.

It is not fine-tuning. Fine-tuning changes the model's weights; RAG changes the model's context window. Fine-tuning is heavy, expensive, and rarely the right tool for private knowledge. RAG is light, swappable, and the default pattern for grounded-in-business-data AI.
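A small sketch of what "changes the context window" means in practice: the model identifier stays fixed while the knowledge that feeds the prompt is edited in place. The names here (Request, rag_request, "base-model-v1") are hypothetical, and retrieval is reduced to naive word overlap purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    model: str   # which weights answer the question (never touched below)
    prompt: str  # what those weights get to read


def rag_request(model, question, knowledge):
    """Build a request whose prompt carries whatever the knowledge base holds now."""
    words = set(question.lower().split())
    # Naive retrieval: keep entries that share at least one word with the question.
    context = [text for text in knowledge if words & set(text.lower().split())]
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
    return Request(model=model, prompt=prompt)


knowledge = ["Refund window is 14 days."]
before = rag_request("base-model-v1", "What is the refund window?", knowledge)

# The policy changes: update the document, not the model.
knowledge[0] = "Refund window is 30 days."
after = rag_request("base-model-v1", "What is the refund window?", knowledge)

print(before.model == after.model)  # same weights
print(after.prompt)
```

With fine-tuning, the same policy change would mean another training run. Here it is a one-line edit to the source data.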

“Fine-tuning is a surgery. RAG is a conversation.”

Where RAG quietly fails.

Chunk size too small (the model gets fragments, not context), chunk size too large (retrieval surfaces irrelevant material), no reranking (the top-k by vector similarity isn't actually the most relevant), no eval (nobody notices when the system regresses), and no refusal path (the model fills the gap with a hallucination). Morvion's RAG engagements ship with evals on retrieval recall, generation faithfulness, and refusal rate from day one.
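The three metrics named above can be computed from a small set of labeled cases. This is a sketch under stated assumptions, not Morvion's harness: the case format, the refusal string, and the function names are hypothetical, and token_support is only a crude proxy for faithfulness (real harnesses typically use a judge model or an entailment check).

```python
import re

# Assumed refusal string; it must match whatever the prompt tells the model to say.
REFUSAL = "I don't know based on the provided documents."


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def recall_at_k(retrieved_ids, relevant_ids, k):
    """Share of relevant chunk ids found in the top-k retrieved ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return None  # unanswerable case: recall is undefined
    return len(relevant & set(retrieved_ids[:k])) / len(relevant)


def token_support(answer, context):
    """Crude faithfulness proxy: share of answer tokens present in the context."""
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    context_tokens = set(tokens(context))
    return sum(t in context_tokens for t in answer_tokens) / len(answer_tokens)


def mean(values):
    return sum(values) / len(values) if values else None


def evaluate(cases, k=3):
    recalls, supports, refusals = [], [], 0
    for case in cases:
        recall = recall_at_k(case["retrieved"], case["relevant"], k)
        if recall is not None:
            recalls.append(recall)
        if case["answer"].strip() == REFUSAL:
            refusals += 1
        else:
            supports.append(token_support(case["answer"], case["context"]))
    return {
        "retrieval_recall": mean(recalls),
        "faithfulness_proxy": mean(supports),
        "refusal_rate": refusals / len(cases) if cases else None,
    }


# Hypothetical labeled cases: a good answer, a hallucination, and a correct refusal.
cases = [
    {"retrieved": ["refund#0", "ship#0"], "relevant": ["refund#0"],
     "context": "Refunds are issued within 14 days of purchase.",
     "answer": "Refunds are issued within 14 days."},
    {"retrieved": ["ship#0", "ship#1"], "relevant": ["warranty#0"],
     "context": "Standard shipping takes five business days.",
     "answer": "The warranty lasts two years."},
    {"retrieved": [], "relevant": [], "context": "", "answer": REFUSAL},
]
report = evaluate(cases)
print(report)
```

Running the same cases on every change to chunking, retrieval, or prompts is what makes a regression visible instead of silent.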

Frequently asked.

What is retrieval-augmented generation (RAG)? Retrieval-augmented generation is the AI pattern that grounds an LLM's response in a company's own data by retrieving relevant context at query time. The model is asked to answer from the retrieved documents, cite sources, and refuse when the answer isn't there.
How is RAG different from fine-tuning? Fine-tuning changes the model's weights. RAG changes the model's context window. Fine-tuning is heavy, expensive, and rarely the right tool for private knowledge that changes often. RAG is light, swappable, and the default pattern for grounded-in-business-data AI.
When does a business need RAG? When AI workflows need to answer from the company's own data (policies, contracts, product specs, historical decisions, customer records) rather than from the model's training. Most production business AI systems use RAG as the primary grounding pattern.
What does Morvion build into a RAG pipeline? Source ingestion with chunking strategy tuned per document type, hybrid vector and keyword indexing, retrieval with reranking, generation with citation and refusal instructions, and an evaluation harness measuring retrieval recall, generation faithfulness, and refusal rate. Production RAG, not demo RAG.
