Retrieval-augmented generation (RAG) is the AI pattern that grounds an LLM's response in a company's own data by retrieving relevant context at query time. Instead of hoping the model knows the company's policies, products, or history, the system retrieves the right documents first and asks the model to answer from them.
The four pieces of a working RAG pipeline.
- Ingestion. Source documents flow into the system, chunked into retrievable units. Chunking strategy matters as much as the model choice.
- Indexing. Chunks are embedded and stored in a vector index, often alongside a keyword index for hybrid search.
- Retrieval. At query time, the system pulls the most relevant chunks via vector search, keyword match, or both. Reranking, which rescores those candidate chunks against the query before they reach the model, is the unglamorous step that makes RAG actually work.
- Generation. The retrieved context is passed to the LLM with instructions to answer from the context, cite sources, and refuse if the answer isn't there.
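The four stages above can be sketched in a few dozen lines of plain Python. This is a minimal illustration, not a production design: the bag-of-words `embed` function stands in for a real embedding model, the keyword-overlap rerank stands in for a real reranker, and every name here (`chunk`, `build_index`, `retrieve`, `build_prompt`, the sample documents) is hypothetical. The final prompt is what would be sent to an LLM; no model is called.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a toy stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text, size=20, overlap=5):
    """Ingestion: split text into overlapping windows of `size` words."""
    words = text.split()
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text):
    """Indexing: bag-of-words counts, a toy stand-in for an embedding model."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs):
    """Return a list of (source, chunk_text, vector) records."""
    return [(src, piece, embed(piece)) for src, text in docs.items() for piece in chunk(text)]


def retrieve(index, query, k=2):
    """Retrieval: similarity search for candidates, then a keyword-overlap rerank."""
    q_vec = embed(query)
    q_terms = set(tokenize(query))
    candidates = sorted(index, key=lambda r: cosine(q_vec, r[2]), reverse=True)[: k * 3]
    reranked = sorted(candidates, key=lambda r: len(q_terms & set(r[2])), reverse=True)
    # Drop chunks that share no terms with the query so the refusal path can trigger.
    return [(src, text) for src, text, vec in reranked[:k] if q_terms & set(vec)]


def build_prompt(query, hits):
    """Generation: assemble the grounded prompt that would be sent to the LLM."""
    if not hits:
        return None  # refusal path: nothing relevant was retrieved
    context = "\n".join(f"[{i + 1}] ({src}) {text}" for i, (src, text) in enumerate(hits))
    return (
        "Answer only from the context below and cite sources as [n]. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical sample documents, for illustration only.
docs = {
    "refund-policy.txt": "Refunds are issued within 30 days of purchase when the item is unused.",
    "shipping.txt": "Standard shipping takes five business days within the country.",
}
index = build_index(docs)
question = "How many days do I have to request refunds?"
hits = retrieve(index, question)
prompt = build_prompt(question, hits)
```

In this sketch a query with no overlapping terms, such as "Which warranty covers batteries?", retrieves nothing, and `build_prompt` returns `None` so the caller can refuse instead of asking the model to guess.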
What RAG is not.
It is not fine-tuning. Fine-tuning changes the model's weights; RAG changes the model's context window. Fine-tuning is heavy, expensive, and rarely the right tool for private knowledge. RAG is light, swappable, and the default pattern for grounded-in-business-data AI.
“Fine-tuning is a surgery. RAG is a conversation.”
Where RAG quietly fails.
RAG tends to fail in five quiet ways: chunks that are too small (the model gets fragments, not context), chunks that are too large (retrieval surfaces irrelevant material), no reranking (the top-k by vector similarity isn't actually the most relevant), no evals (nobody notices when the system regresses), and no refusal path (the model fills the gap with hallucination). Morvion's RAG engagements ship with evals on retrieval recall, generation faithfulness, and refusal rate from day one.
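Two of the evals named above, retrieval recall and refusal rate, are simple enough to sketch directly. The snippet below is an illustrative assumption about how such metrics might be computed, not a description of Morvion's harness: the function names, the chunk ids, the sample answers, and the "I don't know" refusal marker are all hypothetical. Generation faithfulness is left out because it usually needs a human or model-based judge rather than a string check.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant chunk ids that appear in the top-k retrieved ids."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


def refusal_rate(answers, refusal_marker="I don't know"):
    """Share of answers that contain the refusal marker (case-insensitive)."""
    if not answers:
        return 0.0
    marker = refusal_marker.lower()
    return sum(marker in a.lower() for a in answers) / len(answers)


# Hypothetical labeled eval set: what was retrieved vs. what should have been.
cases = [
    {"retrieved": ["c1", "c7", "c3"], "relevant": ["c1", "c3"]},
    {"retrieved": ["c9", "c2", "c4"], "relevant": ["c5"]},
]
mean_recall = sum(recall_at_k(c["retrieved"], c["relevant"], k=3) for c in cases) / len(cases)

# Hypothetical model answers collected from an eval run.
answers = [
    "Refunds are issued within 30 days [1].",
    "I don't know based on the provided context.",
]
rate = refusal_rate(answers)
```

Tracking numbers like these on a fixed question set after every change to chunking, indexing, or prompts is one way to make a regression visible instead of silent.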
Frequently asked.
- What is retrieval-augmented generation (RAG)?
- Retrieval-augmented generation is the AI pattern that grounds an LLM's response in a company's own data by retrieving relevant context at query time. The model is asked to answer from the retrieved documents, cite sources, and refuse when the answer isn't there.
- How is RAG different from fine-tuning?
- Fine-tuning changes the model's weights. RAG changes the model's context window. Fine-tuning is heavy, expensive, and rarely the right tool for private knowledge that changes often. RAG is light, swappable, and the default pattern for grounded-in-business-data AI.
- When does a business need RAG?
- When AI workflows need to answer from the company's own data (policies, contracts, product specs, historical decisions, customer records) rather than from the model's training data. Most production business AI systems use RAG as the primary grounding pattern.
- What does Morvion build into a RAG pipeline?
- Source ingestion with chunking strategy tuned per document type, hybrid vector and keyword indexing, retrieval with reranking, generation with citation and refusal instructions, and an evaluation harness measuring retrieval recall, generation faithfulness, and refusal rate. Production RAG, not demo RAG.