Fine-tuning continues the training of a general-purpose model on a curated, task-specific dataset, so the model's weights shift toward the language, structure, and judgments of one domain. It is the right move when prompting alone cannot get the model to where the workflow needs it. It is the wrong first move on almost every project.

When fine-tuning actually helps.

  • Strict output format. The base model emits JSON-ish output but breaks the schema five percent of the time. Fine-tuning on a few hundred examples can bring schema adherence above ninety-nine percent.
  • Brand voice. The model's default tone reads as generic LLM output and does not match the operator's voice. A few hundred labeled voice examples can shift it.
  • Domain vocabulary. The task depends on heavy jargon, regulated phrasing, or proprietary nomenclature that is rare or absent in the base model's training distribution.
  • Latency or cost. A small fine-tuned model can match a much larger base model on the narrow task, at a fraction of the per-call cost.
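
To make the schema-adherence case concrete, here is a minimal sketch of how a team might measure how often raw model outputs follow a schema. The two-field schema (`intent`, `confidence`) and the sample outputs are hypothetical, chosen only for illustration; a real workflow would substitute its own schema and logged outputs.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
REQUIRED_FIELDS = {"intent": str, "confidence": float}


def follows_schema(raw: str) -> bool:
    """Return True if raw parses as a JSON object with every required, correctly typed field."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    return all(
        key in obj and isinstance(obj[key], expected)
        for key, expected in REQUIRED_FIELDS.items()
    )


def adherence_rate(outputs: list[str]) -> float:
    """Fraction of outputs that follow the schema."""
    if not outputs:
        raise ValueError("need at least one output")
    return sum(follows_schema(o) for o in outputs) / len(outputs)


# Made-up sample outputs, not real model logs.
samples = [
    '{"intent": "refund", "confidence": 0.92}',
    '{"intent": "refund", "confidence": "high"}',  # wrong type for confidence
    '{"intent": "cancel"',                          # truncated JSON
    '{"intent": "cancel", "confidence": 0.81}',
]
print(adherence_rate(samples))  # 0.5
```

Running the same measurement on the base model and on the tuned model gives the before-and-after adherence numbers for the format case.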

When it does not help.

Fine-tuning does not teach the model new facts; it teaches patterns. If the missing piece is retrieval (the model does not know your data), the answer is RAG, not fine-tuning. If the missing piece is reasoning capability, the answer is a larger model or chain-of-thought prompting, not fine-tuning. Teams that reach for fine-tuning first usually find they have paid for a slower, more brittle route to a model that still has the same gap.
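
The triage above can be written down as a small lookup. This is only a sketch of the rule as stated in this article; the `Gap` names and the `recommend` function are hypothetical labels, and a real decision will involve more judgment than a table.

```python
from enum import Enum, auto


class Gap(Enum):
    """Hypothetical labels for what the workflow is missing."""
    MISSING_DATA = auto()
    REASONING = auto()
    OUTPUT_FORMAT = auto()
    BRAND_VOICE = auto()
    DOMAIN_VOCABULARY = auto()
    COST_OR_LATENCY = auto()


# Mapping from diagnosed gap to the remedy described in the text.
REMEDY = {
    Gap.MISSING_DATA: "RAG",
    Gap.REASONING: "larger model or chain-of-thought prompting",
    Gap.OUTPUT_FORMAT: "fine-tune",
    Gap.BRAND_VOICE: "fine-tune",
    Gap.DOMAIN_VOCABULARY: "fine-tune",
    Gap.COST_OR_LATENCY: "fine-tune",
}


def recommend(gap: Gap, prompting_tried: bool) -> str:
    """Suggest the next step; prompting always comes before anything else."""
    if not prompting_tried:
        return "prompt first"
    return REMEDY[gap]


print(recommend(Gap.MISSING_DATA, prompting_tried=True))    # RAG
print(recommend(Gap.OUTPUT_FORMAT, prompting_tried=False))  # prompt first
```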

LoRA and parameter-efficient methods.

Modern fine-tuning rarely updates every weight. Low-Rank Adaptation (LoRA) and related methods update a small set of adapter weights, so a single base model can host dozens of fine-tuned variants without the storage and serving overhead of full copies. For most production workflows in 2026, LoRA is the default.
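
A quick calculation shows why the adapters are small. LoRA leaves a weight matrix of shape d_out × d_in frozen and learns its update as the product of two low-rank matrices, B (d_out × r) and A (r × d_in). The sketch below counts parameters for one such matrix; the 4096 × 4096 size and rank 8 are assumed values for illustration, not figures from any particular model.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Parameters touched when the whole weight matrix is updated."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two adapter matrices: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Assumed example dimensions; real layer sizes and ranks vary by model and config.
d_out, d_in, rank = 4096, 4096, 8

full = full_update_params(d_out, d_in)
adapter = lora_update_params(d_out, d_in, rank)

print(full)                      # 16777216
print(adapter)                   # 65536
print(f"{adapter / full:.2%}")   # 0.39%
```

Under these assumptions the adapter for this one matrix is well under one percent of the full matrix, which is why many adapters can sit alongside a single copy of the base weights.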

Fine-tuning requires evals first.

A fine-tune without a fixture set (a fixed, held-out set of test cases with known-good outputs) is a guess. The pre-tuning baseline, the post-tuning score, and the regression check against the baseline are how the team learns whether the tune actually moved the workflow forward.
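
A minimal sketch of that comparison, assuming each fixture has already been scored pass or fail for both the base model and the tuned model. The `FixtureResult` type, the fixture IDs, and the results are hypothetical; how a fixture is graded is left to the team's own eval harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixtureResult:
    """Pass/fail outcome of one fixture under both models (hypothetical shape)."""
    fixture_id: str
    baseline_pass: bool
    tuned_pass: bool


def compare(results: list[FixtureResult]) -> dict:
    """Return baseline score, tuned score, and fixtures that regressed after tuning."""
    if not results:
        raise ValueError("need at least one fixture result")
    n = len(results)
    return {
        "baseline_score": sum(r.baseline_pass for r in results) / n,
        "tuned_score": sum(r.tuned_pass for r in results) / n,
        # A regression is a fixture the base model passed and the tune now fails.
        "regressions": [
            r.fixture_id for r in results if r.baseline_pass and not r.tuned_pass
        ],
    }


# Made-up results for illustration only.
results = [
    FixtureResult("a", baseline_pass=True, tuned_pass=True),
    FixtureResult("b", baseline_pass=False, tuned_pass=True),
    FixtureResult("c", baseline_pass=True, tuned_pass=False),
    FixtureResult("d", baseline_pass=False, tuned_pass=True),
    FixtureResult("e", baseline_pass=True, tuned_pass=True),
]
report = compare(results)
print(report)
# {'baseline_score': 0.6, 'tuned_score': 0.8, 'regressions': ['c']}
```

The aggregate score went up in this made-up run, but fixture "c" regressed. The regression list surfaces that; the headline score alone would hide it.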

Frequently asked.

What is fine-tuning?
Fine-tuning is the practice of continuing the training of a pre-trained language model on a smaller, task-specific dataset to specialize it for one domain or output format. The result is a model that still understands general language but produces outputs closer to the target distribution.
When should we fine-tune versus prompt or do RAG?
Prompt first. RAG second when the gap is missing data. Fine-tune only when the gap is output shape, brand voice, dense domain vocabulary, or cost and latency targets a smaller model can hit if specialized. Fine-tuning does not teach new facts; it teaches patterns.
What is LoRA fine-tuning?
Low-Rank Adaptation is a parameter-efficient fine-tuning method that trains a small set of adapter weights on top of a frozen base model instead of updating all of the model's parameters. The result is a tune that can take hours instead of days and cost far less to train, and a single base model can serve dozens of such variants.
How many examples do we need to fine-tune?
A few hundred to a few thousand high-quality, labeled examples are enough for most narrow tasks. Quality beats quantity. The fixture set used for evaluation should be held out from the training set entirely.
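
One way to keep the fixture set out of training is to assign each example to a split by hashing a stable ID, so the assignment never changes as the dataset grows. The sketch below assumes each example is a dict with a unique `id` field and uses a 10 percent held-out fraction; both are illustrative choices, not requirements.

```python
import hashlib


def is_held_out(example_id: str, eval_fraction: float = 0.1) -> bool:
    """Deterministically map an ID to [0, 1) and hold out the lowest eval_fraction."""
    digest = hashlib.sha256(example_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < eval_fraction


def split(examples: list[dict], eval_fraction: float = 0.1) -> tuple[list[dict], list[dict]]:
    """Split examples into (train, held_out) using each example's 'id' field."""
    train, held_out = [], []
    for example in examples:
        target = held_out if is_held_out(example["id"], eval_fraction) else train
        target.append(example)
    return train, held_out


# Hypothetical examples; real ones would carry prompts and labeled outputs.
examples = [{"id": f"ex-{i}", "text": f"example {i}"} for i in range(1000)]
train, held_out = split(examples)

train_ids = {e["id"] for e in train}
held_out_ids = {e["id"] for e in held_out}
print(len(train) + len(held_out) == len(examples))  # True
print(train_ids.isdisjoint(held_out_ids))           # True
```

Because the split depends only on the ID, re-running it after adding new examples leaves every existing example in the same split.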