AI Techniques

Fine-tuning

Also known as: Model Fine-tuning, Finetuning

Process of adapting a pre-trained model to a specific task or domain through training on targeted data.

Updated: 2026-01-03

Definition

Fine-tuning is the process of adapting a pre-trained model (typically a foundation model) to a specific task, domain, or style through additional training on a targeted dataset.

The base model has already learned general representations during pre-training. Fine-tuning specializes these representations, enabling superior performance on specific tasks with less data and compute compared to training from scratch.

Types of Fine-tuning

Full fine-tuning: updates all model parameters. Maximum flexibility but requires more compute and risks overfitting/catastrophic forgetting.

Parameter-Efficient Fine-Tuning (PEFT): updates only a subset of parameters.

  • LoRA (Low-Rank Adaptation): adds trainable low-rank matrices alongside existing layers. Reduces trainable parameters by 99%+ (see the code sketch at the end of this section).
  • QLoRA: LoRA applied to a 4-bit quantized base model. Enables fine-tuning of models in the 65-70B range on a single 48 GB GPU.
  • Prefix tuning: adds virtual tokens at the beginning of the sequence.
  • Adapter layers: inserts small modules between existing layers.

Instruction tuning: fine-tuning on datasets of (instruction, response) pairs to improve instruction-following.
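
To make the PEFT idea concrete, here is a minimal sketch using the Hugging Face transformers and peft libraries. The base checkpoint (gpt2), rank, scaling factor, and target module name are illustrative assumptions, not recommendations.

```python
# Minimal LoRA sketch with Hugging Face transformers + peft.
# Model name and hyperparameters are illustrative, not prescriptive.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model_name = "gpt2"  # assumption: any causal LM checkpoint works similarly
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# LoRA adds trainable low-rank matrices to the chosen projection layers;
# the original weights stay frozen.
lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update
    lora_alpha=16,              # scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection in GPT-2; varies per architecture
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```

The wrapped model can then be trained with any standard training loop or the transformers Trainer; only the small adapter weights are updated and saved.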

When to Fine-tune

Recommended for:

  • Repetitive tasks with specific output format
  • Domains with specialized terminology (legal, medical)
  • Need for consistent style/tone
  • Classification with many domain-specific classes
  • When prompting fails to achieve desired quality

Alternatives to consider:

  • Prompt engineering: often sufficient, zero training costs
  • RAG: for knowledge retrieval without modifying the model
  • Few-shot prompting: examples in the prompt instead of training
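
For contrast, a few-shot prompt keeps the examples in the request itself rather than in the model weights. The task and examples below are invented for illustration.

```python
# Few-shot prompting: task examples live in the prompt, not in model weights.
# The classification task and reviews are invented for illustration.
few_shot_prompt = """Classify the sentiment of each review as positive or negative.

Review: "Arrived quickly and works great."
Sentiment: positive

Review: "Broke after two days, very disappointed."
Sentiment: negative

Review: "Exceeded my expectations in every way."
Sentiment:"""

# This string would be sent to any chat/completions API instead of fine-tuning a model.
print(few_shot_prompt)
```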

Practical Considerations

Dataset: quality beats quantity. 500-1000 high-quality examples often outperform 10K noisy examples. Typical format: (input, expected output) pairs.
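
As an illustration of that format, the sketch below writes a couple of records in the chat-style JSONL layout used by several hosted fine-tuning APIs; the field names follow the OpenAI convention and the example content is invented.

```python
# Illustrative (input, expected output) training records in chat-style JSONL.
# Field names follow the OpenAI fine-tuning convention; the content is invented.
import json

examples = [
    {"messages": [
        {"role": "system", "content": "You are a contract-review assistant."},
        {"role": "user", "content": "Summarize the termination clause in one sentence."},
        {"role": "assistant", "content": "Either party may terminate with 30 days' written notice."},
    ]},
    {"messages": [
        {"role": "system", "content": "You are a contract-review assistant."},
        {"role": "user", "content": "Is the liability cap mutual?"},
        {"role": "assistant", "content": "Yes, the cap applies equally to both parties."},
    ]},
]

with open("train.jsonl", "w") as f:
    for record in examples:
        f.write(json.dumps(record) + "\n")
```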

Costs: vary enormously. Fine-tuning GPT-4o via API costs ~$25/million training tokens. Self-hosting with LoRA on open models requires GPU (A100: ~$2/hour).
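
A rough back-of-the-envelope estimate makes the API cost concrete; every number below is an illustrative assumption, not a quote.

```python
# Back-of-the-envelope API fine-tuning cost; all inputs are illustrative assumptions.
n_examples = 1_000          # training examples
tokens_per_example = 500    # average tokens per (input, output) pair
epochs = 3                  # passes over the data
price_per_million = 25.0    # USD per million training tokens (order of magnitude)

total_tokens = n_examples * tokens_per_example * epochs    # 1,500,000 tokens
cost_usd = total_tokens / 1_000_000 * price_per_million    # ~$37.50
print(f"{total_tokens:,} training tokens -> ~${cost_usd:.2f}")
```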

Evaluation: define specific metrics before fine-tuning. Compare against baseline (base model + prompting) to verify added value.
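
One minimal way to run that comparison is an exact-match score on a held-out set; the metric choice and the placeholder predictions below are illustrative only.

```python
# Compare a fine-tuned model against the prompted baseline on a held-out set.
# Predictions would come from the respective models; here they are placeholders.
def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions matching the reference (case/whitespace-insensitive)."""
    assert len(predictions) == len(references)
    hits = sum(p.strip().lower() == r.strip().lower()
               for p, r in zip(predictions, references))
    return hits / len(references)

references       = ["refund approved", "escalate to agent"]   # held-out labels
baseline_preds   = ["refund approved", "refund approved"]      # base model + prompting
fine_tuned_preds = ["refund approved", "escalate to agent"]    # fine-tuned model

print("baseline  :", exact_match(baseline_preds, references))    # 0.5
print("fine-tuned:", exact_match(fine_tuned_preds, references))  # 1.0
```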

Risks: catastrophic forgetting (loses general capabilities), overfitting (memorizes instead of generalizing), data leakage in test set.

Common Misconceptions

"Fine-tuning is always better than prompting"

No. For many tasks, few-shot prompting on frontier models performs comparably or better, without training costs and with greater flexibility.

"You need a huge dataset"

With PEFT and modern models, a few hundred high-quality examples can suffice. What matters is the quality and diversity of the examples, not their quantity.

"Fine-tuning = the model learns new information"

Fine-tuning shapes behavior and style, but it is an inefficient way to inject new factual knowledge. RAG is more appropriate for that.

Related Terms

  • LLM: models typically subject to fine-tuning
  • RLHF: alignment technique applied after fine-tuning
  • Foundation Model: starting point for fine-tuning
  • RAG: alternative/complement to fine-tuning

Sources