Phase 5: Alignment · ~7 min · Intermediate

👩‍🏫 Supervised Fine-Tuning

Teaching the Model to Follow Instructions

Instruction tuning, chat formatting, and creating the foundation for helpful AI assistants.

Instruction Tuning · Chat Templates · LoRA/QLoRA · Data Quality

Teaching Instructions

After pre-training, the model can complete text but doesn't know how to be a helpful assistant. Supervised Fine-Tuning (SFT) teaches it to follow instructions by training on high-quality (instruction, response) pairs; a code sketch of the training objective appears below.

From Reader to Teacher

Pre-training is like reading every book in the library. SFT is like learning how to be a tutor — understanding questions, giving clear explanations, and being helpful rather than just reciting facts.
  • SFT examples: 10K-100K
  • Training: ~1-5 epochs
  • Duration: hours to days
  • Key insight: quality > quantity
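
To make the (instruction, response) training concrete, here is a minimal sketch of the usual SFT objective in PyTorch with Hugging Face Transformers: the prompt tokens are masked with -100 so the cross-entropy loss is computed only on the response. The model name and special tokens are illustrative, not prescribed.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # illustrative small base model
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "<|user|>\nWhat is the capital of France?\n<|assistant|>\n"
response = "The capital of France is Paris."

prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
response_ids = tokenizer(response, return_tensors="pt").input_ids
input_ids = torch.cat([prompt_ids, response_ids], dim=1)

# Copy the inputs as labels, then mask the prompt so only response tokens are scored
labels = input_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100

outputs = model(input_ids=input_ids, labels=labels)
outputs.loss.backward()   # cross-entropy over the response tokens only

In practice a trainer batches many such pairs, but the masking idea is the same: the model is rewarded for producing the response, not for predicting the prompt.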

Chat Format

<|system|>
You are a helpful assistant.
<|user|>
What is the capital of France?
<|assistant|>
The capital of France is Paris.
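
The template above doesn't have to be written by hand. As a minimal sketch, assuming a Hugging Face tokenizer that ships a chat template (the Zephyr model name here is just one example that happens to use these <|system|>/<|user|>/<|assistant|> markers):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

# Render the conversation into the model's chat format and append the
# assistant header so the model knows it should respond next
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)

Using the tokenizer's own template avoids hand-writing special tokens and keeps the training and inference formats consistent.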

LoRA: Efficient Fine-Tuning

LoRA (Low-Rank Adaptation) freezes the base model and trains only small adapter matrices instead of updating all model weights. This shrinks the number of trainable parameters, and with them the gradient and optimizer memory, by orders of magnitude, and combined with a quantized base model (QLoRA) it makes fine-tuning feasible on consumer hardware (see the sketch after the comparison below).

Full Fine-Tuning

  • Update all 70B parameters
  • Requires 8×H100 GPUs
  • ~500GB memory
  • Expensive, slow

LoRA

  • Train ~0.1% of parameters
  • Single GPU possible
  • ~16GB memory
  • Fast, affordable
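
A minimal LoRA sketch using the peft library. The base model, rank, scaling factor, and target modules below are common but arbitrary choices made for illustration:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")   # illustrative base

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank adapter matrices
    lora_alpha=32,                        # scaling factor applied to the adapters
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # typically well under 1% of all weights

For QLoRA, the base model is additionally loaded in 4-bit (e.g., via BitsAndBytesConfig in Transformers) before the adapters are attached, which is what brings memory down to a single-GPU budget.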
Key Takeaways
  • SFT teaches models to follow instructions
  • Quality of training data matters more than quantity
  • LoRA enables efficient fine-tuning with minimal resources
  • Chat templates standardize conversation format