Overfitting & Regularization — Lesson Content
See how dropout and other regularization techniques help models generalize to new data.
Overfitting is one of the most common problems in machine learning: the model memorizes its training data instead of learning patterns that hold on new data. This lesson teaches you to spot overfitting in loss curves, understand the bias-variance tradeoff, and apply regularization techniques such as L2, dropout, and early stopping.
Learning Objectives
- Recognize overfitting from loss curves
- Understand the bias-variance tradeoff
- Apply L2 regularization, dropout, and early stopping
Step 1: What is Overfitting?
Imagine a student who **memorizes every answer** in the textbook but can't solve new problems on the exam.
That's overfitting! The model learns the training data **too well** — including its noise and quirks — and fails on new data it hasn't seen.
**The goal isn't to memorize. It's to generalize.**
The opposite problem is **underfitting** — like a student who barely studied and can't even answer the textbook questions.
The sweet spot? A model that learns the **real patterns** without memorizing the noise.
Step 2: Spot the Problem: Loss Curves
The clearest sign of overfitting is the **gap between training and validation loss**.
During training, we track two numbers:
- **Training loss** (blue) — how well the model fits the data it's learning from
- **Validation loss** (orange) — how well it performs on data it's **never seen**
**Overfit model:** Training loss keeps dropping, but validation loss **starts going back up** around epoch 15. That's the moment the model switches from learning patterns to memorizing noise.
**Good fit model:** Both curves decrease together and level off close to each other. A small, stable gap between them is normal and healthy.
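The gap can be checked mechanically. The minimal sketch below summarizes two recorded loss histories. It assumes you already log one training loss and one validation loss per epoch. The function name `diagnose_loss_curves` is hypothetical, the loss values are made up for illustration, and epochs are counted from 0.

```python
def diagnose_loss_curves(train_losses, val_losses):
    """Summarize per-epoch loss histories (epochs counted from 0)."""
    if not val_losses or len(train_losses) != len(val_losses):
        raise ValueError("need two non-empty loss histories of equal length")
    # Epoch where validation loss was lowest.
    best_epoch = min(range(len(val_losses)), key=lambda e: val_losses[e])
    # How far validation loss has climbed back up since its minimum.
    rise_since_best = val_losses[-1] - val_losses[best_epoch]
    return {
        "best_epoch": best_epoch,
        # Validation minus training loss at the best epoch.
        "gap_at_best": val_losses[best_epoch] - train_losses[best_epoch],
        "rise_since_best": rise_since_best,
        # Overfitting signature: validation loss rose after its minimum
        # while training loss kept falling.
        "looks_overfit": (
            rise_since_best > 0 and train_losses[-1] < train_losses[best_epoch]
        ),
    }


# Made-up loss histories, for illustration only.
overfit = diagnose_loss_curves(
    train_losses=[1.00, 0.70, 0.50, 0.35, 0.25, 0.18, 0.12],
    val_losses=[1.05, 0.78, 0.60, 0.52, 0.55, 0.61, 0.70],
)
good_fit = diagnose_loss_curves(
    train_losses=[1.00, 0.70, 0.50, 0.40, 0.35],
    val_losses=[1.05, 0.76, 0.57, 0.47, 0.42],
)
print(overfit["best_epoch"], overfit["looks_overfit"])    # 3 True
print(good_fit["best_epoch"], good_fit["looks_overfit"])  # 4 False
```

Real curves are noisy, so a single uptick in validation loss is weak evidence on its own.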
Step 3: The Bias-Variance Tradeoff
Every model makes two types of errors:
**Bias** = error from oversimplified assumptions
- High bias → underfitting (straight line for curved data)
- The model is **too rigid**
**Variance** = error from being too sensitive to training data
- High variance → overfitting (wiggly line that hits every point)
- The model is **too flexible**
**The tradeoff:** Reducing one often increases the other.
The goal is to find the **sweet spot** — enough complexity to capture patterns, but not so much that it memorizes noise.
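Bias and variance can also be measured with a small simulation:

1. Draw many training sets from the same noisy source.
2. Fit a model to each one.
3. Record each model's prediction at a single input.

The spread of those predictions is the variance. The distance between their average and the true value is the bias.

This is a sketch built on illustrative assumptions that are not part of the lesson: the true function (x²), the noise level, the test point, and the two toy models. A constant model stands in for a model that is too rigid, and a 1-nearest-neighbor model stands in for one that is too flexible.

```python
import random
import statistics


def true_function(x):
    return x * x


def make_training_set(rng, n=20, noise=0.3):
    # Noisy samples of the true function at random inputs in [0, 1).
    xs = [rng.random() for _ in range(n)]
    ys = [true_function(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def constant_model(xs, ys, x0):
    # Too rigid: ignores x and always predicts the mean target (high bias).
    return statistics.fmean(ys)


def nearest_neighbor_model(xs, ys, x0):
    # Too flexible: copies the target of the closest training point (high variance).
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]


def bias_variance_at(model, x0=0.9, trials=2000, seed=0):
    """Approximate bias^2 and variance of model's prediction at x0."""
    rng = random.Random(seed)
    preds = [model(*make_training_set(rng), x0) for _ in range(trials)]
    bias = statistics.fmean(preds) - true_function(x0)
    return bias ** 2, statistics.pvariance(preds)


for name, model in [("constant", constant_model),
                    ("1-nearest-neighbor", nearest_neighbor_model)]:
    b2, var = bias_variance_at(model)
    print(f"{name}: bias^2 = {b2:.4f}, variance = {var:.4f}")
```

With these settings, the constant model should show a much larger bias² and the nearest-neighbor model a much larger variance. The exact numbers depend on the seed.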
Step 4: Regularization: Preventing Overfitting
**Regularization** is a set of techniques that constrain the model to prevent overfitting. Think of it as adding "rules" that keep the model honest.
**A classic approach:** add a **penalty** to the loss function that discourages complexity. (Dropout and early stopping, covered later, constrain the model in other ways.)
Instead of just minimizing prediction error, the model also tries to keep its weights **small and simple**.
**Loss = Prediction Error + Complexity Penalty**
Smaller weights → simpler model → usually better generalization.
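In code, the formula is just a sum. This minimal sketch makes two assumptions: mean squared error is the prediction error, and the sum of squared weights (L2, covered next) is the complexity penalty. The function names and all numbers are hypothetical.

```python
def mean_squared_error(predictions, targets):
    """Prediction error: average squared difference."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def l2_penalty(weights):
    """Complexity penalty: sum of squared weights."""
    return sum(w * w for w in weights)


def regularized_loss(predictions, targets, weights, lam):
    """Loss = Prediction Error + lam * Complexity Penalty."""
    return mean_squared_error(predictions, targets) + lam * l2_penalty(weights)


# Hypothetical example: two models make identical predictions,
# but one relies on much larger weights.
predictions = [1.0, 2.0]
targets = [1.0, 2.5]
large_weights = [4.2, -3.1]
small_weights = [0.9, 0.7]

large_loss = regularized_loss(predictions, targets, large_weights, lam=0.01)
small_loss = regularized_loss(predictions, targets, small_weights, lam=0.01)
print(large_loss)  # about 0.3975
print(small_loss)  # about 0.138
```

Both models fit the data equally well, so only the penalty separates them, and it favors the model with smaller weights.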
Step 5: L2 Regularization: Weight Decay
**L2 regularization** adds the **sum of squared weights** to the loss.
**Why does this help?** Large weights mean the model is making extreme decisions based on small input differences — a recipe for overfitting.
By penalizing large weights, L2 forces the model to use **moderate, distributed weights** instead of relying heavily on a few features.
Compare the weights below — without regularization, some are huge (4.2). With L2, they're all moderate. Same pattern, less memorization.
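L2 regularization is often called **weight decay**: with plain gradient descent, the penalty shrinks every weight toward zero by a constant factor on each update, so the weights literally "decay". (With adaptive optimizers such as Adam, the two are not equivalent. This is why AdamW applies weight decay separately.)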
L2 Loss = Original Loss + λ × Σ(wᵢ²)
Where λ controls regularization strength:
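- λ = 0 → no regularization (may overfit)
- λ = 0.001 → light regularization (a typical starting point)
- λ = 0.1 → strong regularization (may underfit)

Good values depend on the model, the data, and how the loss is scaled, so λ is usually tuned on the validation set.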
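Where does the name weight decay come from? For plain gradient descent, the derivative of λ × Σ(wᵢ²) with respect to wᵢ is 2λwᵢ. One update with learning rate η therefore becomes:

wᵢ ← (1 − 2ηλ)wᵢ − η × (gradient of the original loss)

This sketch checks that equivalence numerically. The function names, learning rate, weights, and gradients are hypothetical, and the sketch does not model adaptive optimizers such as Adam.

```python
def sgd_step_with_l2(weights, data_grads, lr, lam):
    """One gradient step on: original loss + lam * sum(w ** 2).

    The penalty contributes 2 * lam * w to each weight's gradient.
    """
    return [w - lr * (g + 2 * lam * w) for w, g in zip(weights, data_grads)]


def sgd_step_with_weight_decay(weights, data_grads, lr, decay):
    """Shrink each weight by (1 - lr * decay), then take the data-gradient step."""
    return [(1 - lr * decay) * w - lr * g for w, g in zip(weights, data_grads)]


weights = [4.2, -1.5, 0.3]      # hypothetical weights
data_grads = [0.5, -0.2, 0.1]   # hypothetical gradients of the original loss
lr, lam = 0.1, 0.01

l2_step = sgd_step_with_l2(weights, data_grads, lr, lam)
decay_step = sgd_step_with_weight_decay(weights, data_grads, lr, decay=2 * lam)
print(all(abs(a - b) < 1e-12 for a, b in zip(l2_step, decay_step)))  # True

# With no data gradient at all, the L2 term alone shrinks weights geometrically:
# after 100 steps each weight has been multiplied by (1 - 2 * lr * lam) ** 100.
w = weights
for _ in range(100):
    w = sgd_step_with_l2(w, [0.0] * len(w), lr, lam)
print([round(x, 3) for x in w])
```

Some libraries define the penalty as (λ/2) × Σ(wᵢ²), which makes the decay factor (1 − ηλ). Check which convention a framework uses before copying λ values between frameworks.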
Step 6: Dropout: Random Neuron Deactivation
**Dropout** is beautifully simple: during each training step, **randomly turn off** some neurons (typically 20-50%).
**Why does this work?** It prevents **co-adaptation** — when neurons learn to rely on specific other neurons instead of learning robust features independently.
Think of it like a team project where members randomly call in sick:
- Everyone must learn to do their part **independently**
- No one can rely on one star player
- The team becomes more **resilient**
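At test time, all neurons are active, so their outputs must be rescaled to match what the next layer saw during training. The original formulation multiplies outputs by the keep probability (1 − p, where p is the drop rate) at test time. Most frameworks instead use **inverted dropout**: they scale the surviving activations up by 1/(1 − p) during training, so nothing changes at test time. The result is a model that learned **redundant, robust** representations.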
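Here is a minimal sketch of the mechanism, applied to a plain list of activations. It uses the inverted-dropout variant, in which survivors are scaled by 1/(1 − p) during training and the input passes through unchanged at test time. The function and its arguments are hypothetical, not a library API, and `p` is the probability of dropping a unit.

```python
import random


def dropout(activations, p, training, rng=random):
    """Inverted dropout on a list of activations.

    During training, each unit is zeroed with probability p and survivors are
    scaled by 1 / (1 - p), so each activation's expected value is unchanged.
    At test time the activations are returned as-is.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training:
        return list(activations)
    keep_scale = 1.0 / (1.0 - p)
    return [a * keep_scale if rng.random() >= p else 0.0 for a in activations]


rng = random.Random(0)
layer = [1.0] * 10
print(dropout(layer, p=0.5, training=True, rng=rng))  # each entry is 0.0 or 2.0
print(dropout(layer, p=0.5, training=False))          # unchanged
```

Because a fresh random mask is drawn on every call, each training step effectively trains a different "thinned" network.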
Step 7: Early Stopping: Know When to Quit
One of the simplest regularization techniques: **stop training once validation loss stops improving**.
Remember our loss curves from earlier? Validation loss improves for a while, then starts getting worse as the model overfits. Early stopping simply says: **"Stop at the best point."**
**How it works in practice:**
1. Track validation loss every epoch
2. Save the model whenever validation loss improves
3. If it hasn't improved for N epochs (**patience**), stop
4. Use the saved best model
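The four steps above map directly onto a training loop. This sketch assumes two hypothetical callbacks:

- `train_one_epoch(model)` updates the model in place.
- `validation_loss(model)` returns a float.

A real setup would typically save checkpoints to disk instead of deep-copying the model in memory.

```python
import copy


def train_with_early_stopping(model, train_one_epoch, validation_loss,
                              max_epochs, patience):
    """Return (best_model, best_epoch, last_epoch_run); epochs count from 0."""
    best_loss = float("inf")
    best_model, best_epoch = copy.deepcopy(model), -1
    epochs_without_improvement = 0
    last_epoch = -1
    for epoch in range(max_epochs):
        last_epoch = epoch
        train_one_epoch(model)
        val_loss = validation_loss(model)          # 1. track validation loss
        if val_loss < best_loss:
            best_loss, best_epoch = val_loss, epoch
            best_model = copy.deepcopy(model)      # 2. save the improved model
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                              # 3. out of patience: stop
    return best_model, best_epoch, last_epoch      # 4. caller uses best_model


# Toy stand-ins (hypothetical): the "model" is just an epoch counter, and the
# validation loss follows a scripted curve that bottoms out and then rises.
scripted_val_losses = [1.00, 0.80, 0.70, 0.65, 0.66, 0.67, 0.70, 0.72, 0.75]


def toy_train_one_epoch(model):
    model["epochs_trained"] += 1


def toy_validation_loss(model):
    return scripted_val_losses[model["epochs_trained"] - 1]


best, best_epoch, last_epoch = train_with_early_stopping(
    {"epochs_trained": 0}, toy_train_one_epoch, toy_validation_loss,
    max_epochs=len(scripted_val_losses), patience=3)
print(best_epoch, last_epoch, best)  # 3 6 {'epochs_trained': 4}
```

Training stops three epochs after the minimum, at the sixth epoch (counting from 0), and the returned model is the snapshot from the best epoch, not the last one.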
It needs no change to the model or the loss, costs only a validation pass per epoch, and usually helps.
Step 8: Putting It All Together
**Five things to remember:**
1. **Overfitting = memorizing** instead of learning. Watch for the training/validation gap.
2. **Bias-Variance tradeoff:** Too simple (underfitting) vs too complex (overfitting). Find the sweet spot.
3. **L2 regularization** keeps weights small and moderate; it is one of the most common techniques.
4. **Dropout** randomly disables neurons, forcing the network to learn robust features.
5. **Early stopping** is cheap and simple: stop once validation loss stops improving, and keep the best saved model.
**In practice, combine them!** L2, dropout, and early stopping constrain the model in different ways, so many networks use them together.
Step 9: Test Your Understanding
You've learned to spot overfitting, understand the bias-variance tradeoff, and apply regularization techniques. Let's test your understanding!
Prerequisites
- Neural network training basics
- Understanding of loss functions
Key Concepts
- Bias-Variance Tradeoff
- Dropout
- L1/L2 Regularization
- Early Stopping