Overfitting & Regularization — Lesson Content

See how regularization techniques such as dropout help models generalize to new data.

Overfitting is one of the most common problems in machine learning: your model memorizes the training data instead of learning the real patterns. This lesson teaches you to spot overfitting through loss curves, understand the bias-variance tradeoff, and apply regularization techniques such as L2 regularization, dropout, and early stopping.

Learning Objectives

Recognize overfitting from loss curves
Understand the bias-variance tradeoff
Apply L2 regularization, dropout, and early stopping

Step 1: What is Overfitting?

Imagine a student who **memorizes every answer** in the textbook but can't solve new problems on the exam. That's overfitting! The model learns the training data **too well** — including its noise and quirks — and fails on new data it hasn't seen. **The goal isn't to memorize. It's to generalize.** The opposite problem is **underfitting** — like a student who barely studied and can't even answer the textbook questions. The sweet spot? A model that learns the **real patterns** without memorizing the noise.

Step 2: Spot the Problem: Loss Curves

The clearest sign of overfitting is the **gap between training and validation loss**. During training, we track two numbers:

- **Training loss**: how well the model fits the data it's learning from
- **Validation loss**: how well it performs on held-out data it has **never seen**

**Overfit model:** Training loss keeps dropping, but validation loss **starts going back up** (around epoch 15 in the lesson's example curves). That's the point where the model switches from learning patterns to memorizing noise.

**Good fit model:** Both curves decrease together and level off close to each other. A small gap between them is normal and healthy.
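A minimal sketch of reading loss curves programmatically, assuming you have already recorded one training loss and one validation loss per epoch. The function name `diagnose` and the loss values in the usage lines are hypothetical, made up for illustration. The function finds the epoch with the lowest validation loss and flags the overfitting pattern: training loss still falling while validation loss climbs back up.

```python
def diagnose(train_losses, val_losses):
    """Return (best_epoch, overfitting, final_gap) for per-epoch loss curves.

    best_epoch:  index of the epoch with the lowest validation loss
    overfitting: True if, after best_epoch, training loss kept falling
                 while validation loss went back up
    final_gap:   validation loss minus training loss at the last epoch
    """
    if len(train_losses) != len(val_losses) or not val_losses:
        raise ValueError("need two non-empty curves of equal length")
    best_epoch = min(range(len(val_losses)), key=lambda e: val_losses[e])
    last = len(val_losses) - 1
    overfitting = (
        last > best_epoch
        and train_losses[last] < train_losses[best_epoch]
        and val_losses[last] > val_losses[best_epoch]
    )
    final_gap = val_losses[last] - train_losses[last]
    return best_epoch, overfitting, final_gap


# Hypothetical curves, made up for illustration.
overfit_train = [1.00, 0.70, 0.50, 0.40, 0.30, 0.20]
overfit_val = [1.10, 0.80, 0.65, 0.60, 0.64, 0.70]
good_train = [1.00, 0.70, 0.50, 0.45]
good_val = [1.10, 0.80, 0.60, 0.55]

print(diagnose(overfit_train, overfit_val))
print(diagnose(good_train, good_val))
```

The first pair reports its best epoch in the middle of training and flags overfitting; the second pair is still improving at its last epoch, so nothing is flagged.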

Step 3: The Bias-Variance Tradeoff

Apart from irreducible noise in the data itself, a model's error comes from two sources:

**Bias** = error from oversimplified assumptions
- High bias → underfitting (a straight line fit to curved data)
- The model is **too rigid**

**Variance** = error from being too sensitive to the training data
- High variance → overfitting (a wiggly line that hits every point)
- The model is **too flexible**

**The tradeoff:** Reducing one often increases the other. The goal is to find the **sweet spot**: enough complexity to capture the patterns, but not so much that the model memorizes noise.
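To see both failure modes in numbers, here is a small sketch that swaps in k-nearest-neighbors regression, a stand-in technique not covered in this lesson, because its single knob k moves cleanly between the two extremes: k = 1 reproduces every training point exactly (high variance), while k equal to the whole training set predicts the same average everywhere (high bias, a flat line through curved data). The data are synthetic, assumed to be noisy samples of y = sin(2πx), and the helper names are hypothetical.

```python
import math
import random


def make_data(n, rng):
    """Synthetic curved data: y = sin(2*pi*x) plus Gaussian noise, x in [0, 1]."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x, k):
    """Predict y at x as the average target of the k closest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of k-NN predictions on the points (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


rng = random.Random(0)
train_x, train_y = make_data(40, rng)
test_x, test_y = make_data(200, rng)

# k=1: memorizes the training set (high variance).
# k=40: one global average for every input (high bias).
results = {}
for k in (1, 5, 40):
    train_err = mse(train_x, train_y, train_x, train_y, k)
    test_err = mse(train_x, train_y, test_x, test_y, k)
    results[k] = (train_err, test_err)
    print(f"k={k:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

Expect training error to rise with k while test error is lowest at an intermediate k; the exact numbers depend on the random seed.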

Step 4: Regularization: Preventing Overfitting

**Regularization** is a set of techniques that constrain the model to prevent overfitting. Think of it as adding "rules" that keep the model honest. **A common approach:** add a **penalty** to the loss function that discourages complexity. Instead of just minimizing prediction error, the model also tries to keep its weights **small and simple**.

**Loss = Prediction Error + Complexity Penalty**

Smaller weights → simpler model → usually better generalization. (Not every regularizer works through a penalty term: dropout and early stopping, covered below, constrain the model in other ways.)
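The loss formula can be written out directly. This sketch assumes the squared-weight (L2) penalty that Step 5 introduces; the function name, the weight vectors, and the error value are hypothetical, chosen so that two models make equally good predictions but differ in how large their weights are.

```python
def regularized_loss(prediction_error, weights, lam):
    """Loss = Prediction Error + Complexity Penalty.

    The complexity penalty here is lam * sum(w^2) (the L2 penalty);
    lam sets how much complexity costs relative to prediction error.
    """
    complexity_penalty = sum(w * w for w in weights)
    return prediction_error + lam * complexity_penalty


# Two hypothetical models with the same prediction error (0.25).
large_weights = [4.2, -3.9, 0.1]
small_weights = [1.1, -0.9, 0.8]

lam = 0.01
loss_large = regularized_loss(0.25, large_weights, lam)
loss_small = regularized_loss(0.25, small_weights, lam)
print(f"large weights: {loss_large:.4f}")
print(f"small weights: {loss_small:.4f}")
```

With `lam = 0` the penalty vanishes and the two models tie; any positive `lam` tips the balance toward the model with smaller weights.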

Step 5: L2 Regularization: Weight Decay

**L2 regularization** adds the **sum of squared weights** to the loss. **Why does this help?** Large weights mean small differences in the input can cause large swings in the output, a recipe for overfitting. By penalizing large weights, L2 pushes the model toward **moderate, distributed weights** instead of relying heavily on a few features. In the lesson's weight comparison, without regularization some weights are huge (4.2); with L2, they're all moderate. Same pattern, less memorization. L2 is so common it has a special name: **weight decay**. With plain gradient descent, the penalty's gradient shrinks every weight toward zero by a constant fraction on each update.

L2 Loss = Original Loss + λ × Σ(wᵢ²)

Where λ controls regularization strength:
  λ = 0     → no regularization (may overfit)
  λ = 0.001 → light regularization (a common starting value)
  λ = 0.1   → strong regularization (may underfit)
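A minimal sketch of the formula inside gradient descent, assuming a plain linear model trained with full-batch gradient descent on mean squared error. The synthetic data, learning rate, epoch count, λ values, and the helper name `train_linear` are all illustrative assumptions. The comment in the update step shows where the name "weight decay" comes from.

```python
import random


def train_linear(xs, ys, lam, lr=0.05, epochs=500):
    """Full-batch gradient descent on MSE + lam * sum(w_i^2) for y ~ w . x."""
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to each weight.
        grad = [0.0] * d
        for x, y in zip(xs, ys):
            err = sum(wj * xj for wj, xj in zip(w, x)) - y
            for j in range(d):
                grad[j] += 2.0 * err * x[j] / n
        # The L2 term adds 2 * lam * w_j to each gradient, so the update
        #   w_j <- w_j - lr * (grad_j + 2 * lam * w_j)
        #        = (1 - 2 * lr * lam) * w_j - lr * grad_j
        # shrinks ("decays") every weight by a constant factor each step.
        w = [(1.0 - 2.0 * lr * lam) * wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Hypothetical data: 4 features, only the first two actually matter.
rng = random.Random(1)
xs = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(30)]
ys = [3.0 * x[0] - 2.0 * x[1] + rng.gauss(0.0, 0.5) for x in xs]

norms = {}
for lam in (0.0, 0.1, 1.0):
    w = train_linear(xs, ys, lam)
    norms[lam] = sum(wj * wj for wj in w)
    print(f"lam={lam:<4} weights={[round(wj, 2) for wj in w]}  sum(w^2)={norms[lam]:.3f}")
```

The sum of squared weights falls as λ grows. What counts as "light" or "strong" depends on the scale of the loss and the data, so λ is usually tuned on a validation set.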

Step 6: Dropout: Random Neuron Deactivation

**Dropout** is beautifully simple: during each training step, **randomly turn off** some neurons (typically 20-50%). **Why does this work?** It prevents **co-adaptation**, where neurons learn to rely on specific other neurons instead of learning robust features independently. Think of it like a team project where members randomly call in sick:

- Everyone must learn to do their part **independently**
- No one can rely on one star player
- The team becomes more **resilient**

At test time, all neurons are active, so activations must be rescaled to match what the next layer saw during training. The original formulation multiplies outputs by the keep probability (1 − p) at test time; most modern frameworks use "inverted dropout" instead, dividing the surviving activations by (1 − p) during training and leaving test-time outputs unchanged. The result is a model that learned **redundant, robust** representations.
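Here is a minimal sketch of dropout in its common "inverted" form: during training each activation is zeroed with probability `p` and the survivors are divided by (1 − p), so nothing needs rescaling at test time. This differs from scaling outputs down at test time only in where the rescaling happens; the expected activation is the same either way. The function name and signature are hypothetical (deep-learning frameworks ship their own dropout layers), and activations are modeled as a plain list.

```python
import random


def dropout(activations, p, training, rng):
    """Inverted dropout on a list of activations (a hypothetical helper).

    Training: zero each activation with probability p and divide the
    survivors by (1 - p), so each unit's expected value is unchanged.
    Inference: every unit stays active and nothing is rescaled.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(42)
acts = [1.0] * 10_000

train_out = dropout(acts, p=0.5, training=True, rng=rng)
test_out = dropout(acts, p=0.5, training=False, rng=rng)

print("fraction dropped:", train_out.count(0.0) / len(train_out))
print("mean during training:", sum(train_out) / len(train_out))
print("mean at test time:", sum(test_out) / len(test_out))
```

Roughly half the units are zeroed in training, yet the average activation stays close to 1.0 in both modes, which is exactly what the rescaling is for.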

Step 7: Early Stopping: Know When to Quit

One of the simplest regularization techniques: **stop training when validation loss starts increasing**. Remember the loss curves from earlier? Validation loss improves for a while, then starts getting worse as the model overfits. Early stopping simply says: **"Stop at the best point."**

**How it works in practice:**

1. Track validation loss every epoch.
2. Save the model whenever validation loss improves.
3. If it hasn't improved for N epochs (the **patience**), stop.
4. Use the saved best model.

It costs little beyond a held-out validation set and one evaluation per epoch, it's simple, and it usually helps.
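The four steps map onto a small helper class. This is a sketch under assumptions: the class name, the `step` method, and the use of a plain dictionary as the "model state" are hypothetical, and the validation losses are made up for illustration.

```python
import copy


class EarlyStopping:
    """Stop training once validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = None
        self.best_state = None
        self.epochs_without_improvement = 0

    def step(self, epoch, val_loss, model_state):
        """Record one epoch (step 1); return True when training should stop (step 3)."""
        if val_loss < self.best_loss:
            # Step 2: a new best, so keep a copy of the model as it is now.
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.best_state = copy.deepcopy(model_state)
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Made-up validation losses: improvement until epoch 2, then mostly worse.
val_losses = [1.00, 0.80, 0.70, 0.72, 0.75, 0.71, 0.74]
stopper = EarlyStopping(patience=2)

for epoch, val_loss in enumerate(val_losses):
    model_state = {"trained_epochs": epoch + 1}  # stand-in for real weights
    if stopper.step(epoch, val_loss, model_state):
        stopped_at = epoch
        break
else:
    stopped_at = len(val_losses) - 1

# Step 4: continue with the saved best model, not the last one.
best_model = stopper.best_state
print(f"stopped at epoch {stopped_at}; best epoch {stopper.best_epoch} "
      f"(val loss {stopper.best_loss})")
```

Patience is a tradeoff: a larger value tolerates noisy validation curves but spends more epochs past the best point before stopping.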

Step 8: Putting It All Together

**Five things to remember:**

1. **Overfitting = memorizing** instead of learning. Watch for the gap between training and validation loss.
2. **Bias-variance tradeoff:** too simple (underfitting) vs. too complex (overfitting). Find the sweet spot.
3. **L2 regularization** keeps weights small and moderate; it is one of the most widely used techniques.
4. **Dropout** randomly disables neurons during training, forcing the network to learn robust features.
5. **Early stopping** is cheap and simple: stop when validation loss stops improving.

**In practice, combine them!** Many networks use L2 (weight decay), dropout, and early stopping together.
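To show how the three techniques fit into one training loop, here is a toy sketch in pure Python. Every specific is an assumption for illustration: a linear model instead of a deep network (so dropout is applied to the input features, since there are no hidden neurons), synthetic data, and the values chosen for the learning rate, λ (`lam`), dropout rate `p`, and patience. The helper names are hypothetical; in a real framework you would use its built-in dropout layers and weight-decay or regularizer options instead.

```python
import random


def make_data(n, rng):
    """Hypothetical data: 6 features, but only the first two affect y."""
    xs = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(n)]
    ys = [3.0 * x[0] - 2.0 * x[1] + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def mse(w, xs, ys):
    """Mean squared error of the linear model y ~ w . x (no dropout)."""
    return sum((sum(wj * xj for wj, xj in zip(w, x)) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)


def train(train_set, val_set, lam=0.01, p=0.2, lr=0.01,
          patience=5, max_epochs=200, seed=0):
    rng = random.Random(seed)
    xs, ys = train_set
    w = [0.0] * len(xs[0])
    best_loss, best_w, best_epoch, waited = float("inf"), list(w), 0, 0
    keep = 1.0 - p
    for epoch in range(max_epochs):
        order = list(range(len(xs)))
        rng.shuffle(order)
        for i in order:
            # Dropout (inverted): zero each input with probability p, rescale the rest.
            x = [xj / keep if rng.random() < keep else 0.0 for xj in xs[i]]
            err = sum(wj * xj for wj, xj in zip(w, x)) - ys[i]
            # SGD step on squared error plus the L2 penalty lam * sum(w^2).
            w = [wj - lr * (2.0 * err * xj + 2.0 * lam * wj)
                 for wj, xj in zip(w, x)]
        # Early stopping: evaluate without dropout and keep the best weights.
        val_loss = mse(w, *val_set)
        if val_loss < best_loss:
            best_loss, best_w, best_epoch, waited = val_loss, list(w), epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_w, best_loss, best_epoch, epoch


rng = random.Random(0)
train_set, val_set = make_data(40, rng), make_data(40, rng)
best_w, best_loss, best_epoch, last_epoch = train(train_set, val_set)
print(f"ran {last_epoch + 1} epochs; best epoch {best_epoch}; val MSE {best_loss:.3f}")
print("weights:", [round(wj, 2) for wj in best_w])
```

Each technique touches a different part of the loop: dropout changes the inputs seen in each training step, L2 adds a term to every weight update, and early stopping decides which epoch's weights to keep.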

Step 9: Test Your Understanding

You've learned to spot overfitting, understand the bias-variance tradeoff, and apply regularization techniques. Let's test your understanding!

Prerequisites

Neural network training basics
Understanding of loss functions

Key Concepts

Bias-Variance Tradeoff
Dropout
L1/L2 Regularization
Early Stopping
