Overfitting & Regularization

Why Models Memorize and How to Make Them Generalize

Difficulty
Intermediate
Duration
15-18 minutes
Prerequisites
Neural network training

What You'll Discover

Learn to spot overfitting and fix it with regularization

Spotting Overfitting

Read training vs validation loss curves to diagnose when your model memorizes instead of learns.

Bias-Variance Tradeoff

Understand why model complexity is a balancing act between underfitting and overfitting.

Dropout & Weight Decay

See how randomly disabling neurons and penalizing large weights prevent memorization.

Early Stopping

The simplest trick — stop training at the right moment before overfitting begins.

Key Concepts

Overfitting

Model memorizes training data, fails on new data

Bias-Variance Tradeoff

Balance between too simple and too complex

L2 Regularization

Penalizes large weights to keep them moderate

Dropout

Randomly disables neurons during training

Early Stopping

Stop training when validation loss rises

Generalization

The real goal — perform well on unseen data


[Figure: Weather Data Classification. Two panels, "Underfitting: Too Simple" and "Overfitting: Too Complex", each plotting Output against Input with points labeled Rain (Class 1) and Sunny (Class 0).]

Overfitting & Regularization — Lesson Content

See how dropout and other regularization techniques help models generalize to new data.

Overfitting is one of the most common problems in machine learning: your model memorizes training data instead of learning real patterns. This lesson teaches you to spot overfitting through loss curves, understand the bias-variance tradeoff, and apply regularization techniques like L2, dropout, and early stopping.

Learning Objectives

  • Recognize overfitting from loss curves
  • Understand the bias-variance tradeoff
  • Apply L2 regularization, dropout, and early stopping

Step 1: What is Overfitting?

Imagine a student who **memorizes every answer** in the textbook but can't solve new problems on the exam.

That's overfitting! The model learns the training data **too well**, including its noise and quirks, and fails on new data it hasn't seen.

**The goal isn't to memorize. It's to generalize.**

The opposite problem is **underfitting**, like a student who barely studied and can't even answer the textbook questions.

The sweet spot? A model that learns the **real patterns** without memorizing the noise.

Step 2: Spot the Problem: Loss Curves

The clearest sign of overfitting is the **gap between training and validation loss**. During training, we track two numbers:

  • **Training loss** (blue): how well the model fits the data it's learning from
  • **Validation loss** (orange): how well it performs on data it has **never seen**

**Overfit model:** Training loss keeps dropping, but validation loss **starts going back up** around epoch 15. That's the moment the model switches from learning patterns to memorizing noise.

**Good fit model:** Both curves decrease together and converge. The small gap between them is normal and healthy.
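The check described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the function name `diagnose_loss_curves` and the loss values are made up for the example, and the "overfitting" rule is one simple heuristic among many.

```python
def diagnose_loss_curves(train_loss, val_loss):
    """Return (best_epoch, final_gap, overfitting) for two per-epoch loss lists.

    Hypothetical helper for illustration; epoch 0 is the first list entry.
    """
    if len(train_loss) != len(val_loss) or not val_loss:
        raise ValueError("need two non-empty lists of equal length")

    # Epoch where validation loss was lowest: the best point to have stopped.
    best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i])

    # Gap between the two curves at the end of training.
    final_gap = val_loss[-1] - train_loss[-1]

    # Heuristic: validation loss rose after its minimum
    # while training loss kept falling.
    overfitting = (
        best_epoch < len(val_loss) - 1
        and val_loss[-1] > val_loss[best_epoch]
        and train_loss[-1] < train_loss[best_epoch]
    )
    return best_epoch, final_gap, overfitting


# Made-up curves for illustration only.
train = [1.00, 0.70, 0.50, 0.35, 0.25, 0.15]
val = [1.05, 0.80, 0.65, 0.60, 0.68, 0.80]
best_epoch, final_gap, overfitting = diagnose_loss_curves(train, val)
print(best_epoch, round(final_gap, 2), overfitting)  # 3 0.65 True
```

Here validation loss bottoms out at epoch 3 and then climbs while training loss keeps falling, which is the overfit pattern. Real curves are noisier, so a practical check would usually smooth the losses or look at several epochs rather than a single point.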

Step 3: The Bias-Variance Tradeoff

Every model makes two types of errors:

**Bias** = error from oversimplified assumptions

  • High bias → underfitting (straight line for curved data)
  • The model is **too rigid**

**Variance** = error from being too sensitive to training data

  • High variance → overfitting (wiggly line that hits every point)
  • The model is **too flexible**

**The tradeoff:** Reducing one often increases the other. The goal is to find the **sweet spot**: enough complexity to capture patterns, but not so much that it memorizes noise.
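To see both failure modes side by side, here is a small sketch using k-nearest-neighbor regression on noisy sine data. Both the model and the data are stand-ins chosen for brevity, not something the lesson itself uses. With k equal to the training set size the model predicts one constant everywhere (too rigid), and with k = 1 it copies the nearest training point, noise included (too flexible).

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable


def make_data(n):
    """Noisy samples of a sine curve: the real pattern plus noise."""
    xs = [random.uniform(0.0, 6.5) for _ in range(n)]
    ys = [math.sin(x) + random.gauss(0.0, 0.3) for x in xs]
    return xs, ys


def knn_predict(x, train_xs, train_ys, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))[:k]
    return sum(train_ys[i] for i in nearest) / k


def mse(xs, ys, train_xs, train_ys, k):
    """Mean squared error of the k-NN predictor on the points (xs, ys)."""
    errors = [(knn_predict(x, train_xs, train_ys, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


train_xs, train_ys = make_data(40)
test_xs, test_ys = make_data(200)

# k = 40 averages every training point (constant prediction: high bias).
# k = 1 copies the single nearest training point (high variance).
for k in (40, 5, 1):
    train_err = mse(train_xs, train_ys, train_xs, train_ys, k)
    test_err = mse(test_xs, test_ys, train_xs, train_ys, k)
    print(f"k={k:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

With k = 1 the training error is exactly zero because every training point is its own nearest neighbor, which is memorization in its purest form. The number to compare across settings is the test error, not the training error.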

Step 4: Regularization: Preventing Overfitting

**Regularization** is a set of techniques that constrain the model to prevent overfitting. Think of it as adding "rules" that keep the model honest.

**The core idea:** Add a **penalty** to the loss function that discourages complexity. Instead of just minimizing prediction error, the model also tries to keep its weights **small and simple**.

**Loss = Prediction Error + Complexity Penalty**

Smaller weights → simpler model → better generalization.

Step 5: L2 Regularization: Weight Decay

**L2 regularization** adds the **sum of squared weights**, scaled by a strength λ, to the loss.

**Why does this help?** Large weights mean the model is making extreme decisions based on small input differences, a recipe for overfitting. By penalizing large weights, L2 forces the model to use **moderate, distributed weights** instead of relying heavily on a few features.

In the lesson's weight comparison, some weights are huge (4.2) without regularization. With L2, they're all moderate. Same pattern, less memorization.

L2 is so common it has a special name: **weight decay**, because the penalty's gradient shrinks each weight toward zero at every update. The two are equivalent under plain gradient descent; with adaptive optimizers such as Adam, weight decay is often applied as a separate step and no longer matches the L2 penalty exactly.

L2 Loss = Original Loss + λ × Σ(wᵢ²)

Where λ controls regularization strength:
  λ = 0     → no regularization (may overfit)
  λ = 0.001 → light regularization (common default)
  λ = 0.1   → strong regularization (may underfit)
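
The formula and the "decay" behavior can be checked with a short sketch. The function names below are hypothetical, the learning rate is an arbitrary choice, and the update shown is plain gradient descent.

```python
def l2_penalty(weights, lam):
    """Complexity penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def l2_loss(original_loss, weights, lam):
    """L2 Loss = Original Loss + lam * sum(w_i ** 2)."""
    return original_loss + l2_penalty(weights, lam)


def sgd_step(weights, grads, lr, lam):
    """One plain gradient-descent step on the L2 loss.

    The penalty's gradient is 2 * lam * w, so each weight is also pulled
    toward zero in proportion to its own size: the "decay".
    """
    return [w - lr * (g + 2.0 * lam * w) for w, g in zip(weights, grads)]


weights = [4.2, -0.3, 0.1]  # 4.2 echoes the large weight mentioned above

print(l2_loss(0.5, weights, lam=0.0))  # 0.5: no regularization
print(l2_loss(0.5, weights, lam=0.1))  # about 2.274: penalty of 1.774 added

# With a zero data gradient, every weight shrinks by the factor (1 - 2 * lr * lam).
decayed = sgd_step(weights, grads=[0.0, 0.0, 0.0], lr=0.1, lam=0.1)
print(decayed)  # each weight multiplied by 0.98, up to rounding
```

Note how the weight of 4.2 accounts for almost all of the penalty (17.64 of the 17.74 total squared weight), which is why L2 pushes hardest on the largest weights.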

Step 6: Dropout: Random Neuron Deactivation

**Dropout** is beautifully simple: during each training step, **randomly turn off** some neurons (typically 20-50%).

**Why does this work?** It prevents **co-adaptation**, which is when neurons learn to rely on specific other neurons instead of learning robust features independently.

Think of it like a team project where members randomly call in sick:

  • Everyone must learn to do their part **independently**
  • No one can rely on one star player
  • The team becomes more **resilient**

At test time, all neurons are active. To keep activations on the same scale in both modes, either the outputs are scaled down by the keep probability at test time (classic dropout) or the surviving activations are scaled up during training (inverted dropout, the variant frameworks such as PyTorch implement). The result is a model that learned **redundant, robust** representations.
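A minimal sketch of the inverted variant, written in plain Python for readability. The `dropout` function and its parameters are illustrative and not any framework's API. Kept activations are divided by the keep probability during training, so nothing needs rescaling at test time.

```python
import random


def dropout(activations, p, training, rng=random):
    """Inverted dropout on a list of activations.

    p is the probability of dropping each neuron. During training, kept
    activations are divided by (1 - p) so the expected output matches the
    input. At test time the input passes through unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the run is repeatable
layer_out = [1.0, 2.0, 3.0, 4.0]

# Training: each entry is either 0.0 (dropped) or doubled (kept, scaled by 1 / 0.5).
print(dropout(layer_out, p=0.5, training=True, rng=rng))

# Test time: all neurons active, values untouched.
print(dropout(layer_out, p=0.5, training=False, rng=rng))  # [1.0, 2.0, 3.0, 4.0]
```

A fresh random mask is drawn on every call, so each training step effectively trains a different thinned network.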

Step 7: Early Stopping: Know When to Quit

The simplest regularization technique: **stop training when validation loss starts increasing**.

Remember our loss curves from earlier? Validation loss improves for a while, then starts getting worse as the model overfits. Early stopping simply says: **"Stop at the best point."**

**How it works in practice:**

  1. Track validation loss every epoch
  2. Save the model whenever validation loss improves
  3. If it hasn't improved for N epochs (**patience**), stop
  4. Use the saved best model

It's free, simple, and almost always helps.
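The four steps can be sketched as a loop over recorded validation losses. This is an illustrative replay, not a training loop: the function name `early_stopping` and the loss values are made up, and "saving the model" is reduced to remembering the best epoch.

```python
def early_stopping(val_losses, patience):
    """Replay a sequence of per-epoch validation losses.

    Returns (best_epoch, stop_epoch): the epoch whose model would be saved,
    and the epoch at which training would halt.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Steps 1-2: validation loss improved, so "save" this model.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            # Step 3: no improvement for `patience` epochs in a row, so stop.
            if epochs_without_improvement >= patience:
                return best_epoch, epoch

    # Ran out of epochs before patience was exhausted.
    return best_epoch, len(val_losses) - 1


# Made-up validation losses for illustration only.
val = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.70]
print(early_stopping(val, patience=3))  # (2, 5)
```

Training halts at epoch 5, but the model to keep is the one from epoch 2 (step 4). A larger patience tolerates noisier curves at the cost of a few extra epochs.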

Step 8: Putting It All Together

**Five things to remember:**

  1. **Overfitting = memorizing** instead of learning. Watch for the training/validation gap.
  2. **Bias-Variance tradeoff:** Too simple (underfitting) vs too complex (overfitting). Find the sweet spot.
  3. **L2 regularization** keeps weights small and moderate, and is one of the most widely used techniques.
  4. **Dropout** randomly disables neurons, forcing the network to learn robust features.
  5. **Early stopping** is free and simple: stop once validation loss has stopped improving and keep the best saved model.

**In practice, combine them!** Many networks use L2, dropout, and early stopping together.

Step 9: Test Your Understanding

You've learned to spot overfitting, understand the bias-variance tradeoff, and apply regularization techniques. Let's test your understanding!

Prerequisites

  • Neural network training basics
  • Understanding of loss functions

Key Concepts

  • Bias-Variance Tradeoff
  • Dropout
  • L1/L2 Regularization
  • Early Stopping