Hyperparameter Tuning
Finding the Best Settings for Optimal Performance
What You'll Discover
Learn how to choose the right settings for training neural networks
- Learning Rate Impact: See how too-high or too-low learning rates dramatically change training outcomes.
- Network Sizing: Understand the trade-off between underfitting with too few neurons and overfitting with too many.
- Search Strategies: Compare grid search and random search to find optimal hyperparameter combinations.
- Diagnostic Tools: Read learning curves to diagnose problems and guide your tuning decisions.
Key Concepts
- Learning Rate: Controls how large a step the network takes during weight updates.
- Network Architecture: The number and size of layers determine model capacity.
- Grid Search: Systematically try every combination from a predefined set.
- Random Search: Sample random combinations for better coverage with fewer trials (both strategies are sketched in code after this list).
- Learning Curves: Plot loss over epochs to diagnose underfitting and overfitting.
- Early Stopping: Stop training when validation loss stops improving, to prevent overfitting.
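The grid search and random search ideas above are easy to see in code. Here is a minimal Python sketch; the search space values are illustrative, and `evaluate` is a hypothetical stand-in for "train the network with these settings and return validation accuracy":

```python
import itertools
import random

# Hypothetical search space (values are illustrative).
space = {
    "learning_rate": [0.01, 0.1, 0.5],
    "hidden_neurons": [2, 4, 8, 16],
}

def evaluate(params):
    """Stand-in for: train a network with `params`, return validation accuracy."""
    return random.random()  # placeholder score, not a real training run

# Grid search: every combination (3 x 4 = 12 trials).
grid_trials = [dict(zip(space, combo))
               for combo in itertools.product(*space.values())]

# Random search: 6 sampled combinations instead of all 12.
random_trials = [{name: random.choice(choices) for name, choices in space.items()}
                 for _ in range(6)]

print("Best grid combo:  ", max(grid_trials, key=evaluate))
print("Best random combo:", max(random_trials, key=evaluate))
```

With a fixed trial budget, random search tends to cover each individual hyperparameter's range better than a grid does, which is why it often finds good settings with fewer trials.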
What Are Hyperparameters?
When you train a neural network, it learns parameters like weights and biases automatically. But some settings must be chosen by you before training begins — these are called hyperparameters.
Think of it like baking a cake:
- Parameters (learned): How much of each ingredient ends up in the batter — the network figures this out through training.
- Hyperparameters (chosen by you): Oven temperature, baking time, pan size — you decide these before you start baking.
Bad hyperparameters can ruin a perfectly good model, just like the wrong oven temperature can burn a cake. The difference between a 60% accurate model and a 95% accurate model is often not the architecture — it's the hyperparameters.
Here are the most important hyperparameters you'll encounter:
- Learning Rate — How large a step to take during gradient descent
- Hidden Layer Size — How many neurons in each hidden layer
- Number of Epochs — How many times to iterate over the training data
- Batch Size — How many samples to process before updating weights
Throughout this lesson, we'll use our familiar example: predicting whether a student passes an exam based on hours studied and hours slept.
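Here is a minimal NumPy sketch of that setup. The data is synthetic (the pass/fail rule is made up for illustration), but all four hyperparameters from the list above appear explicitly at the top, while the weights and biases are learned by the training loop:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hyperparameters: chosen by you, before training begins ---
LEARNING_RATE = 0.5
HIDDEN_NEURONS = 8
EPOCHS = 50
BATCH_SIZE = 32

# Synthetic data: [hours studied, hours slept] -> pass (1) / fail (0).
# The rule below is invented purely for illustration.
X = rng.uniform(0, 10, size=(200, 2))
y = ((X[:, 0] + 0.5 * X[:, 1]) > 8).astype(float).reshape(-1, 1)

# --- Parameters: learned by the network during training ---
W1 = rng.normal(0, 0.5, size=(2, HIDDEN_NEURONS))
b1 = np.zeros(HIDDEN_NEURONS)
W2 = rng.normal(0, 0.5, size=(HIDDEN_NEURONS, 1))
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(EPOCHS):
    perm = rng.permutation(len(X))
    for start in range(0, len(X), BATCH_SIZE):
        idx = perm[start:start + BATCH_SIZE]
        xb, yb = X[idx], y[idx]

        # Forward pass through one hidden layer.
        h = sigmoid(xb @ W1 + b1)
        p = sigmoid(h @ W2 + b2)

        # Backward pass for a mean squared error loss.
        dp = (p - yb) * p * (1 - p) / len(xb)
        dh = (dp @ W2.T) * h * (1 - h)

        # Weight updates: the learning rate scales every step.
        W2 -= LEARNING_RATE * h.T @ dp
        b2 -= LEARNING_RATE * dp.sum(axis=0)
        W1 -= LEARNING_RATE * xb.T @ dh
        b1 -= LEARNING_RATE * dh.sum(axis=0)

accuracy = ((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5) == y).mean()
print(f"Training accuracy: {accuracy:.2f}")
```

Changing any of the four constants at the top changes how well this loop works, without touching a single line of the learning code itself. That is what tuning means.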
Parameters vs Hyperparameters
| Category | Example | Who Decides? | When? |
|---|---|---|---|
| Parameter | Weight w₁ = 0.35 | The network (via backprop) | During training |
| Parameter | Bias b₁ = 0.05 | The network (via backprop) | During training |
| Hyperparameter | Learning rate = 0.5 | You (the engineer) | Before training |
| Hyperparameter | Hidden neurons = 8 | You (the engineer) | Before training |
| Hyperparameter | Epochs = 50 | You (the engineer) | Before training |
| Hyperparameter | Batch size = 32 | You (the engineer) | Before training |
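A single gradient-descent step shows who updates what. Using the learning rate and weight from the table above (the gradient value is made up for illustration):

```python
learning_rate = 0.5   # hyperparameter: fixed by you before training
w1 = 0.35             # parameter: the network's current weight

grad = 0.08           # made-up gradient of the loss w.r.t. w1
w1 -= learning_rate * grad

print(f"{w1:.2f}")           # 0.31 -> the parameter moved
print(f"{learning_rate}")    # 0.5  -> the hyperparameter did not
```

Backpropagation only ever writes to the parameters; the hyperparameters sit outside the loop and shape how those writes happen.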
The Goldilocks Problem: Every Hyperparameter Has a Sweet Spot
| Hyperparameter | Too Low | Too High | Trade-off |
|---|---|---|---|
| Learning Rate | Learns too slowly | Diverges / explodes | Speed vs stability |
| Hidden Neurons | Can't learn patterns (underfit) | Memorizes noise (overfit) | Capacity |
| Epochs | Not enough learning | Overfitting to training data | Training duration |
| Batch Size | Noisy updates; slow training | Poorer generalization | Update quality |
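You can watch the learning-rate row of this table play out in a few lines of Python. This is a deliberately tiny sketch on the toy one-dimensional loss f(w) = w², whose gradient is 2w, not a real network:

```python
# Gradient descent on the toy loss f(w) = w^2, whose gradient is 2w.
for lr in (0.01, 0.4, 1.1):
    w = 5.0
    for _ in range(20):
        w -= lr * 2 * w  # the standard update: w <- w - lr * gradient
    print(f"lr={lr}: w after 20 steps = {w:.4g}")
```

Only the middle rate actually reaches the minimum at w = 0: the small rate barely moves from the starting point, and the large rate overshoots further on every step until w has blown up past 190. Every other row of the table has the same Goldilocks shape, just along a different axis.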