Hyperparameter Tuning

Finding the Best Settings for Optimal Performance

Difficulty

Intermediate

Duration

15-18 minutes

Prerequisites

Neural network training, Overfitting

What You'll Discover

Learn how to choose the right settings for training neural networks

Learning Rate Impact

See how too-high or too-low learning rates dramatically change training outcomes.

Network Sizing

Understand the trade-off between underfitting with too few neurons and overfitting with too many.

Search Strategies

Compare grid search and random search to find optimal hyperparameter combinations.

Diagnostic Tools

Read learning curves to diagnose problems and guide your tuning decisions.

Key Concepts

Learning Rate

Controls how large a step the network takes on each weight update

Network Architecture

The number and size of layers determine model capacity

Grid Search

Systematically try every combination from a predefined set

Random Search

Sample random combinations; for the same number of trials, this tests more distinct values of each hyperparameter than a grid does

Learning Curves

Plot training and validation loss over epochs to diagnose underfitting and overfitting

Early Stopping

Stop training when validation loss stops improving to prevent overfitting

Continue Learning

Explore related topics to deepen your understanding

Overfitting & Regularization

Learn techniques to prevent your model from memorizing training data

Gradient Descent

Review the optimization algorithm that learning rate controls

What Are Hyperparameters?

When you train a neural network, it learns parameters like weights and biases automatically. But some settings must be chosen by you before training begins — these are called hyperparameters.

Think of it like baking a cake:

• Parameters (learned): How much of each ingredient ends up in the batter. The network figures this out through training.
• Hyperparameters (chosen by you): Oven temperature, baking time, pan size. You decide these before you start baking.

Bad hyperparameters can ruin a perfectly good model, just as the wrong oven temperature can burn a cake. With the same data and the same kind of network, hyperparameter choices alone can separate a model that is 60% accurate from one that is 95% accurate.

Here are the most important hyperparameters you'll encounter:

• Learning Rate: How large a step to take during gradient descent
• Hidden Layer Size: How many neurons in each hidden layer
• Number of Epochs: How many full passes to make over the training data
• Batch Size: How many samples to process before each weight update

Throughout this lesson, we'll use our familiar example: predicting whether a student passes an exam based on hours studied and hours slept.

Parameters vs Hyperparameters

Category	Example	Who Decides?	When?
Parameter	Weight w₁ = 0.35	The network (via backprop)	During training
Parameter	Bias b₁ = 0.05	The network (via backprop)	During training
Hyperparameter	Learning rate = 0.5	You (the engineer)	Before training
Hyperparameter	Hidden neurons = 8	You (the engineer)	Before training
Hyperparameter	Epochs = 50	You (the engineer)	Before training
Hyperparameter	Batch size = 32	You (the engineer)	Before training
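
The split in the table above can be made concrete with a small sketch. The four constants at the top are the hyperparameter values from the table; everything inside `init_params` is a parameter that training changes. The rest is assumed for illustration only: the toy data and its pass/fail rule are made up, the function names are hypothetical, and the one-hidden-layer sigmoid network is just one plausible shape for the student example, not necessarily the one used elsewhere in this lesson.

```python
import math
import random

# Hyperparameters: fixed by you before training (values from the table above).
LEARNING_RATE = 0.5
HIDDEN_NEURONS = 8
EPOCHS = 50
BATCH_SIZE = 32

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_toy_data(n, rng):
    """Made-up samples: (hours studied, hours slept) scaled to [0, 1]; label 1.0 = pass."""
    data = []
    for _ in range(n):
        studied, slept = rng.random(), rng.random()
        label = 1.0 if studied + 0.5 * slept > 0.75 else 0.0  # invented rule
        data.append(((studied, slept), label))
    return data

def init_params(hidden, rng):
    """Parameters: start out random, then get adjusted during training."""
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "W2": [rng.uniform(-1, 1) for _ in range(hidden)],
        "b2": 0.0,
    }

def forward(p, x):
    """Return hidden activations and the predicted pass probability."""
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(p["W1"], p["b1"])]
    y = sigmoid(sum(w * hj for w, hj in zip(p["W2"], h)) + p["b2"])
    return h, y

def train(data, rng):
    p = init_params(HIDDEN_NEURONS, rng)
    updates = 0
    for _ in range(EPOCHS):                      # one epoch = one pass over the data
        rng.shuffle(data)
        for start in range(0, len(data), BATCH_SIZE):
            batch = data[start:start + BATCH_SIZE]
            # Accumulate cross-entropy gradients over the batch (backpropagation).
            gW1 = [[0.0, 0.0] for _ in range(HIDDEN_NEURONS)]
            gb1 = [0.0] * HIDDEN_NEURONS
            gW2 = [0.0] * HIDDEN_NEURONS
            gb2 = 0.0
            for x, target in batch:
                h, y = forward(p, x)
                d_out = y - target               # gradient at the output pre-activation
                for j in range(HIDDEN_NEURONS):
                    d_hid = d_out * p["W2"][j] * h[j] * (1.0 - h[j])
                    gW2[j] += d_out * h[j]
                    gW1[j][0] += d_hid * x[0]
                    gW1[j][1] += d_hid * x[1]
                    gb1[j] += d_hid
                gb2 += d_out
            # One weight update per batch, scaled by the learning rate.
            scale = LEARNING_RATE / len(batch)
            for j in range(HIDDEN_NEURONS):
                p["W2"][j] -= scale * gW2[j]
                p["W1"][j][0] -= scale * gW1[j][0]
                p["W1"][j][1] -= scale * gW1[j][1]
                p["b1"][j] -= scale * gb1[j]
            p["b2"] -= scale * gb2
            updates += 1
    return p, updates

rng = random.Random(0)
data = make_toy_data(200, rng)
params, n_updates = train(data, rng)
print("weight updates performed:", n_updates)
print("learned hidden-layer weight vectors:", len(params["W1"]))
```

With 200 samples and a batch size of 32, each epoch is split into 7 batches, so 50 epochs produce 350 weight updates. Changing any of the four constants changes how training runs, but none of them is ever modified by the training loop itself; only the values inside `params` are.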

The Goldilocks Problem: Every Hyperparameter Has a Sweet Spot

Hyperparameter	Too Low	Too High	What It Controls
Learning Rate	Learns too slowly	Overshoots and diverges	Speed vs stability
Hidden Neurons	Can't learn the patterns (underfit)	Memorizes noise (overfit)	Model capacity
Epochs	Stops before the model has learned enough (underfit)	Overfits to the training data	Training duration
Batch Size	Noisy updates, slow epochs	Smoother updates, but can generalize worse	Update quality
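
The learning-rate row is the easiest one to see in numbers. The sketch below is a deliberately simplified stand-in for a network: it runs plain gradient descent on the toy loss f(w) = w², whose minimum is at w = 0. The function name and the three learning rates are chosen purely for illustration and are not tuned values for the student example.

```python
def gradient_descent(learning_rate, steps=20, w=1.0):
    """Minimise the toy loss f(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        grad = 2.0 * w
        w = w - learning_rate * grad   # the learning rate scales every step
    return w

too_low = gradient_descent(0.01)       # each step shrinks w by only 2%
well_chosen = gradient_descent(0.4)    # each step shrinks w by 80%
too_high = gradient_descent(1.1)       # each step overshoots and grows |w| by 20%

for name, final_w in [("too low", too_low),
                      ("well chosen", well_chosen),
                      ("too high", too_high)]:
    print(f"{name:>11}: w after 20 steps = {final_w:.6g}")
```

Starting from w = 1, the low rate is still around 0.67 after 20 steps, the middle rate is effectively at the minimum, and the high rate has moved further away than where it started. A real network's loss surface is far less tidy, so the exact thresholds differ, but the three behaviours are the same ones listed in the table.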