
Hyperparameter Tuning

Finding the Best Settings for Optimal Performance

Difficulty: Intermediate
Duration: 15-18 minutes
Prerequisites: Neural network training, overfitting

What You'll Discover

Learn how to choose the right settings for training neural networks

Learning Rate Impact

See how too-high or too-low learning rates dramatically change training outcomes.

Network Sizing

Understand the trade-off between underfitting with too few neurons and overfitting with too many.

Search Strategies

Compare grid search and random search to find optimal hyperparameter combinations.

Diagnostic Tools

Read learning curves to diagnose problems and guide your tuning decisions.

Key Concepts

Learning Rate

Controls how big of a step the network takes during weight updates

Network Architecture

The number and size of layers determine model capacity

Grid Search

Systematically try every combination from a predefined set

Random Search

Sample random combinations for better coverage with fewer trials

Learning Curves

Plot loss over epochs to diagnose underfitting and overfitting

Early Stopping

Stop training when validation loss stops improving to prevent overfitting
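
To make that last idea concrete, here is a minimal early-stopping sketch in Python. Instead of a real network, it uses a made-up validation-loss curve that improves and then starts rising (the classic overfitting signature); the numbers are purely illustrative:

```python
# Hypothetical validation losses, one per epoch: improving, then rising.
val_losses = [0.90, 0.61, 0.45, 0.38, 0.35, 0.34, 0.36, 0.37, 0.39, 0.42]

best_val_loss = float("inf")
patience = 3                    # epochs to wait for a new best before stopping
epochs_without_improvement = 0

for epoch, val_loss in enumerate(val_losses):
    # (in real code, one training epoch would run here before evaluating)
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0   # improvement: reset the counter
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Stopping early at epoch {epoch} (best loss {best_val_loss})")
            break
```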

Step 1 of 8

What Are Hyperparameters?

When you train a neural network, it learns parameters like weights and biases automatically. But some settings must be chosen by you before training begins — these are called hyperparameters.

Think of it like baking a cake:

  • Parameters (learned): How much of each ingredient ends up in the batter — the network figures this out through training.
  • Hyperparameters (chosen by you): Oven temperature, baking time, pan size — you decide these before you start baking.

Bad hyperparameters can ruin a perfectly good model, just like the wrong oven temperature can burn a cake. The difference between a 60% accurate model and a 95% accurate model is often not the architecture — it's the hyperparameters.

Here are the most important hyperparameters you'll encounter:

  • Learning Rate — How big of a step to take during gradient descent
  • Hidden Layer Size — How many neurons in each hidden layer
  • Number of Epochs — How many times to iterate over the training data
  • Batch Size — How many samples to process before updating weights

Throughout this lesson, we'll use our familiar example: predicting whether a student passes an exam based on hours studied and hours slept.
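
In code, hyperparameters usually appear as plain constants you set before the training loop ever runs, while parameters are the arrays the network itself updates. Here's a minimal NumPy sketch for the exam example; the two-input, one-hidden-layer shape is just an illustration:

```python
import numpy as np

# --- Hyperparameters: chosen by you, before training ---
LEARNING_RATE = 0.5    # step size for gradient descent
HIDDEN_NEURONS = 8     # width of the single hidden layer
EPOCHS = 50            # passes over the training data
BATCH_SIZE = 32        # samples per weight update

# --- Parameters: initialized randomly, then learned by the network ---
rng = np.random.default_rng(seed=0)
W1 = rng.normal(size=(2, HIDDEN_NEURONS))   # 2 inputs: hours studied, hours slept
b1 = np.zeros(HIDDEN_NEURONS)
W2 = rng.normal(size=(HIDDEN_NEURONS, 1))   # 1 output: probability of passing
b2 = np.zeros(1)

# Training (not shown here) would repeatedly update W1, b1, W2, b2 via
# backprop, while the four constants above stay fixed for the whole run.
```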

Parameters vs Hyperparameters

| Category | Example | Who Decides? | When? |
|---|---|---|---|
| Parameter | Weight w₁ = 0.35 | The network (via backprop) | During training |
| Parameter | Bias b₁ = 0.05 | The network (via backprop) | During training |
| Hyperparameter | Learning rate = 0.5 | You (the engineer) | Before training |
| Hyperparameter | Hidden neurons = 8 | You (the engineer) | Before training |
| Hyperparameter | Epochs = 50 | You (the engineer) | Before training |
| Hyperparameter | Batch size = 32 | You (the engineer) | Before training |

The Goldilocks Problem: Every Hyperparameter Has a Sweet Spot

| Hyperparameter | Too Low | Too High | Trade-off |
|---|---|---|---|
| Learning Rate | Learns too slowly | Diverges / explodes | Speed vs stability |
| Hidden Neurons | Can't learn patterns (underfit) | Memorizes noise (overfit) | Capacity |
| Epochs | Not enough learning | Overfitting to training data | Training duration |
| Batch Size | Noisy updates, slow training | Weaker generalization | Update quality |
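
You can see the learning-rate row of this table in action on a toy problem. The sketch below runs gradient descent on the one-dimensional loss L(w) = w², whose gradient is 2w and whose minimum sits at w = 0; the three rates are illustrative:

```python
# Gradient descent on the toy loss L(w) = w^2, with gradient dL/dw = 2w.
def gradient_descent(learning_rate, steps=10, w=1.0):
    path = [w]
    for _ in range(steps):
        w = w - learning_rate * 2 * w   # update rule: w <- w - lr * dL/dw
        path.append(w)
    return path

print(gradient_descent(0.01))  # too low: crawls toward 0 (still ~0.82 after 10 steps)
print(gradient_descent(0.5))   # sweet spot: reaches 0 in a single step
print(gradient_descent(1.1))   # too high: overshoots each step and |w| grows
```

Running it shows all three regimes from the table: the tiny rate barely moves, the well-chosen rate converges immediately, and the oversized rate bounces across the minimum with growing amplitude until it diverges.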