Gradient Descent Optimization — Lesson Content
See how gradient descent navigates loss landscapes to find optimal solutions.
Gradient descent, along with variants such as stochastic gradient descent and Adam, is the optimization method behind nearly all neural network training. Watch it navigate a loss landscape step by step, and build intuition for how learning rate, gradients, and convergence work together.
Learning Objectives
- Understand what gradients measure
- See how learning rate affects training
- Watch convergence happen step by step
Step 1: What is Gradient Descent?
Imagine you're blindfolded on a hilly landscape and need to find the **lowest valley**.
**Your strategy:** Feel the slope under your feet and take a step downhill. Repeat.
That's gradient descent! It's how neural networks learn: they start with randomly initialized parameters, then move **step by step** toward better answers.
The colored surface below is a "loss landscape." The **dark blue valley** is where loss is lowest (best predictions). Our goal is to reach it.
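The blindfolded strategy fits in a short loop. This is a minimal sketch, assuming a made-up one-dimensional loss `f(x) = (x - 3) ** 2` (not from the lesson) whose slope is `2 * (x - 3)`. The function names and the step size are illustrative only.

```python
def f(x):
    """Hypothetical 1-D loss with its lowest point at x = 3."""
    return (x - 3) ** 2


def slope(x):
    """Derivative of f: positive right of 3, negative left of 3."""
    return 2 * (x - 3)


x = 0.0            # starting guess
step_size = 0.1    # how far to move per step (the learning rate)

for _ in range(50):
    # Feel the slope, then step downhill (against the slope). Repeat.
    x = x - step_size * slope(x)

print(f"x = {x:.4f}, loss = {f(x):.6f}")
```

After 50 repetitions, `x` sits very close to 3, the bottom of this hypothetical valley.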
Step 2: Our Starting Point
We begin at position **(0, 0)** — a random starting guess.
Our loss here is **9.0** — that's pretty far from optimal.
The **white dot** on the landscape shows where we are. Notice we're on a steep slope, far from the dark blue valley. The network's predictions are very wrong right now.
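The lesson doesn't give the formula behind its landscape. One quadratic reproduces every number quoted in this lesson: loss 9.0 and gradient (-2.0, -8.0) at the start, 0.64 after 3 steps, and 0.0118 after 10 steps at learning rate 0.1. That quadratic is L(x, y) = (x - 1)² + 2(y - 2)², and its valley floor is at (1, 2). The sketch below assumes this function as a stand-in. It is not necessarily the lesson's actual landscape.

```python
def loss(x, y):
    """Assumed stand-in loss landscape: a bowl with its lowest point at (1, 2)."""
    return (x - 1) ** 2 + 2 * (y - 2) ** 2


start = (0.0, 0.0)
print(loss(*start))    # 9.0
print(loss(1.0, 2.0))  # 0.0 at the bottom of the valley
```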
**Next:** We need to figure out which direction is "downhill."
Step 3: The Gradient Points Uphill
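The **gradient** is the list of partial derivatives of the loss, one per parameter. Together they point in the direction of steepest *increase*, and the gradient's length says how steep that climb is.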
At our position, the gradient is **(-2.0, -8.0)**.
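To see what the gradient measures, note that each component is a partial derivative: how fast the loss changes when you nudge one coordinate. This sketch assumes the stand-in loss L(x, y) = (x - 1)² + 2(y - 2)². The lesson does not give this formula, but the quadratic matches its numbers. Under that assumption, the partial derivatives are 2(x - 1) and 4(y - 2). The code checks them against a central finite-difference estimate at (0, 0).

```python
def loss(x, y):
    # Assumed stand-in loss; its minimum is at (1, 2).
    return (x - 1) ** 2 + 2 * (y - 2) ** 2


def gradient(x, y):
    """Analytic gradient of the assumed loss: (dL/dx, dL/dy)."""
    return (2 * (x - 1), 4 * (y - 2))


def numerical_gradient(x, y, h=1e-6):
    """Central-difference estimate: nudge each coordinate and watch the loss."""
    dx = (loss(x + h, y) - loss(x - h, y)) / (2 * h)
    dy = (loss(x, y + h) - loss(x, y - h)) / (2 * h)
    return (dx, dy)


print(gradient(0.0, 0.0))            # (-2.0, -8.0)
print(numerical_gradient(0.0, 0.0))  # approximately (-2.0, -8.0)
```

The second component is four times larger in magnitude, so at this point the loss is far more sensitive to `y` than to `x`.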
**The trick:** We move in the **opposite** direction! If the gradient says "go right to go uphill," we go left to go downhill. Here both components are negative, so we increase both coordinates.
Think of it like water — it always flows downhill. We're following the water.
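To check why we step against the gradient, compare a small step each way from (0, 0). This sketch again assumes the stand-in loss L(x, y) = (x - 1)² + 2(y - 2)², which is not given in the lesson. The step size 0.01 is an arbitrary choice.

```python
def loss(x, y):
    # Assumed stand-in loss; its minimum is at (1, 2).
    return (x - 1) ** 2 + 2 * (y - 2) ** 2


def gradient(x, y):
    return (2 * (x - 1), 4 * (y - 2))


x, y = 0.0, 0.0
gx, gy = gradient(x, y)   # (-2.0, -8.0): direction of steepest increase
eps = 0.01                # a deliberately tiny step

uphill = loss(x + eps * gx, y + eps * gy)    # follow the gradient
downhill = loss(x - eps * gx, y - eps * gy)  # go the opposite way

print(f"start {loss(x, y):.4f}, uphill {uphill:.4f}, downhill {downhill:.4f}")
# start 9.0000, uphill 9.6932, downhill 8.3332
```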
Step 4: Learning Rate: Step Size Matters
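The **learning rate** sets how big each step is. Every update moves us against the gradient, scaled by the learning rate: new position = old position - learning rate × gradient.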
**Too small (0.05):** Tiny steps. Safe but painfully slow. With a limited number of steps, you may stop well short of the valley.
**Just right (0.1):** Steady progress. Reaches the valley efficiently.
**Too large (0.5):** Huge leaps. You can overshoot the valley and bounce back and forth across it without settling. With an even larger rate, you can fly off and diverge!
Finding a good learning rate is one of the most important decisions in training.
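The sketch below runs 10 gradient-descent steps from (0, 0) at each of the three learning rates. It assumes the same stand-in loss L(x, y) = (x - 1)² + 2(y - 2)². The lesson doesn't name its function, but this quadratic reproduces its numbers. The helper `run` is hypothetical.

```python
def loss(x, y):
    # Assumed stand-in loss; its minimum is at (1, 2).
    return (x - 1) ** 2 + 2 * (y - 2) ** 2


def gradient(x, y):
    return (2 * (x - 1), 4 * (y - 2))


def run(learning_rate, steps=10, start=(0.0, 0.0)):
    """Plain gradient descent; returns the loss after each step."""
    x, y = start
    history = []
    for _ in range(steps):
        gx, gy = gradient(x, y)
        # Step against the gradient, scaled by the learning rate.
        x, y = x - learning_rate * gx, y - learning_rate * gy
        history.append(loss(x, y))
    return history


for lr in (0.05, 0.1, 0.5):
    losses = run(lr)
    print(f"lr={lr}: after 10 steps loss = {losses[-1]:.4f}")
```

Under this assumption, 0.05 leaves the loss around 0.21 after 10 steps, and 0.1 brings it to about 0.0118. At 0.5 the loss gets stuck at 8.0, because `y` jumps between 0 and 4 forever. On this particular function, 0.5 is exactly the rate at which the `y` direction stops converging. Other landscapes have different thresholds.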
Step 5: First Steps: Big Improvements
Let's watch gradient descent in action with learning rate **0.1**.
After just **3 steps**, we've moved from loss **9.0** down to **0.64**!
Notice how the **first few steps are large** — the slope is steep, so the gradient is big, and we make fast progress.
The white line traces our path across the landscape. We're heading straight for the valley!
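The first three steps can be traced in a few lines. This is a minimal sketch, assuming the stand-in loss L(x, y) = (x - 1)² + 2(y - 2)². That quadratic is chosen because it reproduces the lesson's numbers; the lesson itself doesn't name a function. The code prints the loss and the length of each move.

```python
import math


def loss(x, y):
    # Assumed stand-in loss; its minimum is at (1, 2).
    return (x - 1) ** 2 + 2 * (y - 2) ** 2


def gradient(x, y):
    return (2 * (x - 1), 4 * (y - 2))


learning_rate = 0.1
x, y = 0.0, 0.0
print(f"step 0: loss {loss(x, y):.4f}")

for step in range(1, 4):
    gx, gy = gradient(x, y)
    dx, dy = -learning_rate * gx, -learning_rate * gy  # the move we make
    x, y = x + dx, y + dy
    print(f"step {step}: loss {loss(x, y):.4f}, step length {math.hypot(dx, dy):.3f}")
```

Under that assumption, the loss goes 9.0, 3.52, 1.45, 0.64, matching the lesson's figure after 3 steps. Meanwhile the moves shrink from about 0.825 to 0.506 to 0.315.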
Step 6: Convergence: Settling at the Minimum
After **10 steps**, we're very close to the minimum!
**Loss dropped from 9.0 to 0.0118** — a 99.9% reduction.
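You can check the 0.0118 figure two ways. Assume the stand-in loss L(x, y) = (x - 1)² + 2(y - 2)², which is not stated in the lesson. At learning rate 0.1, each update multiplies the distance to the minimum by 0.8 along x and by 0.6 along y. So after k steps from (0, 0), the loss is 0.64^k + 8 · 0.36^k. The sketch compares that closed form with the actual iteration. The closed form holds only for this assumed quadratic.

```python
def loss(x, y):
    # Assumed stand-in loss; its minimum is at (1, 2).
    return (x - 1) ** 2 + 2 * (y - 2) ** 2


def gradient(x, y):
    return (2 * (x - 1), 4 * (y - 2))


def closed_form_loss(k, learning_rate=0.1):
    """Loss after k steps from (0, 0), derived for this quadratic only.

    Each update multiplies the x-distance to the minimum by (1 - 2*lr)
    and the y-distance by (1 - 4*lr); the start is 1 away in x, 2 in y.
    """
    return (1 - 2 * learning_rate) ** (2 * k) + 2 * (2 * (1 - 4 * learning_rate) ** k) ** 2


x, y = 0.0, 0.0
for _ in range(10):
    gx, gy = gradient(x, y)
    x, y = x - 0.1 * gx, y - 0.1 * gy

iterated = loss(x, y)
predicted = closed_form_loss(10)
reduction = 100 * (9.0 - iterated) / 9.0
print(f"iterated {iterated:.4f}, closed form {predicted:.4f}, reduction {reduction:.2f}%")
```

Both routes give about 0.0118, a reduction of about 99.87%, which rounds to the lesson's 99.9%.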
Notice how steps get **smaller near the bottom**. This happens naturally: each step is the learning rate times the gradient, and the gradient shrinks as the slope flattens out. It's like a ball rolling into a valley, slowing down as it reaches the bottom.
With a well-chosen learning rate, this automatic slowdown helps us **settle precisely** at the minimum without overshooting.
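One common way to decide you have settled is to stop once the gradient is nearly zero. This is a minimal sketch on the assumed stand-in loss L(x, y) = (x - 1)² + 2(y - 2)², which is not given in the lesson. The tolerance 1e-6 and the 1,000-step cap are arbitrary choices, and `descend_until_flat` is a hypothetical helper. The code also records each step's length so you can watch it shrink.

```python
import math


def gradient(x, y):
    # Gradient of the assumed stand-in loss (x - 1)**2 + 2 * (y - 2)**2.
    return (2 * (x - 1), 4 * (y - 2))


def descend_until_flat(learning_rate=0.1, tol=1e-6, max_steps=1000):
    """Run gradient descent from (0, 0) until the gradient is nearly zero."""
    x, y = 0.0, 0.0
    step_lengths = []
    for _ in range(max_steps):
        gx, gy = gradient(x, y)
        if math.hypot(gx, gy) < tol:   # slope is flat enough: stop
            break
        x, y = x - learning_rate * gx, y - learning_rate * gy
        step_lengths.append(learning_rate * math.hypot(gx, gy))
    return (x, y), step_lengths


(x, y), lengths = descend_until_flat()
print(f"stopped at ({x:.6f}, {y:.6f}) after {len(lengths)} steps")
print("first steps:", [round(s, 3) for s in lengths[:3]])
print("last steps: ", [f"{s:.1e}" for s in lengths[-3:]])
```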
Step 7: You Got It!
**Four things to remember:**
1. **Gradient = direction of steepest uphill.** We go the opposite way.
2. **Learning rate = step size.** Too small is slow, too large overshoots.
3. **Steps shrink near the minimum.** The gradient naturally gets smaller on flatter ground.
4. **This is the heart of how neural networks learn.** The same process, just with millions of parameters instead of two (in practice, usually a variant such as stochastic gradient descent).
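The same update loop works unchanged with many parameters instead of two. This is a minimal sketch, assuming a hypothetical loss that sums (w_i - t_i)² over 1,000 parameters with made-up targets t_i. Real networks compute their gradients with backpropagation and usually train with variants such as stochastic gradient descent. Treat this as an illustration of the update rule only.

```python
import random


def gradient_descent(grad_fn, params, learning_rate=0.1, steps=100):
    """Generic gradient descent over a list of parameters."""
    params = list(params)
    for _ in range(steps):
        grads = grad_fn(params)
        # Same rule as before, applied to every parameter at once.
        params = [p - learning_rate * g for p, g in zip(params, grads)]
    return params


random.seed(0)
n = 1000
targets = [random.uniform(-5, 5) for _ in range(n)]   # hypothetical optimum


def loss(params):
    # Hypothetical loss: squared distance from the made-up targets.
    return sum((p - t) ** 2 for p, t in zip(params, targets))


def grad(params):
    return [2 * (p - t) for p, t in zip(params, targets)]


start = [random.gauss(0, 1) for _ in range(n)]   # random initial weights
trained = gradient_descent(grad, start)
print(f"loss before {loss(start):.1f}, after {loss(trained):.2e}")
```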
**Next up:** See backpropagation — how gradients flow through layers to update every weight!
Prerequisites
- Understanding of loss functions
Key Concepts
- Gradients
- Learning Rate
- Optimization
- Convergence