Gradient Descent Optimization

How Neural Networks Find Better Answers

Difficulty: Beginner
Duration: 15-20 minutes
Prerequisites: Loss functions

What You'll Discover

Watch optimization happen in real time on a loss landscape

The Gradient

How networks figure out which direction is "downhill" on the loss landscape.

Learning Rate

Why step size is one of the most important training decisions: too small is slow, too large overshoots or diverges.

Convergence

Watch steps naturally shrink as the optimizer settles at the minimum.

The Full Picture

How this simple algorithm scales to train billion-parameter neural networks.

Key Concepts

Loss Landscape

A surface where height = how wrong the network is

Gradient

Points uphill — we move the opposite way

Learning Rate

Controls how big each step is

Convergence

Steps shrink as we approach the minimum

Local vs Global Minima

Valleys that aren't the deepest can trap us

Step Size Tradeoff

Speed vs stability — the core training dilemma

[Figure: Loss Landscape. Axes: Parameter 1 (x) and Parameter 2 (y). Markers show the current position and the optimal point at (1, 2). Color intensity represents loss value (blue = low, red = high).]

The Big Idea

| Concept | Meaning |
| --- | --- |
| Loss | How wrong the network is (lower = better) |
| Gradient | The direction of steepest uphill |
| Step | Move opposite to gradient (go downhill) |
| Learning Rate | How big each step is |
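
The four rows above combine into a single update rule. The notation here is an assumption added for illustration and is not used elsewhere in the lesson: \theta is the vector of parameters, L is the loss, \nabla L is its gradient, and \eta is the learning rate.

```latex
\theta_{t+1} = \theta_t - \eta \, \nabla L(\theta_t)
```

The minus sign is the "move opposite to the gradient" step, and \eta scales how far each step goes.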

Gradient Descent Optimization — Lesson Content

See how gradient descent navigates loss landscapes to find optimal solutions.

Gradient descent, together with its variants, is the optimization algorithm behind nearly all neural network training. Watch it navigate a loss landscape step by step, and build intuition for how learning rate, gradients, and convergence work together.

Learning Objectives

  • Understand what gradients measure
  • See how learning rate affects training
  • Watch convergence happen step by step

Step 1: What is Gradient Descent?

Imagine you're blindfolded on a hilly landscape and need to find the **lowest valley**.

**Your strategy:** Feel the slope under your feet and take a step downhill. Repeat.

That's gradient descent! It's how neural networks learn: they start with random guesses, then **step by step** move toward better answers.

The colored surface is a "loss landscape." The **dark blue valley** is where loss is lowest (best predictions). Our goal is to reach it.

Step 2: Our Starting Point

We begin at position **(0, 0)**, an arbitrary starting guess. Our loss here is **9.0**, which is far from optimal.

The **white dot** on the landscape shows where we are. Notice we're on a steep slope, far from the dark blue valley. The network's predictions are very wrong right now.

**Next:** We need to figure out which direction is "downhill."
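
The lesson never states the formula behind the landscape. A minimal sketch, assuming a simple bowl-shaped loss chosen here because it matches the numbers quoted in this lesson (minimum at (1, 2), loss 9.0 at the start), looks like this. The function name `loss` and the formula itself are assumptions, not the lesson's actual implementation.

```python
def loss(x, y):
    """Assumed bowl-shaped loss with its lowest point at (1, 2)."""
    return (x - 1.0) ** 2 + 2.0 * (y - 2.0) ** 2


start = (0.0, 0.0)      # the starting guess from the lesson
optimum = (1.0, 2.0)    # the optimal point shown on the landscape

print(loss(*start))     # 9.0
print(loss(*optimum))   # 0.0
```

The factor of 2 on the y term makes the bowl steeper along y than along x, which is why the gradient in the next step is larger in its second component.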

Step 3: The Gradient Points Uphill

The **gradient** tells us the direction of steepest *increase*. At our position, the gradient is **(-2.0, -8.0)**, meaning the loss rises fastest toward smaller x and smaller y.

**The trick:** We move in the **opposite** direction! If the gradient says "go right to go uphill," we go left to go downhill. Here that means stepping toward larger x and larger y, along (2.0, 8.0).

Think of it like water: it always flows downhill. We're following the water.
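
To make the gradient concrete, here is a small sketch that computes it for an assumed loss, (x - 1)^2 + 2(y - 2)^2, picked because it reproduces the values quoted in this lesson. The helper names `grad` and `numeric_grad` are hypothetical; the numeric version only serves as a sanity check on the analytic one.

```python
def loss(x, y):
    """Assumed loss surface with its minimum at (1, 2)."""
    return (x - 1.0) ** 2 + 2.0 * (y - 2.0) ** 2


def grad(x, y):
    """Analytic gradient: the partial derivatives of the assumed loss."""
    return (2.0 * (x - 1.0), 4.0 * (y - 2.0))


def numeric_grad(x, y, h=1e-6):
    """Central-difference estimate of the gradient, for checking only."""
    dx = (loss(x + h, y) - loss(x - h, y)) / (2 * h)
    dy = (loss(x, y + h) - loss(x, y - h)) / (2 * h)
    return (dx, dy)


g = grad(0.0, 0.0)
print(g)                      # (-2.0, -8.0): the uphill direction
print((-g[0], -g[1]))         # (2.0, 8.0): the downhill direction we follow
print(numeric_grad(0.0, 0.0)) # approximately the same as g
```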

Step 4: Learning Rate: Step Size Matters

The **learning rate** controls how big each step is.

  • **Too small (0.05):** Tiny steps. Safe but slow; you might not reach the valley in the steps you have.
  • **Just right (0.1):** Steady progress. Reaches the valley efficiently.
  • **Too large (0.5):** Huge leaps. You can overshoot the valley and bounce back and forth without ever settling!

Finding a good learning rate is one of the most important decisions in training.
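
The three learning rates can be compared directly. This is a sketch under the assumption that the landscape is (x - 1)^2 + 2(y - 2)^2, which is consistent with the numbers in this lesson but is not confirmed by it; the function `run` is a hypothetical helper.

```python
def loss(x, y):
    """Assumed loss surface with its minimum at (1, 2)."""
    return (x - 1.0) ** 2 + 2.0 * (y - 2.0) ** 2


def grad(x, y):
    """Gradient of the assumed loss."""
    return (2.0 * (x - 1.0), 4.0 * (y - 2.0))


def run(lr, steps, start=(0.0, 0.0)):
    """Take `steps` gradient descent steps and return the final loss."""
    x, y = start
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy  # step against the gradient
    return loss(x, y)


for lr in (0.05, 0.1, 0.5):
    print(f"lr={lr}: loss after 10 steps = {run(lr, steps=10):.4f}")
```

On this assumed surface, 0.05 ends with a loss of roughly 0.21 and 0.1 with roughly 0.012. At 0.5 the y coordinate flips between 0 and 4 on every step, so the loss stays stuck at 8.0; a learning rate any larger than that would make the loss grow instead.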

Step 5: First Steps: Big Improvements

Let's watch gradient descent in action with learning rate **0.1**. After just **3 steps**, we've moved from loss **9.0** down to **0.64**!

Notice how the **first few steps are large**: the slope is steep, so the gradient is big, and we make fast progress. The white line traces our path across the landscape. We're heading toward the valley!
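
The first three steps can be traced by hand or in a few lines of code. The sketch below assumes the loss (x - 1)^2 + 2(y - 2)^2, which reproduces the 9.0 and 0.64 figures above; `descend` is a hypothetical helper that records every visited point.

```python
def loss(x, y):
    """Assumed loss surface with its minimum at (1, 2)."""
    return (x - 1.0) ** 2 + 2.0 * (y - 2.0) ** 2


def grad(x, y):
    """Gradient of the assumed loss."""
    return (2.0 * (x - 1.0), 4.0 * (y - 2.0))


def descend(lr, steps, start=(0.0, 0.0)):
    """Return the list of visited points, starting point included."""
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy  # move opposite to the gradient
        path.append((x, y))
    return path


path = descend(lr=0.1, steps=3)
for i, (x, y) in enumerate(path):
    print(f"step {i}: x={x:.3f}, y={y:.3f}, loss={loss(x, y):.4f}")
```

Under this assumption the losses come out close to 9.0, 3.52, 1.45, and 0.64, and the path bends slightly because the surface is steeper along y than along x.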

Step 6: Convergence: Settling at the Minimum

After **10 steps**, we're very close to the minimum! **Loss dropped from 9.0 to 0.0118**, a reduction of about 99.9%.

Notice how steps get **smaller near the bottom**. This happens naturally because the gradient shrinks as the slope flattens out, and each step is the learning rate times the gradient. It's like a ball rolling into a valley: it slows down as it reaches the bottom.

With a well-chosen learning rate, this automatic slowdown helps us **settle precisely** at the minimum without overshooting.
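
The shrinking steps can be measured rather than just observed. This sketch again assumes the loss (x - 1)^2 + 2(y - 2)^2, which matches the 0.0118 figure above, and records the length of each of the 10 steps; the variable names are illustrative.

```python
import math


def loss(x, y):
    """Assumed loss surface with its minimum at (1, 2)."""
    return (x - 1.0) ** 2 + 2.0 * (y - 2.0) ** 2


def grad(x, y):
    """Gradient of the assumed loss."""
    return (2.0 * (x - 1.0), 4.0 * (y - 2.0))


lr = 0.1
x, y = 0.0, 0.0
step_lengths = []
for _ in range(10):
    gx, gy = grad(x, y)
    # Step length = learning rate times the size of the gradient.
    step_lengths.append(lr * math.hypot(gx, gy))
    x, y = x - lr * gx, y - lr * gy

print([round(s, 4) for s in step_lengths])  # each entry smaller than the last
print(round(loss(x, y), 4))                 # 0.0118
```

The learning rate never changes in this loop. The steps shrink only because the gradient does.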

Step 7: You Got It!

**Four things to remember:**

  1. **Gradient = direction of steepest uphill.** We go the opposite way.
  2. **Learning rate = step size.** Too small is slow, too large overshoots.
  3. **Steps shrink near the minimum.** The gradient naturally gets smaller on flatter ground.
  4. **This is how nearly all neural networks learn.** The same process, just with millions or billions of parameters instead of two.

**Next up:** See backpropagation, which shows how gradients flow through layers to update every weight!
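
To illustrate the fourth point, the same loop works unchanged when the two parameters become a long list. This is a toy sketch, not a real network: the parameter count, the random targets, and the sum-of-squares loss are all assumptions made up for the example.

```python
import random

random.seed(0)
n = 1000  # hypothetical parameter count; real networks have far more
targets = [random.uniform(-1.0, 1.0) for _ in range(n)]
params = [0.0] * n  # starting guess for every parameter


def loss(p):
    """Toy loss: squared distance from the (made-up) target values."""
    return sum((pi - ti) ** 2 for pi, ti in zip(p, targets))


def grad(p):
    """One partial derivative per parameter."""
    return [2.0 * (pi - ti) for pi, ti in zip(p, targets)]


lr = 0.1
initial = loss(params)
for _ in range(50):
    g = grad(params)
    # Same rule as before, applied to every parameter at once.
    params = [pi - lr * gi for pi, gi in zip(params, g)]
final = loss(params)

print(initial, final)
```

The update line is identical in spirit to the two-parameter case. What changes in practice is how the gradient is computed, which is the job of backpropagation.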
