Loss Functions — Lesson Content
Learn how neural networks measure "wrongness" through interactive exploration.
Loss functions tell neural networks how wrong their predictions are. This interactive lesson lets you explore MSE, MAE, and Cross-Entropy by dragging sliders and seeing the results in real time.
No math required - just play with the visualizations and build your intuition!
Learning Objectives
- Understand what loss functions measure
- See the difference between MSE and MAE
- Know when to use each type
Step 1: What is a Loss Function?
A loss function is like a **score card** that tells your neural network how wrong it is.
**Think of it like playing darts:**
- Hit the bullseye → Loss = 0 (perfect!)
- Miss by a little → Small loss
- Miss by a lot → Big loss
The network's job is to make the loss as **small as possible**.
Step 2: Mean Squared Error (MSE): The Harsh Teacher
MSE squares each error, so big mistakes get punished way more than small ones.
**Why square?** It makes outliers stand out. If you're off by 1, it costs 1. But if you're off by 5, it costs 25!
**Real example - Predicting house prices:**
Your model made 3 predictions (errors are measured in thousands of dollars):
- House 1: Predicted $200k, Actually $210k → Error = 10 → Error² = 100
- House 2: Predicted $150k, Actually $155k → Error = 5 → Error² = 25
- House 3: Predicted $300k, Actually $250k → Error = 50 → Error² = 2,500
MSE = (100 + 25 + 2,500) / 3 = **875**
See how House 3's big error dominates the loss? That's the "harsh teacher" in action.
MSE = (1/n) × Σ(predicted - actual)²
Example:
House 1: (200 - 210)² = 100
House 2: (150 - 155)² = 25
House 3: (300 - 250)² = 2,500
MSE = (100 + 25 + 2,500) ÷ 3 = 875
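The arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch, not a library implementation: the function name `mean_squared_error` is chosen here for illustration, and prices are written in thousands of dollars to match the example.

```python
def mean_squared_error(predicted, actual):
    """Average of the squared differences between predictions and targets."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sum(squared_errors) / len(squared_errors)


# House prices in thousands of dollars, taken from the example above.
predicted = [200, 150, 300]
actual = [210, 155, 250]

mse = mean_squared_error(predicted, actual)
print(mse)  # 875.0
```

Because the errors are squared, the result is in squared units (thousands of dollars, squared), which is why 875 looks so much larger than any single error.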
Step 3: Mean Absolute Error (MAE): The Fair Teacher
MAE charges every error in proportion to its size: an error twice as large costs exactly twice as much. No surprises, no wild punishments for big errors.
**Real example - Same 3 houses:**
- House 1: Error = 10 → Cost = 10
- House 2: Error = 5 → Cost = 5
- House 3: Error = 50 → Cost = 50
MAE = (10 + 5 + 50) / 3 = **21.67**
Notice House 3's error is still the biggest, but it doesn't cause an explosion. The loss grows linearly.
**When to use MAE:** When you have outliers in your data (like a few mansion sales when predicting house prices). MSE would freak out about them; MAE stays calm.
MAE = (1/n) × Σ|predicted - actual|
Example (same houses):
House 1: |200 - 210| = 10
House 2: |150 - 155| = 5
House 3: |300 - 250| = 50
MAE = (10 + 5 + 50) ÷ 3 = 21.67
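Here is a small sketch of the same calculation in Python, followed by an outlier experiment. The fourth "mansion" house (predicted $400k, actually $900k) is a made-up data point added only to illustrate the outlier behaviour described above; the function names are also chosen here for illustration.

```python
def mean_absolute_error(predicted, actual):
    """Average of the absolute differences between predictions and targets."""
    absolute_errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return sum(absolute_errors) / len(absolute_errors)


def mean_squared_error(predicted, actual):
    """Average of the squared differences, included for comparison."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sum(squared_errors) / len(squared_errors)


# The three houses from the example (thousands of dollars).
predicted = [200, 150, 300]
actual = [210, 155, 250]
print(round(mean_absolute_error(predicted, actual), 2))  # 21.67
print(mean_squared_error(predicted, actual))             # 875.0

# Hypothetical outlier: a mansion the model underestimates by $500k.
predicted_with_mansion = predicted + [400]
actual_with_mansion = actual + [900]
print(mean_absolute_error(predicted_with_mansion, actual_with_mansion))  # 141.25
print(mean_squared_error(predicted_with_mansion, actual_with_mansion))   # 63156.25
```

One extra bad prediction multiplies MAE by about 6.5, but it multiplies MSE by about 72. That gap is what "MSE would freak out" means in numbers.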
Step 4: MSE vs MAE: Side-by-Side Comparison
Let's see how MSE and MAE react to the same errors:
**Small error (prediction = 6, target = 5):**
- MSE: (6 - 5)² = 1
- MAE: |6 - 5| = 1
→ Same penalty!
**Medium error (prediction = 8, target = 5):**
- MSE: (8 - 5)² = 9
- MAE: |8 - 5| = 3
→ MSE is 3x harsher!
**Large error (prediction = 15, target = 5):**
- MSE: (15 - 5)² = 100
- MAE: |15 - 5| = 10
→ MSE is 10x harsher!
**The pattern:** MSE's penalty grows with the square of the error, so for an error of size e it is e times harsher than MAE. MAE's penalty grows in a straight line.
For any error amount e:
- MSE penalty: e²
- MAE penalty: |e|
Error = 1: MSE = 1, MAE = 1 (same)
Error = 2: MSE = 4, MAE = 2 (MSE is 2x worse)
Error = 3: MSE = 9, MAE = 3 (MSE is 3x worse)
Error = 10: MSE = 100, MAE = 10 (MSE is 10x worse!)
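The table above can be regenerated for any error size with a short loop. This is just a sketch; `penalties` is a helper name invented for this example.

```python
def penalties(error):
    """Return (squared penalty, absolute penalty) for a single error."""
    return error ** 2, abs(error)


for error in [1, 2, 3, 10]:
    squared, absolute = penalties(error)
    ratio = squared / absolute
    print(f"Error = {error}: squared = {squared}, absolute = {absolute}, ratio = {ratio:.0f}x")
```

Try adding a larger value such as 100 to the list: the squared penalty becomes 10,000 while the absolute penalty is only 100.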
Step 5: Binary Cross-Entropy: For Yes/No Questions
When predicting categories (cat vs dog, spam vs not spam, pass vs fail), we need something different.
**Cross-Entropy punishes confident wrong answers VERY harshly.**
**Real example - Email classification:**
Your model predicts "This is SPAM" with 95% confidence.
But it's actually a legitimate email.
→ **Binary Cross-Entropy: 2.996** (huge penalty for being confident and wrong!)
Your model predicts "This is SPAM" with 55% confidence.
But it's actually a legitimate email.
→ **Binary Cross-Entropy: 0.799** (much smaller penalty for being uncertain)
Your model gives only a 5% chance that this is SPAM (so it is 95% sure the email is legitimate).
And it IS actually a legitimate email.
→ **Binary Cross-Entropy: 0.051** (tiny penalty for being right!)
**The rule:** Being confidently wrong is catastrophically bad. Being uncertain is okay.
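If you want to check these penalties yourself, here is a minimal Python sketch using the standard binary cross-entropy formula, -[y × ln(p) + (1-y) × ln(1-p)]. The function name and the small `eps` clamp (which keeps ln(0) from ever being evaluated) are choices made for this example, and the 50% case is an extra reference point that is not part of the email example.

```python
import math


def binary_cross_entropy(y, p, eps=1e-12):
    """Loss for one example: y is the true label (0 or 1), p is the predicted probability of class 1."""
    p = min(max(p, eps), 1 - eps)  # keep p away from exactly 0 or 1
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# The email is legitimate, so the true label for "spam" is 0.
actual = 0
for spam_probability in (0.95, 0.55, 0.50, 0.05):
    loss = binary_cross_entropy(actual, spam_probability)
    print(f"P(spam) = {spam_probability:.2f} -> loss = {loss:.3f}")
# P(spam) = 0.95 -> loss = 2.996
# P(spam) = 0.55 -> loss = 0.799
# P(spam) = 0.50 -> loss = 0.693
# P(spam) = 0.05 -> loss = 0.051
```

The loss climbs slowly at first and then very steeply as the model becomes more confident in the wrong answer.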
BCE = -[y × ln(p) + (1-y) × ln(1-p)]
Where:
y = actual label (1 if the example belongs to the class, such as spam; 0 if it does not)
p = predicted probability
Example (actual = 0, predicted = 0.95):
BCE = -[0 × ln(0.95) + 1 × ln(0.05)]
= -[0 + (-2.996)]
= 2.996
Example (actual = 0, predicted = 0.05):
BCE = -[0 × ln(0.05) + 1 × ln(0.95)]
= -[0 + (-0.051)]
= 0.051
Step 6: Categorical Cross-Entropy: For Many Categories
When predicting one category from many (cat/dog/bird/rabbit), we use Categorical Cross-Entropy.
**Real example - Image classification (picking between 4 animals):**
Your model's confidence for an actual **cat** image:
- Cat: 85%
- Dog: 10%
- Bird: 3%
- Rabbit: 2%
Categorical CE = **0.163** (good! High confidence in the right answer)
But if the model was confused:
- Cat: 25%
- Dog: 25%
- Bird: 25%
- Rabbit: 25%
Categorical CE = **1.386** (bad! No confidence in the right answer)
**Key difference from Binary:** This handles 3+ categories instead of just yes/no.
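The two animal examples can be reproduced with a short Python sketch of the usual categorical cross-entropy sum, -Σ y_i × ln(p_i), where y_i is 1 for the true class and 0 otherwise. The function name, the one-hot helper, and the `eps` clamp are assumptions made for this illustration.

```python
import math


def categorical_cross_entropy(one_hot_truth, probabilities, eps=1e-12):
    """Sum of -y_i * ln(p_i) over all classes; only the true class contributes."""
    return -sum(
        y * math.log(max(p, eps))  # clamp so ln(0) is never evaluated
        for y, p in zip(one_hot_truth, probabilities)
    )


classes = ["cat", "dog", "bird", "rabbit"]
truth = [1 if name == "cat" else 0 for name in classes]  # the image is a cat

confident = [0.85, 0.10, 0.03, 0.02]
confused = [0.25, 0.25, 0.25, 0.25]

print(round(categorical_cross_entropy(truth, confident), 3))  # 0.163
print(round(categorical_cross_entropy(truth, confused), 3))   # 1.386
```

Notice that only the probability given to the true class affects the result. How the remaining probability is split among the wrong classes does not change this loss.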
Categorical CE = -Σ y_i × ln(p_i)
Where:
y_i = 1 if true class, 0 otherwise
p_i = predicted probability for class i
Example (true class = cat):
CE = -(1×ln(0.85) + 0×ln(0.10) + 0×ln(0.03) + 0×ln(0.02))
= -(1×(-0.163) + 0 + 0 + 0)
= 0.163
Step 7: Which Loss Function Should I Use?
**Predicting a number?** (house price, temperature, student GPA)
→ Use **MSE** (or MAE if you have outliers)
**Predicting yes/no?** (spam?, cat in photo?, fraud?)
→ Use **Binary Cross-Entropy**
**Predicting one from many?** (cat/dog/bird, A/B/C/D grade, animal type)
→ Use **Categorical Cross-Entropy**
**Quick decision tree:**
1. Predicting a number? → MSE/MAE
2. Answering a yes/no question? → Binary Cross-Entropy
3. Picking one from multiple categories? → Categorical Cross-Entropy
That's it! Pick the one that matches your task.
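The decision tree can also be written as a tiny helper function. This is only an illustration of the rule of thumb in this lesson: the function name `suggest_loss` and the task labels `"number"`, `"yes_no"`, and `"one_of_many"` are hypothetical and do not come from any library.

```python
def suggest_loss(task, has_outliers=False):
    """Return the loss this lesson recommends for a task type.

    `task` is one of "number", "yes_no", or "one_of_many"
    (hypothetical labels used only in this sketch).
    """
    if task == "number":
        return "MAE" if has_outliers else "MSE"
    if task == "yes_no":
        return "Binary Cross-Entropy"
    if task == "one_of_many":
        return "Categorical Cross-Entropy"
    raise ValueError(f"unknown task type: {task!r}")


print(suggest_loss("number"))                     # MSE
print(suggest_loss("number", has_outliers=True))  # MAE
print(suggest_loss("yes_no"))                     # Binary Cross-Entropy
print(suggest_loss("one_of_many"))                # Categorical Cross-Entropy
```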
Step 8: Key Takeaways
**Loss functions measure how wrong your model is:**
1. **MSE (Mean Squared Error)**
- For predicting numbers
- Harsh on big errors
- Example: MSE = 875 for house predictions with one $50k error
2. **MAE (Mean Absolute Error)**
- For predicting numbers with outliers
- Fair to all errors
- Example: MAE = 21.67 for same houses (much gentler)
3. **Binary Cross-Entropy**
- For yes/no predictions
- Punishes confident wrong answers
- Example: 2.996 loss for a 95%-confident wrong answer, 0.051 for a 95%-confident right answer
4. **Categorical Cross-Entropy**
- For picking one from many categories
- Extends binary CE to handle 3+ classes
- Example: 0.163 loss for 85% confidence in correct cat prediction
**Golden rule:** Match your loss function to your task. MSE/MAE for regression, Cross-Entropy for classification.
**Next up:** Learn how networks actually reduce their loss using Gradient Descent!
Prerequisites
- Basic understanding of what neural networks do
Key Concepts
- Mean Squared Error (MSE)
- Mean Absolute Error (MAE)
- Cross-Entropy