Neural Network Forward Pass — Lesson Content
Follow a real-world example: predicting if a student will pass their test. Watch how data flows through layers of neurons step by step.
Trace the complete forward pass of a neural network predicting whether a student named Alex will pass tomorrow's exam. Using Alex's study time, sleep, and previous score, you'll see how data flows from the input layer through hidden neurons to the final prediction.
Each step breaks down the math — weighted sums, activation functions, and the final decision — so you understand exactly what happens inside a neural network. This is the same fundamental process used by all neural networks, from image classifiers to language models.
Learning Objectives
- Understand how data flows through a neural network layer by layer
- Learn what normalization is and why inputs need scaling
- See how weighted sums combine inputs with learned importance
- Understand the role of ReLU activation in adding non-linearity
- Follow a complete forward pass from raw data to prediction
Step 1: The Problem: Will This Student Pass?
Meet **Alex**, a student preparing for tomorrow's exam. We have three pieces of information:
- **Hours Studied Today**: 7 hours
- **Hours Slept Last Night**: 8 hours
- **Previous Test Score**: 85%
Can we predict if Alex will **pass** or **fail** tomorrow's test? We'll use a neural network that has been trained on data from thousands of students.
Let's walk through exactly how the network processes Alex's data, step by step.
Step 2: Network Architecture: Three Layers
Our neural network has three layers:
- **Input Layer** (3 neurons): Receives Alex's data — one neuron per feature
- **Hidden Layer** (4 neurons): Detects patterns and relationships between features
- **Output Layer** (2 neurons): Produces a "Pass" score and a "Fail" score
The lines connecting neurons are **weights** — they determine how much influence each input has. Green lines are positive (excitatory), red lines are negative (inhibitory), and thicker lines indicate stronger connections.
This network learned these weights by training on thousands of past student records.
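As a rough sketch of what this architecture implies, the layer sizes alone fix how many weights and biases the network has to learn. The variable names below are illustrative and do not come from the lesson.

```python
# Layer sizes from the lesson: 3 inputs, 4 hidden neurons, 2 outputs
input_size, hidden_size, output_size = 3, 4, 2

# One weight per connection, plus one bias per hidden/output neuron
hidden_params = input_size * hidden_size + hidden_size    # 12 weights + 4 biases
output_params = hidden_size * output_size + output_size   # 8 weights + 2 biases
total_params = hidden_params + output_params

print(total_params)  # 26
```

Even this small network has 26 adjustable numbers; training is the process of choosing all of them.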
Step 3: Feeding in Alex's Data (Forward Pass Step 1)
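Neural networks tend to train more reliably when all input features share a similar, small range, so we **normalize** each feature to a value between 0 and 1 by dividing by a fixed scale (10 for hours, 100 for the score):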
- **Hours Studied**: 7 / 10 = **0.70**
- **Hours Slept**: 8 / 10 = **0.80**
- **Previous Score**: 85 / 100 = **0.85**
Each input neuron now holds one normalized value. This normalization ensures all features are on the same scale so no single feature dominates.
Normalization:
x1 = 7 / 10 = 0.70
x2 = 8 / 10 = 0.80
x3 = 85 / 100 = 0.85
# Alex's raw features
hours_studied = 7
hours_slept = 8
previous_score = 85

# Normalize inputs to the 0-1 range
inputs = [
    hours_studied / 10,    # 0.70
    hours_slept / 10,      # 0.80
    previous_score / 100,  # 0.85
]
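Dividing by 10 or by 100 is a special case of min-max scaling. The helper below is a minimal sketch of the general form; the function name and the assumed ranges (0 to 10 hours, 0 to 100 percent) are illustrative choices, not part of the lesson.

```python
def min_max_scale(value, low, high):
    """Map a value from the range [low, high] onto [0, 1]."""
    return (value - low) / (high - low)

# Assumed feature ranges: 0-10 hours, 0-100 percent
x1 = min_max_scale(7, 0, 10)     # hours studied
x2 = min_max_scale(8, 0, 10)     # hours slept
x3 = min_max_scale(85, 0, 100)   # previous score

print([x1, x2, x3])  # [0.7, 0.8, 0.85]
```

With a lower bound of 0 this reduces to the plain division used in the lesson.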
Step 4: Hidden Layer Weighted Sums (Forward Pass Step 2)
Each hidden neuron computes a **weighted sum** of all inputs plus a bias. Think of each neuron as a "pattern detector" — it amplifies inputs it considers important and diminishes ones it doesn't.
**Hidden Neuron 1 (example):**
- (0.70 x 0.8) + (0.80 x 0.4) + (0.85 x 0.7) + 0.1 = **1.575**
All four hidden neurons compute similar weighted sums with their own weights, each looking for a different pattern in Alex's data.
For each hidden neuron j:
z_j = sum(input_i * weight_ij) + bias_j
Neuron 1: (0.70 x 0.8) + (0.80 x 0.4) + (0.85 x 0.7) + 0.1 = 1.575
Neuron 2: 0.885
Neuron 3: 0.525
Neuron 4: 0.990
# Hidden layer: weighted sum of all inputs plus a bias, per neuron
hidden_raw = [0.0] * hidden_size
for j in range(hidden_size):
    z = bias_hidden[j]
    for i in range(input_size):
        z += inputs[i] * W_input_hidden[i][j]
    hidden_raw[j] = z
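To check the arithmetic for Hidden Neuron 1, the weighted sum can be reproduced directly from the numbers given above (weights 0.8, 0.4, 0.7 and bias 0.1). The variable names are illustrative. The lesson does not list the weights for neurons 2 to 4, so they are not reproduced here.

```python
inputs = [0.70, 0.80, 0.85]    # normalized features for Alex
weights_n1 = [0.8, 0.4, 0.7]   # Hidden Neuron 1 weights from the lesson
bias_n1 = 0.1                  # Hidden Neuron 1 bias from the lesson

# Weighted sum: multiply each input by its weight, add them up, add the bias
z1 = sum(x * w for x, w in zip(inputs, weights_n1)) + bias_n1

print(round(z1, 3))  # 1.575
```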
Step 5: ReLU Activation (Forward Pass Step 3)
Raw weighted sums are passed through the **ReLU** (rectified linear unit) activation function. ReLU is simple: keep positive values, zero out negatives.
This non-linearity is what lets neural networks learn complex, curved decision boundaries instead of just straight lines.
**After activation:**
- Neuron 1: 1.575 -> **1.575** (active!)
- Neuron 2: 0.885 -> **0.885** (active!)
- Neuron 3: 0.525 -> **0.525** (active!)
- Neuron 4: 0.990 -> **0.990** (active!)
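Neurons that "fire" (output > 0) detected a strong-enough pattern in Alex's data. Neurons that output 0 didn't find their pattern. Here all four weighted sums are positive, so ReLU passes each value through unchanged and every neuron fires.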
ReLU(z) = max(0, z)
ReLU(1.575) = 1.575
ReLU(0.885) = 0.885
ReLU(0.525) = 0.525
ReLU(0.990) = 0.990
def relu(z):
    return max(0, z)

# Apply activation to hidden layer
hidden_activated = [relu(z) for z in hidden_raw]
# [1.575, 0.885, 0.525, 0.99]
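Since every hidden value in Alex's example is positive, the zeroing behavior of ReLU never shows up. The short sketch below adds one hypothetical negative value (-0.3, not from the lesson) to make it visible.

```python
def relu(z):
    # Keep positive values, replace negatives with zero
    return max(0.0, z)

hidden_raw = [1.575, 0.885, 0.525, 0.990]   # values from the lesson
activated = [relu(z) for z in hidden_raw]   # all positive, so unchanged

hypothetical_negative = relu(-0.3)          # made-up input, for illustration only

print(activated)              # [1.575, 0.885, 0.525, 0.99]
print(hypothetical_negative)  # 0.0
```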
Step 6: Output Layer Computation (Forward Pass Step 4)
The hidden layer activations now flow to the output layer. We have two output neurons:
- **"Pass" neuron**: Aggregates evidence that Alex will pass
- **"Fail" neuron**: Aggregates evidence that Alex will fail
Each output neuron computes its own weighted sum from the hidden activations. Notice the weight pattern: the "Pass" neuron has **positive** weights (high hidden activations increase pass confidence), while the "Fail" neuron has **negative** weights that mirror them (high hidden activations decrease fail confidence).
**Raw output scores:**
- Pass neuron: **3.444**
- Fail neuron: **-3.444**
Pass = sum(hidden_j * w_j_pass) + bias_pass
= (1.575 x 0.90) + (0.885 x 0.70) + (0.525 x 0.60) + (0.990 x 0.80) + 0.3
= 3.444
Fail = sum(hidden_j * w_j_fail) + bias_fail
= -3.444
# Compute output layer
output_raw = [0.0] * output_size
for k in range(output_size):
    z = bias_output[k]
    for j in range(hidden_size):
        z += hidden_activated[j] * W_hidden_output[j][k]
    output_raw[k] = z
# pass_score = 3.444
# fail_score = -3.444
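The two raw scores can be reproduced from the hidden activations and the "Pass" weights shown above. The lesson only describes the "Fail" weights as a negative mirror image, so the sketch below assumes they are the exact negatives of the "Pass" weights and bias; that assumption is consistent with the stated result of -3.444. The helper name is illustrative.

```python
hidden_activated = [1.575, 0.885, 0.525, 0.990]

w_pass = [0.90, 0.70, 0.60, 0.80]   # "Pass" weights from the lesson
bias_pass = 0.3

# Assumption: "Fail" parameters are the exact negatives of the "Pass" ones
w_fail = [-w for w in w_pass]
bias_fail = -bias_pass

def weighted_sum(values, weights, bias):
    """Sum of value * weight over all inputs, plus the bias."""
    return sum(v * w for v, w in zip(values, weights)) + bias

pass_score = weighted_sum(hidden_activated, w_pass, bias_pass)
fail_score = weighted_sum(hidden_activated, w_fail, bias_fail)

print(round(pass_score, 3), round(fail_score, 3))  # 3.444 -3.444
```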
Step 7: The Prediction: Alex Will PASS!
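After applying ReLU to the output layer, we get the final scores:
- **Pass**: 3.444
- **Fail**: 0.000
**Prediction: PASS**, because the Pass score is the larger of the two.
These are unnormalized scores, not probabilities, so they should not be read as percentages. Converting them to probabilities that sum to 100% would take an extra normalization step such as softmax, applied to the raw outputs.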
The network predicts success because:
- 7 hours of study is substantial
- 8 hours of sleep ensures sharpness
- 85% previous score shows a strong foundation
- These patterns match thousands of successful students the network trained on
Final scores (after ReLU):
Pass score: 3.444
Fail score: 0.000
Prediction: argmax(pass, fail) = PASS
# Apply activation and decide
pass_score = relu(output_raw[0])  # 3.444
fail_score = relu(output_raw[1])  # 0.000
if pass_score > fail_score:
    prediction = "PASS"
    score = pass_score
else:
    prediction = "FAIL"
    score = fail_score
print(f"Prediction: {prediction}")
print(f"Score: {score:.3f}")  # raw score, not a probability
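The output scores are not probabilities on their own. One common way to obtain a confidence between 0% and 100% is to apply softmax to the raw output scores (before any ReLU). This is not what the lesson's network does; the sketch below only illustrates the alternative, using the raw scores 3.444 and -3.444 from the previous step.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

raw_scores = [3.444, -3.444]                 # [pass, fail] raw outputs from the lesson
probs = softmax(raw_scores)

print(f"Pass: {probs[0]:.1%}, Fail: {probs[1]:.1%}")  # Pass: 99.9%, Fail: 0.1%
```

The prediction is still PASS, but the confidence is now a number that can be read as a percentage.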
Step 8: How Neural Networks Learn
You just watched a complete forward pass! Here's the big picture:
**What we did (forward pass):**
1. **Normalized** raw data into 0-1 range
2. **Multiplied** inputs by learned weights and summed them
3. **Applied ReLU** to introduce non-linearity
4. **Repeated** for the output layer
5. **Picked the winner** as the prediction
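Steps 2 through 4 above can be sketched as one small function. The helper names (`dense`, `forward`) and the toy 2-2-1 network used to exercise it are made up for illustration; they are not the lesson's trained weights.

```python
def relu(z):
    return max(0.0, z)

def dense(values, weights, biases):
    """One layer of weighted sums; weights[i][j] connects input i to neuron j."""
    return [
        biases[j] + sum(values[i] * weights[i][j] for i in range(len(values)))
        for j in range(len(biases))
    ]

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Weighted sums, then ReLU, for the hidden layer and again for the output layer."""
    hidden = [relu(z) for z in dense(inputs, w_hidden, b_hidden)]
    # The lesson applies ReLU to the output layer as well
    return [relu(z) for z in dense(hidden, w_out, b_out)]

# Toy network with made-up weights: 2 inputs, 2 hidden neurons, 1 output
scores = forward(
    [1.0, 2.0],
    [[1.0, -1.0], [0.0, 0.0]], [0.0, 0.0],   # hidden sums: 1.0 and -1.0
    [[2.0], [3.0]], [0.5],                   # output: 1.0 * 2 + 0.0 * 3 + 0.5
)
print(scores)  # [2.5]
```

The second hidden neuron's sum is negative, so ReLU sets it to zero and it contributes nothing to the output.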
**How the network learned these weights (training):**
1. Start with random weights
2. Feed in a student's data and predict pass/fail
3. Compare prediction to the actual outcome
4. Use **backpropagation** to calculate how each weight contributed to the error
5. Adjust weights using **gradient descent** to reduce the error
6. Repeat with thousands of students until the weights converge
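The training loop above can be illustrated at the smallest possible scale: a single weight fitted by gradient descent. This is a simplified sketch, not the lesson's training code. The toy data (points on y = 2x), the learning rate, and the squared-error loss are all assumptions, and with only one weight, backpropagation reduces to a single derivative.

```python
import random

random.seed(0)

# Hypothetical toy data following y = 2x
data = [(0.0, 0.0), (0.5, 1.0), (1.0, 2.0)]

w = random.uniform(-1.0, 1.0)   # 1. start with a random weight
learning_rate = 0.1

def mean_squared_error(weight):
    # 2-3. predict with weight * x and compare to the actual y
    return sum((weight * x - y) ** 2 for x, y in data) / len(data)

initial_loss = mean_squared_error(w)

for _ in range(200):            # 6. repeat
    # 4. how the error changes as w changes (derivative of the loss)
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    # 5. gradient descent: move w a small step against the gradient
    w -= learning_rate * grad

final_loss = mean_squared_error(w)
print(round(w, 3))  # 2.0
```

A real network repeats the same idea for every weight at once, using the chain rule to pass the error backward through the layers.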
**This same process powers:**
- Image recognition (pixels -> object labels)
- Language models like ChatGPT (text -> next word)
- Self-driving cars (sensors -> driving decisions)
The architecture you explored is the foundation of modern AI — more complex networks use the same principles with millions of neurons and billions of weights.
Prerequisites
- Basic understanding of functions
- No advanced math required!
Key Concepts
- Forward Propagation
- Neurons & Weights
- Activation Functions
- Pattern Recognition
- Making Predictions