Convolutional Neural Networks

How Machines Learn to See

Difficulty

Intermediate

Duration

20-25 minutes

Prerequisites

Neural networks, Activation functions

What You'll Discover

Understand how CNNs extract visual features from raw pixels

Convolution & Filters

See how small learnable filters slide across images to detect edges, textures, and patterns.

Feature Maps

Watch filters transform raw pixels into feature maps that highlight detected patterns.

Pooling & Downsampling

Understand how pooling reduces dimensions while preserving the most important features.

Full CNN Pipeline

Trace data from raw image through convolution, pooling, and classification layers.

Key Concepts

Convolution

Sliding a filter across an image to compute local dot products

Filters / Kernels

Small learnable matrices that detect specific patterns

Feature Maps

Output of a convolution showing where patterns are detected

Stride & Padding

Stride sets how far the filter moves at each step; padding adds border pixels to control the output dimensions

Max Pooling

Downsampling by keeping the strongest activation in each region

Feature Hierarchy

Early layers detect edges; deep layers detect complex objects
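
As a rough illustration of the stride, padding, and max pooling entries above, the sketch below uses the common output-size formula floor((n + 2p - k) / s) + 1 and a non-overlapping 2×2 max pool. The function names and the 4×4 feature map are made up for this example; they are not part of the lesson.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output length along one dimension for input size n and filter size k."""
    return (n + 2 * padding - k) // stride + 1


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling on a 2D list (stride equals window size)."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [
            # keep only the strongest activation in each size x size window
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


# Hypothetical 4x4 feature map, chosen only for illustration
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 6],
]
pooled = max_pool2d(feature_map)

print(conv_output_size(28, 3))                       # 26: no padding shrinks the output
print(conv_output_size(28, 3, padding=1))            # 28: padding 1 keeps the size
print(conv_output_size(28, 3, stride=2, padding=1))  # 14: stride 2 halves it
print(pooled)                                        # [[4, 5], [3, 7]]
```

With a 3×3 filter on a 28×28 input, padding of 1 preserves the size, while a stride of 2 roughly halves it. The 2×2 pool reduces the 4×4 map to 2×2.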

What Are Convolutional Neural Networks?

Regular (fully connected) neural networks connect every input to every neuron. For a tiny 28×28 grayscale image, that's 784 inputs. For a 1080p color photo, that's 1920 × 1080 × 3 ≈ 6.2 million inputs, each connected to every neuron in the first layer. The parameter count explodes, and because the image is flattened into a vector, the network has no built-in notion of which pixels are neighbors.
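
The arithmetic behind these numbers can be checked in a few lines. This is only a sketch: the first-layer width of 128 neurons is an assumption picked for illustration, and `dense_weights` is a hypothetical helper, not something defined in the lesson.

```python
def dense_weights(height, width, channels, neurons):
    """Return (input count, weight count) for one fully connected layer."""
    inputs = height * width * channels
    return inputs, inputs * neurons  # one weight per input-neuron pair


# 128 neurons is an assumed layer width, used only for this example
mnist_inputs, mnist_weights = dense_weights(28, 28, 1, 128)
hd_inputs, hd_weights = dense_weights(1080, 1920, 3, 128)

print(mnist_inputs, mnist_weights)  # 784 100352
print(hd_inputs, hd_weights)        # 6220800 796262400
```

Even with this modest layer width, the 1080p image needs close to 800 million weights in the first layer alone, before counting biases.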

Convolutional Neural Networks (CNNs) solve this by exploiting a key insight: nearby pixels are related. Instead of looking at the entire image at once, CNNs scan small regions with learnable filters that detect local patterns like edges, corners, and textures.

Think of it like reading a book: you don't look at every letter on the page simultaneously. Your eyes scan across the text, recognizing familiar patterns (letters, words) as they go. CNNs work the same way — sliding small "windows" across the image.
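
To make the sliding window concrete, here is a minimal pure-Python sketch. It assumes no padding and a stride of 1, and it uses a hand-picked vertical-edge filter rather than a learned one. The `conv2d` helper and the 5×5 image are illustrative assumptions, not the lesson's own code.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    At each position the output is the dot product of the kernel and the
    image patch beneath it. This is the cross-correlation form that deep
    learning libraries usually call convolution.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy 5x5 image: dark (0) on the left, bright (1) on the right
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-picked filter that responds to dark-to-bright vertical edges
kernel = [[-1, 0, 1] for _ in range(3)]

feature_map = conv2d(image, kernel)
for row in feature_map:
    print(row)  # [3, 3, 0] on each of the three rows
```

The response is strong where the window straddles the edge and zero where the window covers a uniform bright region. This is what it means for a filter to detect a local pattern.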

CNNs reshaped computer vision, starting with LeNet (1998) for digit recognition and later AlexNet (2012), which won the ImageNet competition by a wide margin (roughly 15% top-5 error versus about 26% for the runner-up). Today they are used in applications ranging from face recognition to medical imaging to self-driving cars.

Regular Networks vs CNNs

Aspect	Regular Neural Network	Convolutional Neural Network
Input handling	Flattens image to 1D vector	Preserves 2D spatial structure
Connections	Every input → every neuron (dense)	Small local regions → filter (sparse)
Parameters (28×28 image)	784 × neurons (grows with image size)	Filter size × number of filters (independent of image size)
Translation invariance	None: position matters	Partial: the same filter is applied everywhere, and pooling tolerates small shifts
Best for	Tabular data, fixed-size inputs	Images, spatial data, grids
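
The effect of applying the same filter everywhere can be checked on a toy 1D signal. Shifting the input shifts the filter response by the same amount, a property often called translation equivariance, while a global maximum over the response does not change. The signals, the two-tap kernel, and the `conv1d` helper below are assumptions made up for this sketch.

```python
def conv1d(signal, kernel):
    """Slide a 1D kernel over a signal (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


kernel = [-1, 1]  # fires (+1) on a step up, (-1) on a step down

signal = [0, 0, 1, 1, 0, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 1, 0, 0]  # same pattern, moved 2 positions right

out = conv1d(signal, kernel)
out_shifted = conv1d(shifted, kernel)

print(out)          # [0, 1, 0, -1, 0, 0, 0]
print(out_shifted)  # [0, 0, 0, 1, 0, -1, 0]

# The response moved by exactly the same 2 positions as the input
print(out_shifted[2:] == out[:-2])  # True

# A global max over the response ignores where the pattern occurred
print(max(out) == max(out_shifted))  # True
```

A dense layer offers no such guarantee, because each input position has its own separate weights.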

CNN Applications in the Real World

Application	Input	What CNN Detects	Example
Image Classification	Photo	Object categories	Is this a cat or dog?
Object Detection	Photo/Video	Object locations + categories	Where are the pedestrians?
Medical Imaging	X-ray / MRI	Anomalies and patterns	Tumor detection in chest X-rays
Face Recognition	Face photo	Identity features	Unlock phone with face
Autonomous Driving	Camera feed	Road, signs, obstacles	Lane detection and navigation
OCR	Document image	Characters and text	Reading handwritten digits
