Convolutional Neural Networks
How Machines Learn to See
What You'll Discover
Understand how CNNs extract visual features from raw pixels
Convolution & Filters
See how small learnable filters slide across images to detect edges, textures, and patterns.
Feature Maps
Watch filters transform raw pixels into feature maps that highlight detected patterns.
Pooling & Downsampling
Understand how pooling reduces dimensions while preserving the most important features.
Full CNN Pipeline
Trace data from raw image through convolution, pooling, and classification layers.
Key Concepts
Convolution
Sliding a filter across an image to compute local dot products
Filters / Kernels
Small learnable matrices that detect specific patterns
Feature Maps
Output of a convolution showing where patterns are detected
Stride & Padding
Controls for filter movement and output dimensions
Max Pooling
Downsampling by keeping the strongest activation in each region
Feature Hierarchy
Early layers detect edges; deep layers detect complex objects
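Most of the concepts above (convolution, filters, feature maps, stride, padding, max pooling) can be sketched in a few lines of NumPy. This is a minimal illustration with a hand-picked blur kernel, not a trained network; real libraries learn the kernel values and run this far faster:

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """2D cross-correlation (what deep-learning libraries call
    'convolution'): slide the kernel across the image, taking a
    local dot product at each position."""
    if padding:
        image = np.pad(image, padding)
    kh, kw = kernel.shape
    # Output size formula: (input - kernel + 2*padding) / stride + 1
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)  # local dot product
    return out

def max_pool(fmap, size=2):
    """Max pooling: keep only the strongest activation in each
    size x size region of the feature map."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
kernel = np.ones((3, 3)) / 9.0                    # simple averaging (blur) filter
fmap = conv2d(image, kernel)                      # (6 - 3)/1 + 1 = 4 -> 4x4 feature map
pooled = max_pool(fmap)                           # 4x4 -> 2x2 after 2x2 pooling
```

Note how each step shrinks the spatial dimensions: padding with zeros (`padding=1` here would keep the output at 6×6) is how real networks preserve size when they need to.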
What Are Convolutional Neural Networks?
Regular neural networks connect every input to every neuron. For a tiny 28×28 grayscale image, that's 784 inputs. For a 1080p color photo? That's 6.2 million inputs — each connected to every neuron in the first layer. The number of parameters explodes, and the network has no understanding of spatial structure.
Convolutional Neural Networks (CNNs) solve this by exploiting a key insight: nearby pixels are related. Instead of looking at the entire image at once, CNNs scan small regions with learnable filters that detect local patterns like edges, corners, and textures.
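To make "small filters detect local patterns" concrete, here is a hand-made vertical-edge filter applied to a toy image. In a real CNN the filter values are learned, but trained first-layer filters often end up looking much like this one:

```python
import numpy as np

# A tiny 5x5 "image": dark left half, bright right half.
image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

# A vertical-edge filter: responds where brightness rises left to right.
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

# Slide the 3x3 filter over every 3x3 patch (stride 1, no padding).
out = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

# The output is strong wherever the dark-to-bright boundary falls
# inside the filter's window, and zero over the flat regions.
print(out)
```

The output map is itself a small image: a feature map that lights up only near the edge, which is exactly the kind of local evidence deeper layers combine into corners, textures, and eventually whole objects.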
Think of it like reading a book: you don't look at every letter on the page simultaneously. Your eyes scan across the text, recognizing familiar patterns (letters, words) as they go. CNNs work the same way — sliding small "windows" across the image.
CNNs revolutionized computer vision, beginning with LeNet (1998) for handwritten digit recognition and breaking through with AlexNet (2012), which won the ImageNet competition by a wide margin. Today they power everything from face recognition to medical imaging to self-driving cars.
Regular Networks vs CNNs
| Aspect | Regular Neural Network | Convolutional Neural Network |
|---|---|---|
| Input handling | Flattens image to 1D vector | Preserves 2D spatial structure |
| Connections | Every input → every neuron (dense) | Small local regions → filter (sparse) |
| Parameters (28×28 image, first layer) | 784 × 100 neurons = 78,400 weights | 32 filters × 3×3 = 288 weights |
| Translation handling | None: a shifted pattern looks brand new | Built-in: the same filter detects a pattern anywhere |
| Best for | Tabular data, fixed-size inputs | Images, spatial data, grids |
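The parameter gap in the table above is just arithmetic. A short calculation (using an illustrative 100-neuron dense layer and 32 filters with 3×3 kernels; bias terms omitted for simplicity):

```python
# Dense layer vs. convolutional layer parameter counts
# for a 28x28 grayscale image.
inputs = 28 * 28                 # 784 pixels, flattened to a 1D vector
hidden = 100                     # a modest first dense layer
dense_params = inputs * hidden   # every input wired to every neuron

filters = 32                     # number of learnable filters
k = 3                            # 3x3 kernels
conv_params = filters * k * k    # each filter is reused at every position

print(dense_params, conv_params)  # 78400 vs 288
```

The conv layer uses roughly 270× fewer weights here, and the gap widens dramatically for larger images: the dense count grows with the number of pixels, while the conv count depends only on filter size and filter count.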
CNN Applications in the Real World
| Application | Input | What CNN Detects | Example |
|---|---|---|---|
| Image Classification | Photo | Object categories | Is this a cat or dog? |
| Object Detection | Photo/Video | Object locations + categories | Where are the pedestrians? |
| Medical Imaging | X-ray / MRI | Anomalies and patterns | Tumor detection in chest X-rays |
| Face Recognition | Face photo | Identity features | Unlock phone with face |
| Autonomous Driving | Camera feed | Road, signs, obstacles | Lane detection and navigation |
| OCR | Document image | Characters and text | Reading handwritten digits |