Convolutional Neural Networks

How Machines Learn to See

Difficulty: Intermediate
Duration: 20-25 minutes
Prerequisites: Neural networks, activation functions

What You'll Discover

Understand how CNNs extract visual features from raw pixels

Convolution & Filters

See how small learnable filters slide across images to detect edges, textures, and patterns.

Feature Maps

Watch filters transform raw pixels into feature maps that highlight detected patterns.

Pooling & Downsampling

Understand how pooling reduces dimensions while preserving the most important features.

Full CNN Pipeline

Trace data from raw image through convolution, pooling, and classification layers.

Key Concepts

Convolution

Sliding a filter across an image to compute local dot products

Filters / Kernels

Small learnable matrices that detect specific patterns

Feature Maps

Output of a convolution showing where patterns are detected

Stride & Padding

Controls for filter movement and output dimensions

Max Pooling

Downsampling by keeping the strongest activation in each region

Feature Hierarchy

Early layers detect edges; deep layers detect complex objects
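
Two of the concepts above, stride and padding and max pooling, come down to simple arithmetic. The sketch below is a plain-Python illustration, not any library's API: the function names `conv_output_size` and `max_pool2d` are made up for this example, and the size formula assumes the same padding on both sides and rounds down, which is the common convention.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one axis: floor((n + 2*padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + dr][c * size + dc]
                for dr in range(size) for dc in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# A 3x3 filter on a 28-pixel-wide image under different settings.
print(conv_output_size(28, kernel=3))                       # 26: no padding shrinks the output
print(conv_output_size(28, kernel=3, padding=1))            # 28: padding of 1 preserves the size
print(conv_output_size(28, kernel=3, stride=2, padding=1))  # 14: stride 2 halves it

# 2x2 max pooling on a toy 4x4 feature map.
fm = [[1, 3, 2, 0],
      [4, 2, 1, 1],
      [0, 1, 5, 6],
      [2, 2, 7, 3]]
pooled = max_pool2d(fm)
print(pooled)  # [[4, 2], [2, 7]]
```

Pooling here halves each dimension while keeping the strongest activation from every 2×2 region.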

What Are Convolutional Neural Networks?

Regular (fully connected) neural networks connect every input to every neuron in the next layer. For a tiny 28×28 grayscale image, that's 784 inputs. For a 1080p color photo, it's 1920 × 1080 × 3 ≈ 6.2 million inputs, each connected to every neuron in the first layer. The number of parameters explodes, and because the image is flattened into a vector, the network has no built-in notion of which pixels are neighbors.
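
The arithmetic behind these numbers is easy to check. The sketch below counts inputs and first-layer parameters. The layer width of 128 neurons and the 32 filters of size 3×3 are assumptions chosen only for illustration, and the helper names are hypothetical.

```python
# Input sizes quoted in the text.
mnist_inputs = 28 * 28        # grayscale: one value per pixel
hd_inputs = 1920 * 1080 * 3   # 1080p color: three channels per pixel


def dense_params(n_inputs, n_neurons):
    """Weights plus biases for one fully connected layer."""
    return n_inputs * n_neurons + n_neurons


def conv_params(kernel, in_channels, n_filters):
    """Weights plus biases for one convolutional layer with square kernels."""
    return kernel * kernel * in_channels * n_filters + n_filters


HIDDEN = 128   # assumed first-layer width, for illustration only
FILTERS = 32   # assumed number of 3x3 filters, for illustration only

print(f"28x28 grayscale inputs: {mnist_inputs:,}")                         # 784
print(f"1080p color inputs:     {hd_inputs:,}")                            # 6,220,800
print(f"dense layer on 28x28:   {dense_params(mnist_inputs, HIDDEN):,}")   # 100,480
print(f"dense layer on 1080p:   {dense_params(hd_inputs, HIDDEN):,}")      # 796,262,528
print(f"conv layer on 1080p:    {conv_params(3, 3, FILTERS):,}")           # 896
```

Under these assumptions the dense layer needs roughly 800 million parameters for a single 1080p frame, while the convolutional layer needs fewer than a thousand, because its parameter count does not depend on the image size.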

Convolutional Neural Networks (CNNs) solve this by exploiting a key insight: nearby pixels are related. Instead of looking at the entire image at once, CNNs scan small regions with learnable filters that detect local patterns like edges, corners, and textures.
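
To make the scanning idea concrete, here is a minimal sketch in plain Python with no libraries, so every step is visible. It assumes stride 1, no padding, and a single-channel image. The filter values are written by hand to detect vertical edges, whereas a real CNN learns them during training. The name `conv2d` is just a label for this example.

```python
def conv2d(image, kernel):
    """Slide `kernel` across `image` (stride 1, no padding) and return the feature map.

    Each output value is the dot product of the kernel with the image patch
    underneath it. As in most deep learning libraries, the kernel is not
    flipped, so this is technically cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 5x6 image: dark (0) on the left half, bright (1) on the right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

# Hand-written vertical-edge filter (illustrative values, not learned).
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = conv2d(image, kernel)
for row in feature_map:
    print(row)  # each row is [0, 3, 3, 0]
```

The 3×4 result is nonzero only where the window straddles the dark-to-bright boundary, which is exactly the local pattern this filter responds to.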

Think of it like reading a book: you don't look at every letter on the page simultaneously. Your eyes scan across the text, recognizing familiar patterns (letters, words) as they go. CNNs work the same way — sliding small "windows" across the image.

CNNs reshaped computer vision, starting with LeNet (1998) for handwritten digit recognition and then AlexNet (2012), which won the ImageNet competition with a top-5 error of about 15%, roughly 10 percentage points ahead of the runner-up. Today they power applications ranging from face recognition to medical imaging to self-driving cars.

Regular Networks vs CNNs

| Aspect | Regular Neural Network | Convolutional Neural Network |
|---|---|---|
| Input handling | Flattens image to 1D vector | Preserves 2D spatial structure |
| Connections | Every input → every neuron (dense) | Small local regions → filter (sparse) |
| Parameters (28×28 image) | 784 × number of neurons (large) | Filter size × number of filters (small, independent of image size) |
| Translation invariance | None: position matters | Approximate: same filter everywhere detects a pattern at any position, and pooling absorbs small shifts |
| Best for | Tabular data, fixed-size inputs | Images, spatial data, grids |
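
The "same filter everywhere" property can be checked numerically. The sketch below uses a 1D signal instead of a 2D image to keep it short, which is a simplification for illustration. The helper `conv1d` and the two-value step-detecting filter are made up for this example.

```python
def conv1d(signal, kernel):
    """Slide `kernel` along `signal` (stride 1, no padding)."""
    k = len(kernel)
    return [sum(signal[i + d] * kernel[d] for d in range(k))
            for i in range(len(signal) - k + 1)]


kernel = [-1, 1]  # fires +1 on a step up and -1 on a step down

a = [0, 1, 1, 0, 0, 0, 0, 0]  # a short bump near the start
b = [0, 0, 0, 0, 1, 1, 0, 0]  # the same bump shifted right by 3

out_a = conv1d(a, kernel)
out_b = conv1d(b, kernel)

print(out_a)  # [1, 0, -1, 0, 0, 0, 0]
print(out_b)  # [0, 0, 0, 1, 0, -1, 0]

# Shifting the input shifts the response by the same amount.
print(out_b[3:] == out_a[:-3])    # True

# Taking the maximum over all positions (global max pooling) removes the
# position entirely, so both inputs produce the same value.
print(max(out_a) == max(out_b))   # True
```

A dense layer has a separate weight for every input position, so it would have to learn the bump separately at each location.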

CNN Applications in the Real World

| Application | Input | What CNN Detects | Example |
|---|---|---|---|
| Image Classification | Photo | Object categories | Is this a cat or dog? |
| Object Detection | Photo/Video | Object locations + categories | Where are the pedestrians? |
| Medical Imaging | X-ray / MRI | Anomalies and patterns | Tumor detection in chest X-rays |
| Face Recognition | Face photo | Identity features | Unlock phone with face |
| Autonomous Driving | Camera feed | Road, signs, obstacles | Lane detection and navigation |
| OCR | Document image | Characters and text | Reading handwritten digits |