Convolutional Neural Networks (CNNs) are specialized neural networks designed for processing grid-like data, especially images. They use a mathematical operation called convolution to automatically learn spatial hierarchies of features—from simple edges to complex patterns. CNNs revolutionized computer vision and are behind facial recognition, self-driving cars, and medical image analysis.
Visualization: a convolutional filter slides across the image, detecting edge features at each position.
CNNs use parameter sharing and local connectivity to process images efficiently. Instead of connecting every pixel to every neuron, as a fully connected feedforward network does, CNNs use small filters that detect local patterns. The same filter is reused across the entire image, dramatically reducing the parameter count while providing translation equivariance: a cat in the top-left corner produces the same filter response as a cat in the bottom-right, just at a different location in the feature map. Pooling layers then add a degree of translation invariance on top of this.
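The sliding-filter idea can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: a single 3×3 filter (a Sobel-style vertical-edge detector, chosen here as an example) slides over a toy image with stride 1 and no padding. Note that deep-learning "convolution" is technically cross-correlation, since the kernel is not flipped.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over a 2D image (valid padding, stride 1)."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # One dot product per position: the same weights are
            # reused everywhere -- this is parameter sharing.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: dark left half, bright right half -> one vertical edge.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Sobel-like vertical-edge detector: responds to left-to-right
# intensity changes, wherever the edge happens to sit.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

edges = conv2d(image, sobel_x)
print(edges.shape)   # (4, 4): a 3x3 kernel shrinks a 6x6 image by 2 per axis
```

The output peaks at the columns straddling the brightness boundary and is zero in the flat regions, which is exactly the "edge map" behavior the animation above depicts.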
When a CNN identifies a dog in a photo:
Layer 1 (Conv): 32 filters detect edges, corners, and basic textures
Layer 2 (Pool): Reduce image size by 2x, keep important features
Layer 3 (Conv): 64 filters combine edges into shapes (eyes, ears, nose)
Layer 4 (Pool): Further dimensionality reduction
Layer 5 (Conv): 128 filters recognize complex patterns (dog face, body)
Layer 6 (FC): Combines all features to classify: "Golden Retriever, 95% confidence"
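The layer stack above can be checked by tracing tensor shapes through it. The sketch below assumes a 224×224×3 input, 3×3 convolutions with "same" padding, and 2×2 pooling; the filter counts (32, 64, 128) come from the walkthrough, while the kernel size and padding are illustrative assumptions.

```python
# Trace (height, width, channels) through the six-layer stack.

def conv_shape(h, w, c, filters):
    # 'same' padding: spatial size unchanged, channel count = filter count
    return h, w, filters

def pool_shape(h, w, c, size=2):
    # 2x2 pooling halves each spatial dimension
    return h // size, w // size, c

shape = (224, 224, 3)
shape = conv_shape(*shape, filters=32)   # Layer 1: (224, 224, 32)
shape = pool_shape(*shape)               # Layer 2: (112, 112, 32)
shape = conv_shape(*shape, filters=64)   # Layer 3: (112, 112, 64)
shape = pool_shape(*shape)               # Layer 4: (56, 56, 64)
shape = conv_shape(*shape, filters=128)  # Layer 5: (56, 56, 128)
print(shape)
print(shape[0] * shape[1] * shape[2])    # inputs to the Layer 6 FC classifier
```

Even after two rounds of pooling, the flattened feature vector feeding the fully connected layer still has 56 × 56 × 128 = 401,408 entries, which is why the FC layers typically hold most of a classic CNN's parameters.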
1. Input: Raw pixel values (e.g., 224×224×3 RGB image)
2. Convolution: Apply filters to extract features at each position
3. Activation: Apply ReLU to introduce non-linearity
4. Pooling: Downsample to reduce spatial dimensions and computation
5. Repeat: Stack multiple conv-activation-pool blocks
6. Flatten: Convert 3D feature maps to 1D vector
7. Fully Connected: Standard neural network layers for classification
8. Softmax: Output probabilities for each class
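The eight steps above can be strung together as a toy forward pass in NumPy. This is a deliberately tiny sketch under assumed sizes (an 8×8 grayscale input, one 3×3 filter, three classes, random weights), meant only to show the data flow, not a trainable network.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(img, k):                      # step 2: convolution (valid, stride 1)
    kh, kw = k.shape
    H = img.shape[0] - kh + 1
    W = img.shape[1] - kw + 1
    return np.array([[np.sum(img[i:i + kh, j:j + kw] * k)
                      for j in range(W)] for i in range(H)])

def relu(x):                             # step 3: non-linearity
    return np.maximum(0, x)

def max_pool(x, s=2):                    # step 4: 2x2 downsampling
    H, W = x.shape[0] // s, x.shape[1] // s
    return x[:H * s, :W * s].reshape(H, s, W, s).max(axis=(1, 3))

def softmax(z):                          # step 8: class probabilities
    e = np.exp(z - z.max())              # shift for numerical stability
    return e / e.sum()

image = rng.random((8, 8))               # step 1: raw pixel values
kernel = rng.standard_normal((3, 3))     # one learned-looking filter

fmap = max_pool(relu(conv2d(image, kernel)))  # steps 2-4 (one block, not repeated)
flat = fmap.flatten()                    # step 6: feature map -> 1D vector
W_fc = rng.standard_normal((3, flat.size))    # step 7: fully connected weights
probs = softmax(W_fc @ flat)

print(fmap.shape)                        # (3, 3): 8x8 -> conv -> 6x6 -> pool -> 3x3
print(probs.shape)                       # (3,): one probability per class
```

In a real network step 5 would repeat the conv-ReLU-pool block with more filters, and the weights would be learned by backpropagation rather than drawn at random, but the shape of the computation is exactly this pipeline.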