AI Pros Bootcamp – Module 4 - Session 1: Convolutional Neural Networks

Module 4 Overview

What will we learn?

Convolutional layers: filters, patterns, and feature maps
Complete CNN architecture: convolution, pooling, fully connected layers
Training CNNs for image classification
Dynamic computation graphs in PyTorch
Modular architectures and code organization
Model inspection and debugging techniques

Welcome to Module 4! You’ve mastered data pipelines and built models with linear layers. But now the butterfly house next door wants to expand your botanical garden app to classify insects and small animals. Linear layers treat each pixel as a separate, unordered input: they have no built-in way to see that neighboring pixels form features like wings, antennae, or eye spots.

This module introduces Convolutional Neural Networks (CNNs), the backbone of computer vision. We’ll explore how CNNs learn to see patterns in images, build complete CNN architectures, and then dive into PyTorch’s dynamic computation graphs and professional code organization.

The question chain: How do CNNs see patterns? → How do we build a complete CNN? → How do we train it? → How does PyTorch’s flexibility help us? → How do we write professional, maintainable code? → How do we inspect and debug our models?

Session 1: Convolutional Neural Networks

What you’ll know by the end:

How convolutional filters detect patterns in images
The complete architecture of a CNN
How to train a CNN for multi-class image classification

The New Challenge

Butterfly house expansion

Classify flowers, insects, and small animals
Need to detect edges, textures, and patterns
Linear layers aren’t enough

Why Linear Layers Fall Short

Every pixel is treated as a separate input

No spatial understanding
Can’t recognize patterns formed by neighboring pixels
Wings, antennae, eye spots are invisible
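One way to see the "no spatial understanding" problem concretely: scramble the pixel order of an image, scramble a linear layer's weight columns in the same way, and the output does not change. The layer never used the fact that certain pixels were neighbors. The sketch below assumes a tiny random 3-channel 8×8 image and a hypothetical 4-output layer, purely for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A tiny hypothetical 3-channel 8×8 image, flattened the way a linear layer needs it
img = torch.rand(1, 3, 8, 8)
x = img.flatten(start_dim=1)          # shape (1, 192)

fc = nn.Linear(192, 4)

# Scramble the pixel order, and scramble the weight columns the same way
perm = torch.randperm(192)
fc_scrambled = nn.Linear(192, 4)
with torch.no_grad():
    fc_scrambled.weight.copy_(fc.weight[:, perm])
    fc_scrambled.bias.copy_(fc.bias)

out = fc(x)
out_scrambled = fc_scrambled(x[:, perm])

# Same outputs: pixel positions carry no built-in meaning for a linear layer
print(torch.allclose(out, out_scrambled, atol=1e-5))  # True
```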

Convolutional Neural Networks

Inspired by biology

1960s: Visual cortex neurons respond to specific patterns
CNNs mimic this with learnable filters
Filters scan images to extract features

How Filters Work

Source: https://dennybritz.com/posts/wildml/understanding-convolutional-neural-networks-for-nlp/

A 3×3 grid of numbers

Slide over the image
Multiply filter values with pixel values
Sum the results
This is convolution

See: Convolution Arithmetic for more details.
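The slide, multiply, and sum steps above can be sketched directly with two loops and compared against PyTorch's built-in `F.conv2d`. This is a minimal illustration assuming a single-channel image, no padding, and stride 1; the `conv2d_manual` helper is hypothetical. Note that, like most deep learning libraries, PyTorch does not flip the kernel, so what CNNs call "convolution" is technically cross-correlation.

```python
import torch
import torch.nn.functional as F

def conv2d_manual(image, kernel):
    """Stride-1, no-padding 2D convolution as used in CNNs (no kernel flip)."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = torch.zeros(H - kh + 1, W - kw + 1)
    for i in range(out.shape[0]):          # slide down
        for j in range(out.shape[1]):      # slide right
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = (patch * kernel).sum()   # multiply, then sum
    return out

image = torch.arange(25, dtype=torch.float32).reshape(5, 5)  # pixel values 0..24
kernel = torch.tensor([[1., 0., -1.],
                       [1., 0., -1.],
                       [1., 0., -1.]])

manual = conv2d_manual(image, kernel)
# F.conv2d expects (batch, channels, H, W) input and (out_ch, in_ch, kH, kW) weights
builtin = F.conv2d(image[None, None], kernel[None, None]).squeeze()

print(manual.shape)                     # torch.Size([3, 3])
print(torch.allclose(manual, builtin))  # True
```

Every entry of `manual` is -6.0 here, because each pixel in this toy image is exactly 2 larger than the pixel two columns to its left, and the kernel subtracts the right column from the left one in each of its 3 rows.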

What Do Filters Detect?

Vertical edges
Horizontal edges
Textures and shapes
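To make "vertical edge" and "horizontal edge" concrete, here is a small sketch that applies two hand-designed 3×3 filters to a hypothetical 6×6 image whose left half is dark (0) and right half is bright (1). The filter values are a common textbook choice, not the specific filters shown on the slide.

```python
import torch
import torch.nn.functional as F

# Hypothetical 6×6 grayscale image: dark left half, bright right half
image = torch.zeros(6, 6)
image[:, 3:] = 1.0

# Hand-designed 3×3 filters
vertical_edge = torch.tensor([[-1., 0., 1.],
                              [-1., 0., 1.],
                              [-1., 0., 1.]])
horizontal_edge = vertical_edge.T   # rows: [-1,-1,-1], [0,0,0], [1,1,1]

def filter_response(kernel):
    # F.conv2d expects (batch, channels, H, W) input and (out_ch, in_ch, kH, kW) weights
    return F.conv2d(image[None, None], kernel[None, None]).squeeze()

v = filter_response(vertical_edge)
h = filter_response(horizontal_edge)
print(v)  # every row is [0., 3., 3., 0.]: strong response only along the edge
print(h)  # all zeros: this image has no horizontal edge
```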

Butterfly Example

Butterfly image passing through one filter of the first layer.

Learning vs. Hand-Designing Filters

Different weights → different patterns
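In a CNN nobody types in the filter values: they are ordinary weights learned by gradient descent. As a sketch of that idea, the code below assumes a hypothetical setup where the "right answer" is a vertical-edge filter, generates its responses on random images, and lets a randomly initialized `nn.Conv2d` learn to reproduce them. Real training uses class labels rather than filter responses, but the mechanism (loss, gradients, weight updates) is the same.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Target: a hand-designed vertical-edge filter we pretend not to know
target_kernel = torch.tensor([[-1., 0., 1.],
                              [-1., 0., 1.],
                              [-1., 0., 1.]]).reshape(1, 1, 3, 3)

# Hypothetical training data: random grayscale images and the target filter's responses
images = torch.randn(8, 1, 16, 16)
targets = F.conv2d(images, target_kernel)

# A learnable 3×3 filter, randomly initialized
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, bias=False)
optimizer = torch.optim.SGD(conv.parameters(), lr=0.1)

for step in range(200):
    optimizer.zero_grad()
    loss = F.mse_loss(conv(images), targets)
    loss.backward()
    optimizer.step()

# After training, the learned weights are very close to the vertical-edge filter
print(conv.weight.detach().squeeze())
```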

The Power of Hierarchical Feature Extraction

Source: Receptive Field in Deep Convolutional Networks | by Reza Kalantar | Medium

The image illustrates how a single “pixel” in a deep layer of a neural network can “see” a much larger portion of the original input image. This region is called the receptive field. Because each position in Layer 2 (the orange area) already covers a patch of Layer 1, a single pixel in Layer 3 is effectively “aware” of an even larger area of the original input.
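The growth of the receptive field can be computed layer by layer. The sketch below uses the standard recurrence for stacked layers, assuming square kernels and no dilation: each layer adds (kernel_size − 1) × jump pixels, where jump is the product of all earlier strides. The `receptive_field` helper is hypothetical, written only for this illustration.

```python
def receptive_field(layers):
    """Receptive field of one output position, measured in input pixels.

    `layers` is a list of (kernel_size, stride) pairs, first layer first.
    Padding does not change the receptive field size.
    """
    rf, jump = 1, 1          # a single input pixel sees itself; neighbors are 1 pixel apart
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump   # each layer widens the view by (k - 1) steps
        jump *= stride                   # strides spread later positions further apart
    return rf

print(receptive_field([(3, 1)]))                   # 3  (one 3×3 conv)
print(receptive_field([(3, 1), (3, 1)]))           # 5  (two stacked 3×3 convs)
print(receptive_field([(3, 1), (3, 1), (3, 1)]))   # 7  (three stacked 3×3 convs)
print(receptive_field([(3, 1), (2, 2), (3, 1)]))   # 8  (conv, 2×2 max pool, conv)
```

Stacking three 3×3 layers already gives each output a 7×7 view of the input, and pooling makes later layers grow their view even faster.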

Creating Convolutional Layers in PyTorch

import torch.nn as nn

conv = nn.Conv2d(
    in_channels=3,      # RGB color channels
    out_channels=32,    # Number of filters (one feature map per filter)
    kernel_size=3,      # 3×3 filter size
    padding=1,          # Preserve height and width (with kernel_size=3, stride=1)
    stride=1            # Step size
)

Number of parameters for this layer:

\[ \text{params} = \left[\left(k^2 \times C_{\text{in}}\right) + \underbrace{1}_{\text{bias}}\right] \times C_{\text{out}} \]

where k is kernel_size, C_in is in_channels, and C_out is out_channels.
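Plugging the layer above into the formula gives (3² × 3 + 1) × 32 = 28 × 32 = 896 parameters. The short check below compares that arithmetic with PyTorch's own count; padding and stride affect the output size but not the number of parameters.

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1, stride=1)

# Formula: [(kernel_size² × in_channels) + 1 bias] × out_channels
by_formula = (3 ** 2 * 3 + 1) * 32
by_pytorch = sum(p.numel() for p in conv.parameters())

print(by_formula, by_pytorch)  # 896 896
print(conv.weight.shape)       # torch.Size([32, 3, 3, 3]); conv.bias holds 32 values
```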

Output: Activation/Feature Maps

Here we see the 16 feature maps produced by a convolutional layer with out_channels=16, one map per filter. Bright regions show where a filter activates strongly (after ReLU()).
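The shape of these activation maps can be checked directly. This sketch assumes a hypothetical random 64×64 RGB input; with kernel_size=3 and padding=1, each of the 16 filters produces a map the same size as the input, and ReLU leaves only non-negative values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical batch: one RGB image of 64×64 pixels
x = torch.rand(1, 3, 64, 64)

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
relu = nn.ReLU()

feature_maps = relu(conv(x))
print(feature_maps.shape)               # torch.Size([1, 16, 64, 64]): one 64×64 map per filter
print(bool((feature_maps >= 0).all()))  # True: ReLU zeroes out negative responses
```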

Pooling

Example: 28×28 feature map

After first pool: 14×14
After second pool: 7×7
Each pooling layer halves the spatial dimensions
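The halving above can be reproduced with `nn.MaxPool2d` using a 2×2 window and stride 2, assuming a hypothetical stack of 32 feature maps of size 28×28. Pooling shrinks height and width but leaves the number of channels unchanged.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.randn(1, 32, 28, 28)   # hypothetical: 32 feature maps of 28×28
after_first = pool(x)
after_second = pool(after_first)

print(after_first.shape)          # torch.Size([1, 32, 14, 14])
print(after_second.shape)         # torch.Size([1, 32, 7, 7])
print(pool(after_second).shape)   # torch.Size([1, 32, 3, 3]): odd sizes round down by default
```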

After the convolution and ReLU, the feature maps are fed into something called MaxPool2d. What is that? Pooling is a common technique in convolutional neural networks used to reduce the size of feature maps. Max pooling slides a small window (2×2 with stride 2 in this example) over each feature map and keeps only the largest value in each window. It’s effectively a way to throw away pixels after a filter has been applied, compressing the data while keeping the strongest responses, usually with little cost to accuracy.

The logic here is that your filters have already extracted the important features from the original image. By applying pooling, you compress each feature map, keeping just the most significant information. As a result, less data needs to pass through the network: with a 2×2 window, the next layer sees feature maps with half the width and half the height, which is a quarter of the original number of values.

This is important because after your first convolutional layer (out_channels=32), you now have 32 different feature maps flowing into the second layer. For large images this quickly becomes a lot of data. Pooling reduces this volume, making your neural network much more efficient. Because it keeps only the strongest activation in each window, it also makes the network more robust to small shifts in the input; the trade-off is that some fine spatial detail is discarded.
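Here is the "keep the largest value in each window" rule on a hypothetical 4×4 feature map, followed by the data-volume arithmetic for 32 maps of 28×28 before and after one 2×2 pooling layer.

```python
import torch
import torch.nn as nn

# A hypothetical 4×4 feature map (batch=1, channels=1)
fmap = torch.tensor([[1., 3., 2., 0.],
                     [4., 2., 1., 1.],
                     [0., 1., 5., 6.],
                     [2., 2., 7., 8.]]).reshape(1, 1, 4, 4)

pooled = nn.MaxPool2d(kernel_size=2, stride=2)(fmap)
print(pooled.squeeze())
# tensor([[4., 2.],
#         [2., 8.]])  -> the largest value in each 2×2 block

# Data volume for 32 feature maps of 28×28, before and after one pooling layer
before = 32 * 28 * 28   # 25088 values
after = 32 * 14 * 14    # 6272 values
print(before // after)  # 4
```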

Building a Complete CNN Architecture

A sequential conv → pool → conv → pool → flatten → fc → fc architecture

CNN Architecture Overview

Three main components:

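import torch
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self, num_classes=10):  # illustrative default
        super().__init__()
        # Convolutional layers → extract features
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.relu1 = nn.ReLU()
        # Pooling layers → reduce size (assumes 28×28 RGB input: 28×28 → 14×14)
        self.pool1 = nn.MaxPool2d(kernel_size=2)
        # Fully connected layers → classify
        self.fc = nn.Linear(32 * 14 * 14, num_classes)

Define the flow:

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.pool1(x)
        # ... more layers ... (if you add any, update the input size of self.fc)
        x = torch.flatten(x, start_dim=1)  # flatten each image, keep the batch dimension
        x = self.fc(x)
        return x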
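Putting the pieces together, the sketch below follows the conv → pool → conv → pool → flatten → fc → fc layout from the architecture diagram. The class name `NatureCNN`, the 28×28 RGB input size, the channel counts, and the three classes (flowers, insects, small animals) are all illustrative assumptions. One detail worth copying: `torch.flatten(x, start_dim=1)` flattens each image separately, whereas a plain `x.flatten()` would merge the whole batch into a single vector.

```python
import torch
import torch.nn as nn

class NatureCNN(nn.Module):
    """Hypothetical conv-pool conv-pool flatten fc fc network for 28×28 RGB images."""

    def __init__(self, num_classes=3):
        super().__init__()
        # Feature extractor: two conv → ReLU → pool blocks
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)   # 28×28 stays 28×28
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)  # 14×14 stays 14×14
        self.pool = nn.MaxPool2d(kernel_size=2)                   # halves H and W
        self.relu = nn.ReLU()
        # Classifier: flatten → fc → ReLU → fc
        self.fc1 = nn.Linear(32 * 7 * 7, 64)
        self.fc2 = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))   # (N, 16, 14, 14)
        x = self.pool(self.relu(self.conv2(x)))   # (N, 32, 7, 7)
        x = torch.flatten(x, start_dim=1)         # (N, 32*7*7), batch dimension kept
        x = self.relu(self.fc1(x))
        return self.fc2(x)                        # raw class scores (logits)

model = NatureCNN(num_classes=3)   # e.g. flower / insect / small animal
images = torch.randn(4, 3, 28, 28)
print(model(images).shape)         # torch.Size([4, 3])
```

The pooling and ReLU modules have no parameters, so reusing one instance of each in both blocks is safe; the convolution and linear layers each need their own weights.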

The Snow Detector Problem

Husky misclassified as wolf

Model fixated on snow in background
Some neurons became “snow detectors”
Others relied on them (co-adaptation)

Regularization

Regularization in deep learning refers to techniques used to prevent models from overfitting to the training data. Overfitting occurs when a model learns not only the underlying patterns but also the noise in the data, resulting in poor performance on unseen data. Regularization methods add a form of constraint or penalty to the learning process, encouraging simpler models that generalize better.

Dropout

Dropout: Randomly turns off a fraction of neurons during training, forcing the network to develop redundant representations and making it less likely to rely too heavily on any single feature.
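A quick look at how `nn.Dropout` behaves in PyTorch. It implements "inverted" dropout: during training it zeroes each value with probability p and scales the survivors by 1/(1 − p), so nothing needs rescaling at evaluation time, when dropout is switched off. The input of ten ones is a hypothetical toy example; in a real model, `model.train()` and `model.eval()` toggle this for every dropout layer at once.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 10)

dropout.train()            # training mode: dropout is active
print(dropout(x))          # roughly half the entries are 0, survivors are scaled to 2.0

dropout.eval()             # evaluation mode: dropout does nothing
print(dropout(x))          # all ones
```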

Weight Decay

Weight Decay: Add a penalty to the loss function based on the size of the weights (typically proportional to the sum of squared weights, an L2 penalty), encouraging them to be smaller and discouraging complex models.
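In PyTorch, weight decay is usually set on the optimizer rather than written into the loss. For plain SGD the two are equivalent: `weight_decay=wd` adds wd × w to each gradient, which is exactly the gradient of an explicit (wd / 2) × ‖w‖² penalty. The sketch below checks this on a tiny hypothetical linear model with random data. With adaptive optimizers the picture differs: `torch.optim.AdamW` applies a "decoupled" decay that is not the same as adding an L2 term to the loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x, y = torch.randn(16, 5), torch.randn(16, 1)
wd, lr = 0.01, 0.1

# Two identical copies of a small hypothetical model
model_a = nn.Linear(5, 1)
model_b = nn.Linear(5, 1)
model_b.load_state_dict(model_a.state_dict())

# (a) Weight decay handled by the optimizer
opt_a = torch.optim.SGD(model_a.parameters(), lr=lr, weight_decay=wd)
opt_a.zero_grad()
nn.functional.mse_loss(model_a(x), y).backward()
opt_a.step()

# (b) The same penalty written explicitly: loss + (wd / 2) * sum of squared parameters
opt_b = torch.optim.SGD(model_b.parameters(), lr=lr)
opt_b.zero_grad()
penalty = sum((p ** 2).sum() for p in model_b.parameters())
(nn.functional.mse_loss(model_b(x), y) + wd / 2 * penalty).backward()
opt_b.step()

print(torch.allclose(model_a.weight, model_b.weight, atol=1e-6))  # True
```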

The two regularization techniques

Feature	Dropout	Weight Decay
Mechanism	Randomly deactivates neurons.	Penalizes large weight values.
Goal	Breaks co-dependency between neurons.	Keeps the model simple and less sensitive.
Active When?	Only during training.	During training (via the optimizer).
Analogy	A team where players are randomly benched so everyone learns to play every position.	A coach telling players not to over-commit to a single move so they stay balanced.

Dataset Issue

If most wolf images in your dataset have snow and dog images don’t, that’s a dataset problem.

Solution

Get more representative data, for example wolves photographed without snow and huskies photographed in snow.

Lab 1: Building a CNN for Nature Classification

“If we want machines to think, we need to teach them to see.” — ImageNet Project launch

CUE: START THE LAB HERE

What’s Next?

In Session 2: PyTorch Techniques and Model Inspection, we will learn:

Dynamic computation graphs in PyTorch
Building modular architectures
Model inspection and debugging