AI Pros Bootcamp – Module 2 – Session 4: Training and Evaluating Your Classifier

Setting Up Training

Everything you need:

Model (on device)
Loss function
Optimizer
Data loaders

Device and Model Setup

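import torch

# Use the GPU if one is available; otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# MNISTClassifier: the nn.Module subclass built earlier in Module 2
model = MNISTClassifier().to(device)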

Both model and data must be on the same device

Loss Function

import torch.nn as nn

loss_function = nn.CrossEntropyLoss()

Designed for multi-class classification: takes the model's raw scores (logits) and the true class index

A natural fit for choosing one digit from 0-9 (10 classes)
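To see what nn.CrossEntropyLoss measures, here is a plain-Python sketch of the per-sample computation: apply softmax to the raw scores, then take the negative log of the probability assigned to the true digit. Because softmax happens inside the loss, the model's last layer should output raw scores, not probabilities. The function name cross_entropy and the example scores are hypothetical; PyTorch's version works on whole batches and, by default, averages the per-sample losses.

```python
import math

def cross_entropy(logits, target):
    """Per-sample cross-entropy: -log(softmax(logits)[target]).

    Illustrative plain-Python version of what nn.CrossEntropyLoss
    computes for one sample before averaging over the batch.
    """
    # Subtracting the largest logit avoids overflow; softmax is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    prob_target = exps[target] / sum(exps)
    return -math.log(prob_target)

# Hypothetical scores for digits 0-9; the model strongly favors digit 3.
logits = [0.1, 0.2, 0.0, 3.0, 0.1, 0.0, 0.2, 0.1, 0.0, 0.3]
print(round(cross_entropy(logits, 3), 4))  # true label 3: small loss
print(round(cross_entropy(logits, 7), 4))  # true label 7: much larger loss
```

Confident, correct predictions give a loss near 0; confident wrong ones are penalized heavily, which is what pushes the model toward the right digit.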

Optimizer

import torch.optim as optim

optimizer = optim.Adam(model.parameters(), lr=0.001)  # 0.001 is also Adam's default lr

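Adam: adapts the step size for each parameter as it trains

Tracks running averages of each parameter's gradients and squared gradients
Consistent gradient direction: steps close to the full learning rate
Noisy, sign-flipping gradients (e.g., near a minimum): smaller steps, so training settles down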
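The adaptive behavior comes from Adam's update rule, sketched below in plain Python for a single scalar parameter. It uses PyTorch's default betas (0.9, 0.999) and eps (1e-8), but a larger learning rate (0.1) so the toy problem converges quickly; adam_steps and the quadratic being minimized are hypothetical, chosen only for illustration.

```python
import math

def adam_steps(grad_fn, x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=100):
    """Minimal Adam for one scalar parameter (illustration only)."""
    m, v = 0.0, 0.0  # running averages of the gradient and squared gradient
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction: m and v start at 0, so early estimates are scaled up.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Step is roughly lr when gradients agree, smaller when they flip sign.
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
print(adam_steps(lambda x: 2 * (x - 3), x=0.0, steps=500))
```

While x is far from 3 the gradients all point the same way, so each step is about lr; once x overshoots and the gradient starts flipping sign, m shrinks relative to sqrt(v_hat) and the steps get smaller.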

Training Function

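def train_one_epoch(model, dataloader, loss_fn, optimizer, device):
    model.train()  # Set to training mode
    running_loss = 0.0  # Sum of batch losses
    correct = 0         # Predictions that match the labels
    total = 0           # Samples seen so far
    
    for batch_idx, (data, targets) in enumerate(dataloader):
        # Move the batch to the same device as the model
        data, targets = data.to(device), targets.to(device)
        
        # Training steps
        optimizer.zero_grad()              # Clear gradients from the previous batch
        outputs = model(data)              # Forward pass: one raw score per class
        loss = loss_fn(outputs, targets)   # Compare scores with the true labels
        loss.backward()                    # Backpropagate: compute gradients
        optimizer.step()                   # Update the weights
        
        # Track progress
        running_loss += loss.item()
        _, predicted = outputs.max(1)      # Predicted class = index of highest score
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()
        
        if batch_idx % 100 == 0:
            print(f'Loss: {loss.item():.4f}, '
                  f'Accuracy: {100.*correct/total:.2f}%')
    
    # Average loss per batch and accuracy over the whole epoch
    return running_loss / len(dataloader), 100. * correct / total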

So now, let’s create a function to train our model for one epoch. The function takes in five inputs: your model, the data loader, the loss function, the optimizer, and the device that everything should run on.

You’ll start with model.train(), and this puts the model into training mode. You’ll set up three tracking variables: running_loss accumulates the loss values, correct counts predictions that match the true labels, and total counts all of the samples that you’ve seen so far.

Next, we loop over all of the batches. In each batch, we move the data and the targets to the right device and clear any leftover gradients with optimizer.zero_grad(). We then run a forward pass by calling the model on the data to get the outputs, compute the loss by calling the loss function, backpropagate with loss.backward(), and update the weights with optimizer.step().
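The five steps can be traced on a deliberately tiny example: one weight w, a model pred = w * x, and a squared-error loss, with the gradient worked out by hand instead of by autograd. This is a plain-Python illustration, not PyTorch; train_toy is a hypothetical name. The `grad +=` line imitates how loss.backward() adds into each parameter's .grad rather than overwriting it, which is why zero_grad() has to come first.

```python
def train_toy(data, w=0.0, lr=0.05, epochs=50):
    """One-weight model pred = w * x trained with the five training steps."""
    for _ in range(epochs):
        for x, y in data:
            grad = 0.0                    # 1. zero_grad(): clear the old gradient
            pred = w * x                  # 2. forward pass
            loss = (pred - y) ** 2        # 3. loss (squared error, for illustration)
            grad += 2 * (pred - y) * x    # 4. backward: d(loss)/dw, accumulated like .grad
            w -= lr * grad                # 5. step(): plain gradient-descent update
    return w

# Points on the line y = 2x, so the learned weight should approach 2.
print(train_toy([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]))
```

Deleting the `grad = 0.0` line would let gradients from earlier samples pile up, which is the same bug you get in PyTorch by forgetting optimizer.zero_grad().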

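And then you track your progress. loss.item() gives us this batch's loss as a plain Python number, which we add to running_loss. outputs.max(1) returns the highest score in each row along with its index; that index is the digit class the model predicted, and predicted.eq(targets) compares it with the true label. Every 100 batches, you print out the current loss and the accuracy so far.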
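Here is the same bookkeeping in plain Python on a small made-up batch, to show what outputs.max(1) and predicted.eq(targets).sum() compute. argmax and batch_correct are hypothetical helper names, and the scores use 4 classes instead of 10 to keep the example short.

```python
def argmax(scores):
    """Index of the largest score, like the index from outputs.max(1) for one row."""
    return max(range(len(scores)), key=lambda i: scores[i])

def batch_correct(batch_scores, targets):
    """Count predictions matching the labels, like predicted.eq(targets).sum()."""
    predicted = [argmax(row) for row in batch_scores]
    return sum(int(p == t) for p, t in zip(predicted, targets))

# Hypothetical batch: 3 samples, 4 class scores each.
scores = [[0.1, 2.0, 0.3, 0.0],   # predicts class 1
          [1.5, 0.2, 0.1, 0.9],   # predicts class 0
          [0.0, 0.1, 0.2, 3.1]]   # predicts class 3
targets = [1, 2, 3]

correct = batch_correct(scores, targets)
total = len(targets)
print(f"Accuracy: {100. * correct / total:.2f}%")  # → Accuracy: 66.67%
```

PyTorch performs the same comparison for the whole batch at once with tensor operations; the explicit loops are only for illustration.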

Understanding the Training Progress

With 60,000 training images and batch size 64:

938 batches per epoch (60,000 / 64 = 937.5; the last batch is partial)
10 progress updates (at batches 0, 100, 200, up to 900)
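The batch and progress-update counts follow from simple arithmetic. This sketch assumes the DataLoader keeps the final partial batch, which is PyTorch's default (drop_last=False), and reuses the batch_idx % 100 == 0 check from train_one_epoch. Because that check also fires at batch 0, the loop prints 10 updates rather than 9.

```python
import math

num_images = 60_000
batch_size = 64

# 60,000 / 64 = 937.5, and the partial last batch still counts, so round up.
num_batches = math.ceil(num_images / batch_size)

# train_one_epoch prints whenever batch_idx % 100 == 0 (0, 100, 200, up to 900).
progress_batches = [i for i in range(num_batches) if i % 100 == 0]

print(num_batches)            # → 938
print(len(progress_batches))  # → 10
```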

Watch the numbers:

Loss: dropping (0.64 → 0.17)
Accuracy: climbing (81% → 95%)

Evaluation Function

def evaluate(model, dataloader, device):
    model.eval()  # Set to evaluation mode
    correct = 0
    total = 0
    
    with torch.no_grad():  # Disable gradient tracking
        for data, targets in dataloader:
            data, targets = data.to(device), targets.to(device)
            outputs = model(data)
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()
    
    return 100. * correct / total

Putting It All Together

num_epochs = 10

for epoch in range(num_epochs):
    print(f'Epoch {epoch+1}/{num_epochs}')
    
    # Train
    train_one_epoch(model, train_loader, loss_function, 
                    optimizer, device)
    
    # Evaluate
    accuracy = evaluate(model, test_loader, device)
    print(f'Test Accuracy: {accuracy:.2f}%')
    print('-' * 50)

10 epochs = 10 full passes through training data

After each epoch: evaluate on test set

What You’ll See

By epoch 10:

Loss: small
Test accuracy: high (often 95%+)

When test accuracy stops improving:

The model has largely stopped learning
You may not need all 10 epochs
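One common way to act on this is early stopping: end training once accuracy has not improved for a few epochs. Below is a minimal sketch using a hypothetical helper, should_stop, that only needs the per-epoch accuracies evaluate already returns; patience=2 is an arbitrary choice, and the accuracy values are made up. Strictly speaking, a separate validation split is a better basis for this decision than the test set.

```python
def should_stop(accuracies, patience=2, min_delta=0.0):
    """Return True if accuracy hasn't improved over the last `patience` epochs.

    accuracies: per-epoch accuracies, oldest first. The recent epochs count
    as an improvement only if they beat the best earlier value by more
    than min_delta.
    """
    if len(accuracies) <= patience:
        return False
    best_before = max(accuracies[:-patience])
    recent_best = max(accuracies[-patience:])
    return recent_best <= best_before + min_delta

# Made-up accuracies standing in for evaluate(...) results, one per epoch.
history = []
for epoch, acc in enumerate([96.1, 97.4, 97.9, 97.8, 97.9], start=1):
    history.append(acc)
    if should_stop(history):
        print(f"Stopping after epoch {epoch}")  # → Stopping after epoch 5
        break
```

In the training loop, you would append each epoch's test accuracy to history and break out of the epoch loop when should_stop returns True.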

Module 2 Summary

You’ve learned:

PyTorch data pipeline (Dataset, DataLoader, Transforms)
Building custom models with nn.Module
Loss functions (MSE vs Cross-Entropy)
How optimizers use gradients to update weights
Device management (CPU vs GPU)
Complete image classification pipeline

Lab 1: Building Your First Image Classifier

“Don’t tell me the moon is shining; show me the glint of light on broken glass.”

CUE: START THE LAB HERE

Assignment 2: EMNIST Letter Detective

“The eye sees only what the mind is prepared to comprehend.”

CUE: START THE ASSIGNMENT HERE

What’s Next?

In Module 3, Data Management, you'll learn:

How to manage and preprocess data for deep learning
How to build a robust data pipeline
How to use data augmentation to improve model performance
How to use data validation to ensure model performance