Grad-CAM from Scratch with PyTorch Hooks

A car stops suddenly. Worryingly, there is no stop sign in sight. The engineers can only guess why the car’s neural network became confused. It could be a tumbleweed rolling across the street, a car coming down the other lane or the red billboard in the background. To find the real reason, they turn to Grad-CAM [1].

Grad-CAM is an explainable AI (XAI) technique that helps reveal why a convolutional neural network (CNN) made a particular decision. The method produces a heatmap that highlights the regions in an image that are most important for a prediction. For our self-driving car example, this could show whether the pixels from the tumbleweed, car or billboard caused the car to stop.

Now, Grad-CAM is one of many XAI methods for computer vision. Due to its speed, flexibility and reliability, it has quickly become one of the most popular. It has also inspired many related methods. So, if you are interested in XAI, it’s worth understanding exactly how this method works. To do that, we will be implementing Grad-CAM from scratch using Python.

Specifically, we will be relying on PyTorch hooks. As you will see, these allow us to dynamically extract gradients and activations from a network during forward and backward passes. These are practical skills that will help you implement not only Grad-CAM but also any gradient-based XAI method. See the full project on GitHub.

The theory behind Grad-CAM

Before we get to the code, it’s worth touching on the theory behind Grad-CAM. If you want a deep dive, then check out the video below. If you want to learn about other methods, then see this free XAI for Computer Vision course.

To summarise, when creating Grad-CAM heatmaps, we start with a trained CNN. We then do a forward pass through this network with a single sample image. This will activate all convolutional layers in the network. We call the resulting outputs feature maps ($A^k$). They are a set of 2D matrices that contain different features detected in the sample image.

With Grad-CAM, we are typically interested in the maps from the last convolutional layer of the network. When we apply the method to VGG16, you will see that its final layer has 512 feature maps. We use these as they contain features with the most detailed semantic information while still retaining spatial information. In other words, they tell us what was used for a prediction and where in the image it was taken from.

The problem is that these maps also contain features that are important for other classes. To mitigate this, we follow the process shown in Figure 1. Once we have the feature maps ($A^k$), we weight them by how important they are to the class of interest ($y^c$). We do this using $a_k^c$, the average gradient of the score for $y^c$ w.r.t. the elements in the feature map. We then do element-wise summation. For VGG16, you will see we go from 512 maps of 14×14 pixels to a single 14×14 map.

Figure 1: element-wise summation of the weighted feature maps from the last convolutional layer in a CNN (source: author)

The gradient for an individual element ($\frac{\partial y^c}{\partial A_{ij}^k}$) tells us how much the score will change with a small change in the element. This means that large average gradients indicate that the whole feature map was important and should contribute more to the final heatmap. So, when we weight and sum the maps, the ones that contain features for other classes will likely contribute less.
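For reference, the average gradient used as the weight can be written out explicitly, following the definition in the Grad-CAM paper [1]. Here $Z$ is the number of elements in a feature map (14×14 = 196 for the VGG16 layer used later), and $i$ and $j$ index its rows and columns:

$$ a_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A_{ij}^k} $$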

The final steps are to apply the ReLU activation function, so that all negative elements are set to zero, and then to upsample with interpolation so the heatmap has the same dimensions as the sample image. The final map is summarised by the formula below. You might recognise it from the Grad-CAM paper [1].

$$ L_{\text{Grad-CAM}}^c = \text{ReLU}\left( \sum_{k} a_k^c A^k \right) $$
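To make the formula concrete, here is a small worked example in NumPy. It is only a sketch: the two 2×2 feature maps and their gradients are made-up numbers, not values from any real network, and the upsampling step is left out.

```python
import numpy as np

# Two toy 2x2 feature maps A^k (made-up values for illustration)
feature_maps = np.array([
    [[1.0, 2.0],
     [3.0, 4.0]],
    [[4.0, 3.0],
     [2.0, 1.0]],
])

# Toy gradients of the class score w.r.t. each element of each map
grads = np.array([
    [[0.5, 0.5],
     [0.5, 0.5]],      # average gradient = 0.5
    [[-1.0, -1.0],
     [-1.0, -1.0]],    # average gradient = -1.0
])

# a_k^c: one average gradient per feature map
weights = grads.mean(axis=(1, 2))

# Weighted element-wise sum over k, followed by ReLU
weighted_sum = (weights[:, None, None] * feature_maps).sum(axis=0)
heatmap = np.maximum(weighted_sum, 0)

print(weights)
print(weighted_sum)
print(heatmap)
```

The second map has a negative average gradient, so it pulls most of the weighted sum below zero. After the ReLU, only the bottom-right element, where the first map is strongest, remains positive.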

Grad-CAM from Scratch

Don’t worry if the theory is not completely clear. We will walk through it step-by-step as we apply the method from scratch. You can find the full project on GitHub. To start, we have our imports below. These are all common imports for computer vision problems.

import matplotlib.pyplot as plt
import numpy as np

import cv2
from PIL import Image

import torch
import torch.nn.functional as F
from torchvision import models, transforms

import urllib.request

Load pretrained model from PyTorch

We’ll be applying Grad-CAM to VGG16 pretrained on ImageNet. To help, we have the two functions below. The first will format an image in the correct way for input into the model. The normalisation values used are the mean and standard deviation of the images in ImageNet. The 224×224 size is also standard for ImageNet models.

def preprocess_image(img_path):
    """Load and preprocess images for PyTorch models."""

    img = Image.open(img_path).convert("RGB")

    # Transforms used by ImageNet models
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    # Add a batch dimension: (3, 224, 224) -> (1, 3, 224, 224)
    return transform(img).unsqueeze(0)
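To see what the scaling and normalisation steps do to the pixel values, the sketch below mimics them in plain NumPy. The function name normalise_like_imagenet and the uniform grey dummy image are assumptions made for illustration, and the resize step is skipped by creating the dummy image at 224×224 directly.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def normalise_like_imagenet(img_uint8):
    """Mimic ToTensor + Normalize for an (H, W, 3) uint8 array (hypothetical helper)."""
    scaled = img_uint8.astype(np.float32) / 255.0          # scale pixels to [0, 1]
    normalised = (scaled - IMAGENET_MEAN) / IMAGENET_STD   # per-channel normalisation
    chw = normalised.transpose(2, 0, 1)                    # (H, W, C) -> (C, H, W)
    return chw[np.newaxis, ...].astype(np.float32)         # add the batch dimension

# Dummy image: uniform grey, already 224x224
dummy = np.full((224, 224, 3), 128, dtype=np.uint8)
batch = normalise_like_imagenet(dummy)

print(batch.shape)
print(batch[0, :, 0, 0])  # one normalised value per colour channel
```

The output has the (batch, channel, height, width) layout that the model expects, and each channel is shifted and scaled by its own mean and standard deviation.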

ImageNet has many classes. The second function will format the output of the model so we display the classes with the highest predicted probabilities.

def display_output(output, n=5):
    """Display the top n categories predicted by the model."""

    # Download the categories
    url = "https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt"
    urllib.request.urlretrieve(url, "imagenet_classes.txt")

    with open("imagenet_classes.txt", "r") as f:
        categories = [s.strip() for s in f.readlines()]

    # Show top categories for the image
    probabilities = torch.nn.functional.softmax(output[0], dim=0)
    top_prob, top_catid = torch.topk(probabilities, n)

    for i in range(top_prob.size(0)):
        print(categories[top_catid[i]], top_prob[i].item())

    return top_catid[0]
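The core of this function is the softmax followed by torch.topk. The sketch below shows those two calls on their own, without any download. The four logits and the class names are invented for illustration.

```python
import torch

# Made-up logits for a batch of one image and four hypothetical classes
logits = torch.tensor([[2.0, 0.5, -1.0, 3.0]])
labels = ["cat", "dog", "fish", "shark"]

# Convert the logits of the first (only) image to probabilities
probs = torch.nn.functional.softmax(logits[0], dim=0)

# Keep the two most probable classes
top_prob, top_idx = torch.topk(probs, 2)

for p, i in zip(top_prob, top_idx):
    print(labels[int(i)], round(p.item(), 3))
```

The softmax does not change the ranking of the classes, so the top index is the same one you would get from the raw logits.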

We now load the pretrained VGG16 model (line 2), move it to a GPU if one is available (lines 5-8) and set it to evaluation mode (line 11). You can see a snippet of the model output in Figure 2. VGG16 is made of 16 weighted layers. Here, you can see the last 2 of 13 convolutional layers and the 3 fully connected layers.

# Load the pre-trained model (e.g., VGG16)
model = models.vgg16(pretrained=True)

# Move the model to a GPU if one is available
device = torch.device('mps' if torch.backends.mps.is_available()
                      else 'cuda' if torch.cuda.is_available()
                      else 'cpu')
model.to(device)

# Set the model to evaluation mode
model.eval()

The names you see in Figure 2 are important. Later, we will use them to reference a specific layer in the network to access its activations and gradients. Specifically, we will use model.features[28]. This is the final convolutional layer in the network. As you can see in the snapshot, this layer contains 512 feature maps.

Figure 2: snapshot of final layers of the VGG16 network (source: author)

Forward pass with sample image

We will be explaining a prediction from this model. To do this, we need a sample image to feed into the model. We download one from Wikimedia Commons (lines 2-3). We then load it (lines 5-6), crop it to have equal height and width (line 7) and display it (lines 9-10). In Figure 3, you can see we are using an image of a whale shark in an aquarium.

# Load a sample image from the web
img_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a1/Male_whale_shark_at_Georgia_Aquarium.jpg/960px-Male_whale_shark_at_Georgia_Aquarium.jpg"
urllib.request.urlretrieve(img_url, "sample_image.jpg")[0]

img_path = "sample_image.jpg"
img = Image.open(img_path).convert("RGB")
img = img.crop((320, 0, 960, 640))  # Crop to 640x640

plt.imshow(img)
plt.axis("off")

Figure 3: male whale shark in aquarium (source: Wikimedia Commons) (license: CC BY-SA 2.5)

ImageNet has no dedicated class for whale sharks, so it will be interesting to see what the model predicts. To do this, we start by processing our image (line 2) and moving it to the GPU (line 3). We then do a forward pass to get a prediction (line 6) and display the top 5 probabilities (line 7). You can see these in Figure 4.

# Preprocess the image
img_tensor = preprocess_image(img_path)
img_tensor = img_tensor.to(device)

# Forward pass
predictions = model(img_tensor)
display_output(predictions, n=5)

Given the available classes, these seem reasonable. They are all marine life and the top two are sharks. Now, let’s see how we can explain this prediction. We want to understand which regions of the image contribute the most to the highest predicted class: hammerhead.

Figure 4: top 5 predicted classes of the example image of the whale shark using VGG16 (source: author)

PyTorch hooks naming conventions

Grad-CAM heatmaps are created using both activations from a forward pass and gradients from a backward pass. To access these, we will use PyTorch hooks. These are functions that allow you to save the inputs and outputs of a layer. We won’t do it here, but they even allow you to alter these values. For example, Guided Backpropagation can be applied by ensuring only positive gradients are propagated using a backward hook.

You can see some examples of these functions below. A forwards_hook will be called during a forward pass. It will be registered on a given module (i.e. layer). By default, the function receives three arguments: the module, its input and its output. Similarly, a backwards_hook is triggered during a backward pass with the module and the gradients w.r.t. its input and output.

# Example of a forwards hook function
def forwards_hook(module, input, output):
    """Parameters:
            module (nn.Module): The module where the hook is applied.
            input (tuple of Tensors): Input to the module.
            output (Tensor): Output of the module."""
    ...

# Example of a backwards hook function
def backwards_hook(module, grad_in, grad_out):
    """Parameters:
            module (nn.Module): The module where the hook is applied.
            grad_in (tuple of Tensors): Gradients w.r.t. the input of the module.
            grad_out (tuple of Tensors): Gradients w.r.t. the output of the module."""
    ...
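To see these signatures in action, the sketch below registers both kinds of hook on a small toy network and records the shapes of the arguments they receive. The network, the input size and the shapes dictionary are assumptions made for illustration and are not part of the VGG16 example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy network, used only to show when the hooks fire
net = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(4, 2, kernel_size=3, padding=1),
)

shapes = {}

def forwards_hook(module, input, output):
    # input is a tuple of tensors, output is a single tensor
    shapes["input"] = tuple(input[0].shape)
    shapes["output"] = tuple(output.shape)

def backwards_hook(module, grad_in, grad_out):
    # Both arguments are tuples of tensors
    shapes["grad_in"] = tuple(grad_in[0].shape)
    shapes["grad_out"] = tuple(grad_out[0].shape)

# Register both hooks on the last convolutional layer
layer = net[2]
fwd_handle = layer.register_forward_hook(forwards_hook)
bwd_handle = layer.register_full_backward_hook(backwards_hook)

# One forward and one backward pass trigger the hooks
x = torch.randn(1, 3, 8, 8)
net(x).sum().backward()

fwd_handle.remove()
bwd_handle.remove()

print(shapes)
```

Notice that grad_in has the shape of the layer's input and grad_out has the shape of its output.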

To avoid confusion, let’s clarify the parameter names used by these functions. Take a look at the overview of the standard backpropagation procedure for a convolutional layer in Figure 5. This layer consists of a set of kernels, $K$, and biases, $b$. The other components are:

input – a set of feature maps or an image
output – a set of feature maps
grad_in – the gradient of the loss w.r.t. the layer’s input
grad_out – the gradient of the loss w.r.t. the layer’s output

We have labelled these using the same names as the arguments of the hook functions that we apply later.

Figure 5: Backpropagation for a convolutional layer in a deep learning model. The blue arrows show the forward pass and the red arrows show the backward pass. (source: author)
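As mentioned earlier, a backward hook can also alter gradients, not just read them. The sketch below shows the idea on a single ReLU module: the hook returns a replacement for grad_in in which negative values are set to zero. This only illustrates the gradient-filtering step associated with Guided Backpropagation, not a full implementation of that method. The function name keep_positive_grads and the toy tensors are assumptions.

```python
import torch
import torch.nn as nn

def keep_positive_grads(module, grad_in, grad_out):
    """Return a replacement for grad_in with negative gradients set to zero."""
    return tuple(g.clamp(min=0) if g is not None else None for g in grad_in)

relu = nn.ReLU(inplace=False)
handle = relu.register_full_backward_hook(keep_positive_grads)

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
upstream = torch.tensor([1.0, -1.0, 2.0])  # made-up upstream gradients

# Without the hook, x.grad would equal the upstream gradients
(relu(x) * upstream).sum().backward()
handle.remove()

print(x.grad)
```

The negative upstream gradient for the second element is blocked by the hook, so only non-negative gradients reach x.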

Keep in mind, we won’t use the gradients in the same way as backpropagation. Usually, we use the gradients of a batch of images to update $K$ and $b$. Now, we are only interested in the grad_out of a single sample image. This will give us the gradients of the elements in the layer’s feature maps. In other words, the gradients we use to weight the feature maps.

Activations with PyTorch forward hook

Our VGG16 network has been created using ReLU with inplace=True. These modify tensors in memory, so the original values are lost. That is, tensors used as input are overwritten by the ReLU function. This can lead to problems when applying hooks, as we may need the original input. So we use the code below to replace all ReLU functions with inplace=False ones. This will not impact the output of the model, but it will increase its memory usage.

# Replace all in-place ReLU activations with out-of-place ones
def replace_relu(model):

    for name, child in model.named_children():
        if isinstance(child, torch.nn.ReLU):
            setattr(model, name, torch.nn.ReLU(inplace=False))
            print(f"Replacing ReLU activation in layer: {name}")
        else:
            replace_relu(child)  # Recursively apply to submodules

# Apply the modification to the VGG16 model
replace_relu(model)
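The difference between the two ReLU variants is easy to see on a small tensor. The sketch below uses made-up values: the in-place version overwrites its input, while the out-of-place version returns a new tensor and leaves the input alone.

```python
import torch

# In-place ReLU: the input tensor itself is overwritten
x_inplace = torch.tensor([-1.0, 2.0, -3.0])
torch.nn.ReLU(inplace=True)(x_inplace)

# Out-of-place ReLU: the input is kept and a new tensor is returned
x_copy = torch.tensor([-1.0, 2.0, -3.0])
out = torch.nn.ReLU(inplace=False)(x_copy)

print(x_inplace)  # negative values have been replaced by zero
print(x_copy)     # original values are still available
print(out)
```

This is why a hook that needs the values produced by the layer just before a ReLU can be affected when that ReLU runs in place.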

Below we have our first hook function: save_activations. This will append the output from a module (line 6) to a list of activations (line 2). In our case, we will only register the hook on one module (i.e. the last convolutional layer), so this list will only contain one element. Notice how we format the output (line 6). We detach it from the computational graph so the network is not affected. We also convert it to a NumPy array and squeeze the batch dimension.

# List to store activations
activations = []

# Function to save activations
def save_activations(module, input, output):
    activations.append(output.detach().cpu().numpy().squeeze())

To use the hook function, we register it on the last convolutional layer: model.features[28]. This is done using the register_forward_hook function.

# Register the hook on the last convolutional layer
hook = model.features[28].register_forward_hook(save_activations)

Now, when we do a forward pass (line 2), the save_activations hook function will be called for this layer. In other words, its output will be saved to the activations list.

# Forward pass through the model to get activations
prediction = model(img_tensor)

Finally, it’s good practice to remove the hook function when it is no longer needed (line 2). This means the forward hook function will not be triggered if we do another forward pass.

# Remove the hook after use
hook.remove()

The shape of these activations is (512, 14, 14). In other words, we have 512 feature maps and each map is 14×14 pixels. You can see some examples of these in Figure 6. Some of these maps may contain features important for other classes or ones that decrease the probability of the predicted class. So let’s see how we can use gradients to help identify the most important maps.

act_shape = np.shape(activations[0])
print(f"Shape of activations: {act_shape}") # (512, 14, 14)

Figure 6: example of activated feature maps from the last convolutional layer of the network (source: author)

Gradients with PyTorch backwards hooks

To get gradients, we follow a similar process to before. The key difference is that we now use register_full_backward_hook to register the save_gradient function (line 7). This will ensure that it is called during a backward pass. Importantly, we do the backward pass (line 16) from the output for the class with the highest score (line 13). This is equivalent to backpropagating a gradient of 1 for this class and 0 for all other classes. In other words, we get the gradients of the hammerhead score w.r.t. the elements of the feature maps.

gradients = []

def save_gradient(module, grad_in, grad_out):
    gradients.append(grad_out[0].cpu().numpy().squeeze())

# Register the backward hook on the last convolutional layer
hook = model.features[28].register_full_backward_hook(save_gradient)

# Forward pass
output = model(img_tensor)

# Pick the class with the highest score
score = output[0].max()

# Backward pass from the score
score.backward()

# Remove the hook after use
hook.remove()

We will have a gradient for every element of the feature maps. So, again, the shape is (512, 14, 14). Figure 7 visualises some of these. You can see some tend to have larger values. However, we are not so concerned with the individual gradients. When we create a Grad-CAM heatmap, we will use the average gradient of each feature map.

grad_shape = np.shape(gradients[0])
print(f"Shape of gradients: {grad_shape}") # (512, 14, 14)

Figure 7: gradients of the score w.r.t. the elements of feature maps in the last convolutional layer (source: author)

Finally, before we move on, it’s good practice to reset the model’s gradients (line 2). This is particularly important if you plan to run the code for multiple images, as gradients can accumulate with each backward pass.

# Reset gradients
model.zero_grad()
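The accumulation can be seen with a tiny layer. In the sketch below, which uses an assumed nn.Linear layer rather than VGG16, two backward passes without a reset double the stored parameter gradients. Depending on the PyTorch version, zero_grad() then either sets the gradients to None or fills them with zeros.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)
x = torch.ones(1, 3)

# First backward pass
layer(x).sum().backward()
first = layer.weight.grad.clone()

# Second backward pass without resetting: gradients are added together
layer(x).sum().backward()
accumulated = layer.weight.grad.clone()

# Reset the stored gradients
layer.zero_grad()
after_reset = layer.weight.grad  # None or all zeros, depending on the version

print(first)
print(accumulated)
print(after_reset)
```

Note that this concerns the .grad attributes stored on the parameters. The list filled by the backward hook is separate and simply grows by one entry per backward pass.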

Creating Grad-CAM heatmaps

First, we find the mean gradient for each feature map. There will be 512 of these average gradients. Plotting a histogram of them, you can see most tend to be around 0. In other words, these feature maps don’t have much impact on the predicted score. There are a few that tend to have a strongly negative or positive impact. It is these feature maps we want to give more weight to.

# Step 1: aggregate the gradients
gradients_aggregated = np.mean(gradients[0], axis=(1, 2))

Figure 8: histogram of average gradients (source: author)

We combine all the activations by doing element-wise summation (lines 2-4). When we do this, we weight each feature map by its average gradient (line 3). In the end, we will have one 14×14 array.

# Step 2: weight the activations by the aggregated gradients and sum them up
weighted_activations = np.sum(activations[0] * 
                              gradients_aggregated[:, np.newaxis, np.newaxis], 
                              axis=0)
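If the broadcasting in this step is hard to follow, the same weighted sum can be written with np.einsum. The sketch below checks that the two forms agree, using random arrays as stand-ins for activations[0] and gradients_aggregated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins with the same shapes as the real arrays
acts = rng.normal(size=(512, 14, 14))   # stand-in for activations[0]
grads_mean = rng.normal(size=512)       # stand-in for gradients_aggregated

# Broadcasting version: (512, 1, 1) weights times (512, 14, 14) maps
broadcast_sum = np.sum(acts * grads_mean[:, np.newaxis, np.newaxis], axis=0)

# Einsum version: sum over the feature-map index k
einsum_sum = np.einsum("k,kij->ij", grads_mean, acts)

print(broadcast_sum.shape)
print(np.allclose(broadcast_sum, einsum_sum))
```

Both forms multiply each feature map by a single scalar and then add the maps together, which is why the result has one value per spatial location.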

These weighted activations will contain both positive and negative pixels. We can consider the negative pixels to be suppressing the predicted score. In other words, an increase in the value of these regions tends to decrease the score. Since we are only interested in the positive contributions, that is, regions that support the class prediction, we apply a ReLU activation to the final heatmap (line 2). You can see the difference in the heatmaps in Figure 9.

# Step 3: ReLU summed activations
relu_weighted_activations = np.maximum(weighted_activations, 0)

Figure 9: ReLU of weighted activations (source: author)

You can see the heatmap in Figure 9 is quite coarse. It would be more useful if it had the dimensions of the original image. This is why the last step for creating Grad-CAM heatmaps is to upsample to the dimensions of the input image (lines 2-4). In this case, we have a 224×224 image.

# Step 4: Upsample the heatmap to the original image size
upsampled_heatmap = cv2.resize(relu_weighted_activations,
                               (img_tensor.size(3), img_tensor.size(2)),
                               interpolation=cv2.INTER_LINEAR)

print(np.shape(upsampled_heatmap))  # Should be (224, 224)
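If you would rather stay within PyTorch for this step, a similar upsampling can be sketched with F.interpolate. This is an alternative to the OpenCV call, not the approach used in the article, and the random 14×14 array is only a stand-in for the real heatmap. The result is not guaranteed to match cv2.resize exactly.

```python
import numpy as np
import torch
import torch.nn.functional as F

# Random stand-in for the 14x14 ReLU-weighted heatmap
coarse = np.random.rand(14, 14).astype(np.float32)

# F.interpolate expects a (batch, channel, height, width) tensor
coarse_t = torch.from_numpy(coarse)[None, None]

# Bilinear upsampling to the size of the input image
upsampled_t = F.interpolate(coarse_t, size=(224, 224),
                            mode="bilinear", align_corners=False)

# Drop the batch and channel dimensions again
upsampled = upsampled_t[0, 0].numpy()

print(upsampled.shape)
```

Bilinear interpolation only blends neighbouring values, so the upsampled heatmap stays within the range of the coarse one.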

Figure 10 gives us our final visualisation. We display the sample image (lines 5-7) next to the heatmap (lines 10-15). For the latter, we create a clear visualisation with the help of Canny edge detection (line 10). This gives us an edge map (i.e. outline) of the sample image. We can then overlay the heatmap on top of this (line 14).

# Step 5: visualise the heatmap
fig, ax = plt.subplots(1, 2, figsize=(8, 8))

# Input image
resized_img = img.resize((224, 224))
ax[0].imshow(resized_img)
ax[0].axis("off")

# Edge map for the input image
edge_img = cv2.Canny(np.array(resized_img), 100, 200)
ax[1].imshow(255-edge_img, alpha=0.5, cmap='gray')

# Overlay the heatmap
ax[1].imshow(upsampled_heatmap, alpha=0.5, cmap='coolwarm')
ax[1].axis("off")

In our Grad-CAM heatmap, there is some noise. However, it appears the model is relying on the tail fin and, to a lesser extent, the pectoral fin to make its prediction. It is starting to make sense why the model classified this shark as a hammerhead. Perhaps both animals share these characteristics.

Figure 10: input image (left) and Grad-CAM heatmap overlaid on an edge map (right) (source: author)

For some further investigation, we apply the same process but now using an actual image of a hammerhead. In this case, the model appears to be relying on the same features. This is a bit concerning. Would we not expect the model to use one of the shark’s defining features, the hammerhead? Ultimately, this may lead VGG16 to confuse different types of sharks.

Figure 11: an additional example image (source: Wikimedia Commons) (license: CC BY 2.0)
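Since the same steps are repeated for every new image, they can be wrapped in a single helper. The sketch below is one possible way to do that and is not code from the original project. The function name grad_cam, its arguments and the small randomly initialised CNN used to try it out are all assumptions. It also does the arithmetic in PyTorch rather than NumPy and OpenCV.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def grad_cam(model, layer, img_tensor, class_idx=None):
    """Hypothetical helper: Grad-CAM heatmap for one image of shape (1, C, H, W)."""
    activations, gradients = [], []

    # Temporary hooks that store the layer's output and the gradient w.r.t. it
    fwd = layer.register_forward_hook(
        lambda m, i, o: activations.append(o.detach()))
    bwd = layer.register_full_backward_hook(
        lambda m, gi, go: gradients.append(go[0].detach()))

    try:
        model.zero_grad()
        output = model(img_tensor)
        if class_idx is None:
            class_idx = int(output[0].argmax())  # highest scoring class
        output[0, class_idx].backward()
    finally:
        # Always remove the hooks, even if something fails
        fwd.remove()
        bwd.remove()

    acts = activations[0][0]             # (K, h, w) feature maps
    grads = gradients[0][0]              # (K, h, w) gradients
    weights = grads.mean(dim=(1, 2))     # one average gradient per map

    # Weighted sum over the maps, ReLU, then upsample to the input size
    cam = F.relu((weights[:, None, None] * acts).sum(dim=0))
    cam = F.interpolate(cam[None, None], size=tuple(img_tensor.shape[2:]),
                        mode="bilinear", align_corners=False)[0, 0]
    return cam.cpu().numpy(), class_idx

# Try it on a small randomly initialised CNN (not VGG16)
torch.manual_seed(0)
toy = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 5),
).eval()

heatmap, cls = grad_cam(toy, toy[2], torch.randn(1, 3, 32, 32))
print(heatmap.shape, cls)
```

For VGG16, you would pass model.features[28] as the layer, after replacing the in-place ReLU activations as described earlier.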

With this example, we see how Grad-CAM can highlight potential flaws in our model. We can not only get its predictions but also understand how it made them. We can understand if the features used will lead to unforeseen predictions down the line. This can potentially save us a lot of time, money and, in the case of more consequential applications, lives!

If you want to learn more about XAI for CV, check out one of these articles. Or see this free XAI for CV course.


References

[1] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
