Gradient of the output wrt activation

Hi, I am trying to implement part of the Class Activation Map algorithm, which requires computing the gradient of the output logit with respect to the last convolutional activation. I have run into some issues and I don't think I understand how to do it.

I understand that I need to register a hook since the activations are intermediate variables, but my registered hook doesn’t seem to be triggered on backward. Here is a simple code snippet with the VGG19 network:

import torch.nn as nn
from torchvision.models import vgg19

def hook(grad):
    print("I am called")
    print(grad)

class VGG(nn.Module):
    def __init__(self):
        super(VGG, self).__init__()
        
        # get the pretrained VGG19 network
        self.vgg = vgg19(pretrained=True)
        
        # dissect the network to access its last convolutional layer
        self.features_conv = self.vgg.features[:35]
        
        # get the relu and the max pool of the features stem
        self.relu_max_pool_features = self.vgg.features[35:37]
        
        # get the classifier of the vgg19
        self.classifier = self.vgg.classifier
        
    def forward(self, x):
        x = self.features_conv(x)
        x = self.relu_max_pool_features(x)
        x = x.view((1, -1))
        x = self.classifier(x)
        return x
    
    def get_activations(self, x):
        return self.features_conv(x)

# - - - - -
vgg = VGG()
img, label = next(iter(dataloader))
pred = vgg(img)

# get the activations of the last conv layer in the features stem
activations = vgg.get_activations(img)

# register the hook
activations.register_hook(hook)

# calculate the gradients of the logit w.r.t. all the parameters
pred[:, 805].backward()

No gradient is saved in .grad afterwards (which is understandable), but the hook is never triggered either.

Also, if there is an easier way to get to the activations of pretrained networks, I would love to learn it. Any help is appreciated.
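For context, what I have in mind is something along the lines of a forward hook registered directly on the pretrained model; below is a rough, untested sketch where the layer index 34, the dummy input, and the torch.autograd.grad call are my own guesses rather than anything I know to be idiomatic:

import torch
from torchvision.models import vgg19

model = vgg19(pretrained=True).eval()
activations = {}

def forward_hook(module, inputs, output):
    # keep a reference to the activation tensor; it stays in the autograd graph
    activations["last_conv"] = output

# index 34 should be the last Conv2d in vgg19.features
handle = model.features[34].register_forward_hook(forward_hook)

img = torch.randn(1, 3, 224, 224)  # placeholder input
pred = model(img)

# gradient of the chosen logit w.r.t. the captured activation
grads = torch.autograd.grad(pred[:, 805].sum(), activations["last_conv"])[0]
print(grads.shape)  # e.g. torch.Size([1, 512, 14, 14])

handle.remove()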

OK, I have been able to solve the problem. Because of the way my model class is written, calling get_activations() runs a second forward pass and returns a new tensor that is not part of the graph backward() traverses, so backward() never computes the gradient of the output with respect to those activations. I fixed it by attaching a hook to the activation tensor inside the forward method, like this:

class VGG(nn.Module):
    def __init__(self):
        super(VGG, self).__init__()
        
        # get the pretrained VGG19 network
        self.vgg = vgg19(pretrained=True)
        
        # dissect the network to access its last convolutional layer
        self.features_conv = self.vgg.features[:35]
        
        # get the relu and the max pool of the features stem
        self.relu_max_pool_features = self.vgg.features[35:37]
        
        # get the classifier of the vgg19
        self.classifier = self.vgg.classifier
        
        # placeholder for the gradients
        self.gradients = None
        
    def activations_hook(self, grad):
        self.gradients = grad
        
    def forward(self, x):
        x = self.features_conv(x)
        
        # register the hook
        x.register_hook(self.activations_hook)
        
        x = self.relu_max_pool_features(x)
        x = x.view((1, -1))
        x = self.classifier(x)
        return x
    
    def get_activations_gradient(self):
        return self.gradients

This way the gradient is available as expected from vgg.get_activations_gradient() after calling backward().
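In case it is useful, here is a rough sketch of how the saved gradients could then be combined with the activations into a Grad-CAM-style heatmap; the channel weights follow the usual global-average-pooling recipe, and re-running features_conv to get the activations is just one way to do it (the shapes in the comments assume a 224x224 input):

import torch

vgg = VGG()
vgg.eval()

img, _ = next(iter(dataloader))
pred = vgg(img)
pred[:, 805].backward()

# gradients saved by the hook, e.g. shape [1, 512, 14, 14]
gradients = vgg.get_activations_gradient()

# channel-wise weights via global average pooling of the gradients
weights = gradients.mean(dim=(2, 3), keepdim=True)

# re-run the conv stem to get the activations themselves (no grad needed here)
with torch.no_grad():
    activations = vgg.features_conv(img)

# weighted sum over channels followed by ReLU gives a 14x14 heatmap
heatmap = torch.relu((weights * activations).sum(dim=1)).squeeze()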
