How to get loss gradient w.r.t. post-convolution feature map in CNNs


I’m trying to compute the gradient of a loss w.r.t. a specific set of feature maps of a CNN. These feature maps would be an array. Is there a way to get this gradient back from a function like backward()?

I was thinking of creating an attribute in the CNN model that holds the values of the feature maps, and then calling backward on the loss:

output = modelCNN.forward(input)
loss = loss_custom(output, target)
feature_map = modelCNN.feature_map
gradient_loss_wrt_featuremap = loss.backward(feature_map)

Could this work? The problem is that, according to the PyTorch documentation, .backward() doesn’t return anything.

Any help is appreciated, thanks!


backward() takes as its argument the gradient to propagate back (the gradient w.r.t. the tensor you call it on), NOT the tensor you want to differentiate with respect to.
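To make the role of that argument concrete, here is a minimal sketch on a toy tensor (the names x, y and upstream are made up for illustration and are not from your model). The tensor passed to backward() is the upstream gradient, with the same shape as the tensor it is called on, and the call itself returns None:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2  # non-scalar result, so backward() needs an explicit gradient

# The argument is dL/dy (the gradient flowing back into y), same shape as y.
upstream = torch.tensor([1.0, 0.1, 0.01])
result = y.backward(upstream)

# x.grad now holds upstream * dy/dx, i.e. upstream * 2.
print(x.grad)
print(result)  # backward() returns None; gradients are stored in .grad
```

For a scalar loss the argument can be omitted, in which case it defaults to a gradient of 1.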

You can do what you want in two ways (assuming feature_map has requires_grad=True):

  • Call .retain_grad() on the feature map before the backward pass. Then, after you call loss.backward(), the feature map will have a .grad attribute containing the gradient of the loss w.r.t. it.
  • Use torch.autograd.grad(), which lets you specify which inputs you want gradients for and returns those gradients to you.
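Both options could look roughly like the sketch below. TinyCNN, its layer sizes, the random inputs and the cross-entropy loss are all hypothetical stand-ins for your modelCNN and loss_custom; the only part taken from your post is the idea of storing the feature map as an attribute during forward.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyCNN(nn.Module):
    """Hypothetical model that keeps its post-convolution feature map."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 4, kernel_size=3, padding=1)
        self.fc = nn.Linear(4 * 8 * 8, 10)
        self.feature_map = None

    def forward(self, x):
        # Store the conv output; it is a non-leaf tensor in the graph and
        # requires grad because the conv weights do.
        self.feature_map = self.conv(x)
        return self.fc(F.relu(self.feature_map).flatten(1))


torch.manual_seed(0)
model = TinyCNN()
inputs = torch.randn(2, 1, 8, 8)
target = torch.randint(0, 10, (2,))

# Option 1: retain_grad() on the feature map, then a normal backward().
output = model(inputs)
loss = F.cross_entropy(output, target)
model.feature_map.retain_grad()  # must be called before backward()
loss.backward()
grad_retained = model.feature_map.grad

# Option 2: torch.autograd.grad() returns the gradient directly (as a tuple)
# and does not touch the .grad attributes of the parameters.
output = model(inputs)
loss = F.cross_entropy(output, target)
(grad_direct,) = torch.autograd.grad(loss, model.feature_map)

print(grad_retained.shape, grad_direct.shape)
```

In both cases the gradient has the same shape as the feature map. Option 2 is convenient when you only need this one gradient and do not want the parameter gradients to be accumulated as a side effect.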