Hi, I am trying to obtain the gradient of the last hidden layer for a batch of inputs.
Say the last layer is a linear layer (512 → 100), where 100 is the number of classes.
For each sample, the gradient would then be 100 × 512.
Currently, my code is as follows:
losses = self.criterion_without_reduction(outputs, targets)
gradient_batch = []
for loss in losses:
    model.zero_grad()  # clear accumulated gradients before each per-sample backward
    loss.backward(retain_graph=True)
    # only collect the gradient for the last layer
    for name, parameter in reversed(list(model.net.named_parameters())):
        if 'weight' in name:
            gradient = parameter.grad.clone()
            gradient_batch.append(gradient.unsqueeze(0).unsqueeze(0))
            break
gradient_batch = torch.cat(gradient_batch, dim=0)
The above code produces a tensor of shape [batch_size, 1, 100, 512].
However, it runs relatively slowly, and I am wondering whether there is a more efficient way to do this?
You could access the gradient directly: model.gradient = model.net.fc.weight.grad.clone() (replace fc with the name of your classification layer). The slowness most likely comes from calling loss.backward() once per sample; iterating over named_parameters() is probably not the bottleneck.
Ideally you should run this after model.train(), although in this case that shouldn't cause any problem. The model.net.fc attribute should exist; if it does but .grad is None, I would suspect the backward() call isn't reaching the fc layer. Since your original code does work, I suspect you're simply accessing the classification module under the wrong name (fc in this case).
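To illustrate the direct access, here is a minimal self-contained sketch; the layer name fc, the 512 → 100 shapes, and the standalone nn.Linear stand in for your model and are assumptions, not your actual code:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc = nn.Linear(512, 100)                       # stand-in for the classification layer
criterion = nn.CrossEntropyLoss(reduction='none')

x = torch.randn(4, 512)
targets = torch.randint(0, 100, (4,))
losses = criterion(fc(x), targets)

losses[0].backward(retain_graph=True)          # backward for one sample
grad = fc.weight.grad.clone()                  # access the last layer's gradient directly
print(grad.shape)                              # torch.Size([100, 512])
```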
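If the loss is cross-entropy, one way to avoid the per-sample backward loop entirely is to compute the last layer's per-sample gradients in closed form: for a final linear layer, the gradient w.r.t. the weight is the outer product of (softmax(logits) − onehot(target)) with the layer's input. This is a sketch of that idea (my suggestion, not the original code; fc and the shapes are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
fc = nn.Linear(512, 100)                       # stand-in for the classification layer
x = torch.randn(4, 512)                        # features entering the last layer
targets = torch.randint(0, 100, (4,))

logits = fc(x)
delta = F.softmax(logits, dim=1).detach()      # dL/dlogits for cross-entropy is
delta[torch.arange(4), targets] -= 1.0         # softmax(logits) - onehot(target)
per_sample = delta.unsqueeze(2) * x.unsqueeze(1)   # outer products: [batch, 100, 512]

# cross-check one sample against autograd
fc.zero_grad()
F.cross_entropy(fc(x[:1]), targets[:1]).backward()
print(torch.allclose(per_sample[0], fc.weight.grad, atol=1e-5))  # True
```

This computes the whole batch of [100, 512] gradients with a single forward pass and no Python loop, which should be much faster than one backward() per sample.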