Gradient of FC with respect to each sample in batch

I want to get the gradient of the network's FC layer with respect to each sample. FC is the last linear (dense) layer.

With a batch size of 1, I can use the following code:

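import torch

loss_CE = torch.nn.CrossEntropyLoss().cuda()

for i, (x, y) in enumerate(train_loader, 0):
    x = x.cuda()
    model.zero_grad()                      # clear the gradient left over from the previous sample
    FV, Logit = model(x)                   # model returns (feature vector, logits)
    m, y_hat = torch.max(Logit, dim=1)     # predicted class is used as the target
    loss = loss_CE(Logit, y_hat)
    loss.backward()                        # without this, model.fc.weight.grad is never populated
    grad = model.fc.weight.grad.clone()    # copy, so the next iteration does not overwrite it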

For ImageNet and ResNet-50, it produces a tensor of size 1000 x 2048.
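The shape follows from the chain rule for a linear layer. As a sketch, assuming FV is exactly the 2048-dimensional input to model.fc (so the logits are $z = W f + b$ with $f$ = FV) and the loss is cross-entropy against the predicted label $\hat y = \arg\max_k z_k$ (treated as a constant, since argmax has no gradient), the weight gradient for a single sample is an outer product:

$$
L = -\log \operatorname{softmax}(z)_{\hat y}, \qquad
\frac{\partial L}{\partial z} = \operatorname{softmax}(z) - e_{\hat y} \in \mathbb{R}^{1000}, \qquad
\frac{\partial L}{\partial W} = \big(\operatorname{softmax}(z) - e_{\hat y}\big)\, f^{\top} \in \mathbb{R}^{1000 \times 2048}
$$

Here $e_{\hat y}$ is the one-hot vector for $\hat y$.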

Running over all images with a batch size of 1 takes a long time.

If I increase the batch size, the output of the code above is still 1000 x 2048.

How can I modify the code to work on batches? The output tensor should have size 256 x 1000 x 2048 when the batch size is 256.
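Because FC is the last layer and the model already returns FV, one option (a sketch, not taken from the original thread) is to compute the per-sample gradients in closed form as a batched outer product instead of running one backward pass per image. It assumes FV is exactly the input to model.fc and that the loss is cross-entropy against the argmax prediction, as in the question's code. The function name per_sample_fc_grad and the small layer sizes in the self-check are hypothetical stand-ins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def per_sample_fc_grad(fv: torch.Tensor, logits: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: per-sample dL/dW for CE(logits, argmax(logits)).

    Assumes logits = fc(fv), i.e. fv is exactly the input to the last nn.Linear.
    fv: (B, in_features), logits: (B, out_features)
    returns: (B, out_features, in_features)
    """
    y_hat = logits.argmax(dim=1)                               # predicted labels as targets
    probs = F.softmax(logits, dim=1)                           # (B, out)
    one_hot = F.one_hot(y_hat, logits.shape[1]).to(probs.dtype)
    delta = probs - one_hot                                    # dL_i/dlogits_i, (B, out)
    return torch.einsum("bo,bi->boi", delta, fv)               # batched outer product


# Self-check against one backward pass per sample, on a small stand-in layer.
torch.manual_seed(0)
fc = nn.Linear(8, 5)        # stand-in for model.fc (2048 -> 1000 in ResNet-50)
fv = torch.randn(4, 8)      # stand-in for a batch of feature vectors FV
with torch.no_grad():
    grads = per_sample_fc_grad(fv, fc(fv))                     # (4, 5, 8)

ref = []
for i in range(fv.shape[0]):
    fc.zero_grad()
    logits_i = fc(fv[i:i + 1])
    F.cross_entropy(logits_i, logits_i.argmax(dim=1)).backward()
    ref.append(fc.weight.grad.clone())
ref = torch.stack(ref)                                         # (4, 5, 8)
```

Note that the requested result is large: 256 x 1000 x 2048 float32 values is 524,288,000 elements, about 2.1 GB, so you may want to reduce or save each batch's result rather than keep many of them in memory.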

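Autograd automatically reduces the gradient to the shape of the weight, regardless of the batch size: the contributions of all samples in the batch are summed, and with CrossEntropyLoss's default reduction='mean' the sum is divided by the batch size. If you just want to train your network with batches, you don't need to modify the existing code, as autograd does this for you.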
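A small sketch of that behavior, using a hypothetical stand-in layer and fixed labels rather than the question's ResNet-50: one backward pass over a batch gives a gradient with the weight's shape, equal to the mean of the per-sample gradients under the default reduction='mean'.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
fc = nn.Linear(8, 5)                  # small stand-in for model.fc
x = torch.randn(4, 8)                 # a batch of 4 feature vectors
y = torch.tensor([0, 3, 1, 4])        # arbitrary labels for the sketch

# One backward pass over the whole batch (default reduction='mean').
fc.zero_grad()
F.cross_entropy(fc(x), y).backward()
batch_grad = fc.weight.grad.clone()   # shape (5, 8), same as fc.weight

# One backward pass per sample, for comparison.
per_sample = []
for i in range(x.shape[0]):
    fc.zero_grad()
    F.cross_entropy(fc(x[i:i + 1]), y[i:i + 1]).backward()
    per_sample.append(fc.weight.grad.clone())
per_sample = torch.stack(per_sample)  # shape (4, 5, 8)
```

The per-sample information is lost in the reduction, which is why the batched output stays 1000 x 2048.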

If, instead, you want manual access to the gradients, you may want to look at backward hooks (see torch.nn.modules.module.register_module_backward_hook in the PyTorch master documentation).
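As a sketch of the hook approach, this uses the per-module forward hook and register_full_backward_hook (rather than the global hook named above) to capture the FC layer's input and the gradient flowing into its output, then forms per-sample weight gradients as outer products. It assumes a plain nn.Linear as the last layer; the tiny nn.Sequential model, the dictionary saved, and the hook function names are hypothetical stand-ins. The loss uses reduction="sum" so that row i of grad_output is the gradient of sample i's own loss (with 'mean', every row would be scaled by 1/B).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
# Hypothetical stand-in for a network whose last layer is `fc`.
model = nn.Sequential(nn.Linear(8, 6), nn.ReLU(), nn.Linear(6, 5))
fc = model[2]

saved = {}


def save_input(module, inputs, output):
    saved["a"] = inputs[0].detach()          # FC input activations, (B, in_features)


def save_grad_output(module, grad_input, grad_output):
    saved["g"] = grad_output[0].detach()     # dLoss/dLogits, (B, out_features)


h_fwd = fc.register_forward_hook(save_input)
h_bwd = fc.register_full_backward_hook(save_grad_output)

x = torch.randn(4, 8)
logits = model(x)
y_hat = logits.argmax(dim=1)                 # predicted class as target, as in the question
F.cross_entropy(logits, y_hat, reduction="sum").backward()

# Per-sample weight gradient: outer(grad_output_i, input_i) for each sample i.
per_sample_grads = torch.einsum("bo,bi->boi", saved["g"], saved["a"])  # (4, 5, 6)

h_fwd.remove()
h_bwd.remove()
```

Summing per_sample_grads over the batch dimension recovers fc.weight.grad, which is a quick sanity check.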