If I understand correctly, there is no way to get the gradient of every single sample in a minibatch in PyTorch. But is there an option to compute them more efficiently than running a loop and computing each gradient separately?

for im, lab in zip(images, labels):
    optimizer.zero_grad()
    output = model(im)
    loss = criterion(output, lab)
    loss.backward()
    # clone so the next iteration's backward doesn't overwrite the stored tensors
    grad_new.append([p.grad.clone() for p in model.parameters()])
    param_new.append([p.data.clone() for p in model.parameters()])

Hi, thank you for your response. I am trying to construct an adaptive importance sampling strategy, and the intuition is that it should be connected to the change in the norm of the gradient (via L-smoothness), so I need this quantity in order to update the sampling probabilities.
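For context, a minimal sketch of how per-sample gradient norms might be turned into sampling probabilities. The function name and the uniform-smoothing scheme are assumptions for illustration, not part of any specific importance sampling paper:

```python
import torch

def update_sampling_probs(grad_norms, smoothing=0.1):
    """Turn per-sample gradient norms into sampling probabilities.

    grad_norms: 1-D tensor with one gradient norm per sample.
    smoothing: mixes in a uniform distribution so no sample's
               probability collapses to zero (a common safeguard;
               the exact scheme depends on the sampler).
    """
    n = grad_norms.numel()
    probs = grad_norms / grad_norms.sum()
    return (1 - smoothing) * probs + smoothing / n

norms = torch.tensor([1.0, 3.0, 6.0])
p = update_sampling_probs(norms)
# p sums to 1 and assigns more mass to large-gradient samples
```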

Interesting. The main problem with this is that many operations like conv/linear actually do the accumulation over the batch during their backward pass: the matrix multiplication performed in the backward of a linear layer already sums the per-sample contributions. This means you would need to rewrite these Modules either to run backward without accumulation, or to do a forward pass from a batch of weights (which they don't support at the moment).
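To make the accumulation point concrete, here is a small sketch for a linear layer with a squared loss (the layer sizes and loss are arbitrary choices for illustration). The per-sample weight gradients are the outer products grad_y[i] ⊗ x[i]; the backward matmul sums them over the batch and never materialises them individually:

```python
import torch

torch.manual_seed(0)
B, D_in, D_out = 4, 3, 2
x = torch.randn(B, D_in)
W = torch.randn(D_out, D_in, requires_grad=True)

# Forward: summing over the batch means backward accumulates over it too.
y = x @ W.t()
loss = (y ** 2).sum()
loss.backward()

# Reconstruct the per-sample contributions by hand.
grad_y = 2 * y                                       # d loss / d y, per sample
per_sample = grad_y.unsqueeze(2) * x.unsqueeze(1)    # (B, D_out, D_in) outer products

# Summing the per-sample gradients recovers what autograd stored in W.grad.
assert torch.allclose(per_sample.sum(0), W.grad)
```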

You can do the forward pass with the whole batch, as long as you don't average/accumulate the loss.
Computationally, you still need to backward each sample separately; there is no way to get around that.
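A sketch of that approach: one batched forward with `reduction='none'` to keep per-sample losses, then one backward per sample with `retain_graph=True` so the graph survives between calls. The `nn.Linear` stand-in and the tensor shapes are assumptions; any model/criterion pair works the same way:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(5, 3)                             # stand-in model
criterion = nn.CrossEntropyLoss(reduction='none')   # keep per-sample losses

images = torch.randn(4, 5)
labels = torch.randint(0, 3, (4,))

# One forward pass over the whole batch...
losses = criterion(model(images), labels)

# ...but still one backward pass per sample.
per_sample_grads = []
for loss in losses:
    model.zero_grad()
    loss.backward(retain_graph=True)   # keep the graph for the next sample
    per_sample_grads.append([p.grad.clone() for p in model.parameters()])
```

This avoids repeating the forward pass per sample, but the backward cost still scales with the batch size.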