Effective computation of single gradients for minibatches

Samuel_Horvath · February 4, 2019, 8:52am

If I understand correctly, PyTorch offers no direct way to get the gradient of every single sample in a minibatch. Is there a more efficient option than looping over the samples and computing each gradient separately?

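import copy

grad_new, param_new = [], []
for im, lab in zip(images, labels):
    optimizer.zero_grad()
    # im and lab may need a batch dimension of size 1, e.g. im.unsqueeze(0)
    output = model(im)
    loss = criterion(output, lab)
    loss.backward()
    # deep-copy so the next backward pass does not overwrite what was stored
    grad_new.append(copy.deepcopy([x.grad for x in model.parameters()]))
    param_new.append(copy.deepcopy([x.data for x in model.parameters()]))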

albanD · February 5, 2019, 9:50am

Hi,

I’m afraid this is the only thing you can do at the moment.

I was thinking this might be a feature we want to add.
Could you describe your use case, please? That would help make sure we build something that matches people’s needs.

Samuel_Horvath · February 5, 2019, 4:31pm

Hi, thank you for your response. I am trying to construct an adaptive importance sampling strategy. The intuition is that the sampling probabilities should be connected to the change in the norm of the gradient (L-smoothness), so I need the per-sample gradients in order to update the probabilities.

albanD · February 8, 2019, 11:19am

Hi,

Interesting. The main problem is that many operations, such as conv and linear layers, perform the accumulation over the batch inside their backward pass: for a linear layer, the matrix multiplication that computes the weight gradient is what sums over the samples. This means you would need to rewrite these Modules so that their backward skips the accumulation, or so that their forward accepts a batch of weights (which they don’t at the moment).
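
To illustrate the accumulation point, here is a minimal sketch for a single linear layer. The sizes, the squared-output loss, and the variable names are assumptions chosen for the example, not part of the thread. It shows that the weight gradient autograd stores equals a matrix product that sums over the batch, and that the same product written without the sum gives one gradient per sample.

```python
import torch

torch.manual_seed(0)
batch, d_in, d_out = 4, 3, 2          # illustrative sizes (assumption)
x = torch.randn(batch, d_in)
layer = torch.nn.Linear(d_in, d_out)

out = layer(x)                        # shape (batch, d_out)
loss = out.pow(2).sum()               # toy scalar loss summed over the batch

# Gradient of the loss w.r.t. the layer output, one row per sample.
(grad_out,) = torch.autograd.grad(loss, out, retain_graph=True)
loss.backward()                       # fills layer.weight.grad

# What the backward pass computes: grad_out^T @ x, which sums over the batch.
accumulated = grad_out.t() @ x        # shape (d_out, d_in)

# The same product without the sum: one outer product per sample.
per_sample = torch.einsum("bo,bi->boi", grad_out, x)  # (batch, d_out, d_in)

matches_autograd = torch.allclose(layer.weight.grad, accumulated, atol=1e-6)
sums_to_total = torch.allclose(per_sample.sum(dim=0), accumulated, atol=1e-6)
print(matches_autograd, sums_to_total)
```

The per-sample tensor is `batch` times larger than the accumulated gradient, which is one reason the built-in backward does not materialize it.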

lugiavn · February 8, 2019, 4:41pm

You can do the forward pass with the whole batch, but don’t average or accumulate the loss.
Computationally, you still need to run backward for each sample separately; there is no way around that.
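
A minimal sketch of that idea, assuming a small stand-in model, random data, and a criterion built with `reduction="none"` (all hypothetical choices for illustration). The forward pass runs once on the batch, and `torch.autograd.grad` is then called once per sample on the unreduced losses. It assumes the model has no layers that mix samples within a batch, such as batch normalization in training mode.

```python
import torch

torch.manual_seed(0)
# Stand-in model and data (assumptions, not from the thread).
model = torch.nn.Sequential(
    torch.nn.Linear(5, 8), torch.nn.ReLU(), torch.nn.Linear(8, 3)
)
criterion = torch.nn.CrossEntropyLoss(reduction="none")  # one loss per sample
images = torch.randn(6, 5)
labels = torch.randint(0, 3, (6,))

params = list(model.parameters())
losses = criterion(model(images), labels)   # single batched forward, shape (6,)

per_sample_grads = []
for i, loss_i in enumerate(losses):
    # Keep the graph alive until the last sample has been processed.
    keep = i < len(losses) - 1
    grads = torch.autograd.grad(loss_i, params, retain_graph=keep)
    per_sample_grads.append(grads)

# Per-sample gradient norms, e.g. as input to an importance-sampling scheme.
grad_norms = torch.stack([
    torch.sqrt(sum(g.pow(2).sum() for g in grads)) for grads in per_sample_grads
])
print(grad_norms)
```

This saves the repeated forward passes of the original loop, but the number of backward passes is still equal to the batch size.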