Autograd with non-scalar

At the moment, if I want to view the gradients of a specific layer w.r.t. each one of the inputs in my batch, I loop over the batch and pass a one-hot gradient vector to loss.backward(), like so:

import torch
import torchvision

x = torch.rand((2, 3, 224, 224))
y = torch.ones(2, dtype=torch.long)
m = torchvision.models.resnet18()
criterion_vec = torch.nn.CrossEntropyLoss(reduction='none')
optimizer = torch.optim.SGD(m.parameters(), 0.001)
out = m(x)
loss = criterion_vec(out, y)  # per-sample losses, shape [b]

b = out.shape[0]
grads = []
for i in range(b):
    # One-hot vector that selects the loss of sample i only
    idx = torch.zeros(b)
    idx[i] = 1

    optimizer.zero_grad()  # otherwise .grad accumulates across iterations
    loss.backward(idx, retain_graph=True)
    g = m.conv1.weight.grad[0][0][0]
    grads.append(g.clone())  # clone so the next backward does not overwrite it

I was wondering if there was an easier way to get the gradients w.r.t each one of the inputs in the batch. Or is this reduction on the batch-axis forced by cuDNN?

Yes, unfortunately, the cuDNN ops always do the reduction.

You can check libraries that implement custom backward passes to replace these ops and give per-sample gradients efficiently.
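As one possible alternative to a third-party library, here is a minimal sketch of per-sample gradients using torch.func, assuming PyTorch 2.0 or later. The small model, the input sizes, and the helper name sample_loss are made up for illustration and are not from the thread. A model with BatchNorm in training mode (such as resnet18) would likely need extra handling, since batch statistics mix samples.

```python
import torch
from torch.func import functional_call, grad, vmap

torch.manual_seed(0)

# Hypothetical stand-in model (the thread uses resnet18).
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 4, kernel_size=3),
    torch.nn.ReLU(),
    torch.nn.Flatten(),
    torch.nn.Linear(4 * 6 * 6, 5),
)
x = torch.rand(2, 3, 8, 8)
y = torch.ones(2, dtype=torch.long)

# Detached copies of the parameters, keyed by name.
params = {k: v.detach() for k, v in model.named_parameters()}

def sample_loss(params, xi, yi):
    # vmap strips the batch dimension, so add it back for the model.
    out = functional_call(model, params, (xi.unsqueeze(0),))
    return torch.nn.functional.cross_entropy(out, yi.unsqueeze(0))

# grad differentiates w.r.t. the first argument (params);
# vmap maps over the batch dimension of x and y only.
per_sample_grads = vmap(grad(sample_loss), in_dims=(None, 0, 0))(params, x, y)

# Each entry has a leading batch dimension, e.g. [2, 4, 3, 3, 3] for the conv weight.
conv_grads = per_sample_grads["0.weight"]
```

Here conv_grads[i] is the gradient of the loss of sample i alone, computed in a single vectorized pass instead of one backward call per sample.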


Thanks very much! I tried it, and the mean of the per-sample gradients matches the regular gradient:

model.conv1.weight.grad1.shape  # torch.Size([4, 20, 1, 5, 5])
mean_grads = model.conv1.weight.grad1.mean(axis=0)
pytorch_grads = model.conv1.weight.grad
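A comparison like this can be checked numerically with torch.allclose. The following is a rough sketch rather than the poster's actual code: it assumes a mean-reduced loss and a hypothetical small model without BatchNorm, and it computes the per-sample gradients with a plain loop instead of a library.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in model and data, not from the thread.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(12, 3))
x = torch.rand(4, 3, 2, 2)
y = torch.tensor([0, 1, 2, 1])
criterion = torch.nn.CrossEntropyLoss(reduction="mean")

# Per-sample gradients, one backward pass per sample.
per_sample = []
for i in range(x.shape[0]):
    model.zero_grad()
    criterion(model(x[i:i + 1]), y[i:i + 1]).backward()
    per_sample.append(model[1].weight.grad.clone())
per_sample = torch.stack(per_sample)  # shape [4, 3, 12]

# Ordinary gradient of the mean loss over the whole batch.
model.zero_grad()
criterion(model(x), y).backward()
batch_grad = model[1].weight.grad

# With a mean-reduced loss, the batch gradient is the mean of the per-sample ones.
matches = torch.allclose(per_sample.mean(dim=0), batch_grad, atol=1e-6)
```

If the loss used reduction='sum' instead, the per-sample gradients would need to be summed rather than averaged before comparing.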