I’m playing with image gradients and have seen that people compute the image gradient with respect to the raw class score. I’ve seen this:

`ypred = model(batch_img); loss = ypred[range(len(labels)), labels].sum(); loss.backward()`

and I’ve also seen this:

`ypred = model(batch_img); loss = ypred[range(len(labels)), labels]; loss.backward(torch.ones(labels.shape[0]))`

I know the first implementation is correct, but I want to make sure the second one is equivalent. It seems like it would be correct too, and I get the same image gradients, but I could just be lucky.
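Here is a minimal, self-contained way to check this beyond one lucky run, assuming a recent PyTorch. The tiny linear classifier, the random `images` and the `labels` below are hypothetical stand-ins for the real model and data. The sketch computes the image gradient both ways and compares them.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in classifier: maps 3x8x8 images to 10 raw class scores.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
labels = torch.tensor([3, 1, 7, 0])
images = torch.randn(len(labels), 3, 8, 8)

def image_grad(use_sum: bool) -> torch.Tensor:
    # Fresh leaf tensor on each call so image gradients don't accumulate across runs.
    batch_img = images.clone().requires_grad_(True)
    ypred = model(batch_img)                          # shape: (N, num_classes)
    scores = ypred[range(len(labels)), labels]        # each image's own class score, shape (N,)
    if use_sum:
        scores.sum().backward()                       # scalar output: implicit gradient of 1
    else:
        scores.backward(torch.ones(labels.shape[0]))  # vector output: explicit gradient of ones
    return batch_img.grad

grad_sum = image_grad(use_sum=True)
grad_ones = image_grad(use_sum=False)
print(torch.allclose(grad_sum, grad_ones))  # True
```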


Yes, they are equivalent. When `.backward()` is called on a scalar output without a gradient argument, the gradient is implicitly `torch.tensor(1.)`. Backprop through `.sum()` then expands that scalar to the shape the tensor had before you reduced it, which recovers `torch.ones(labels.shape[0])`.

Doing `.sum()` is slightly more efficient, though, because `expand` doesn’t materialize the full ones tensor; it just creates a view. Or, even better, with the unsummed `loss` you could pass that view directly: `loss.backward(torch.tensor(1.).expand_as(loss))`.
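A minimal sketch of the mechanism described above, assuming a recent PyTorch. The random tensor `x` and the indices `idx` are hypothetical stand-ins for the score matrix `ypred` and `labels`; here `x` is itself the leaf, so `x.grad` is the gradient that flows back into the scores. The sketch checks that the scalar 1 expanded to shape `(n,)` is a stride-0 view rather than a materialized tensor, and that all three ways of calling `backward` produce identical gradients.

```python
import torch

n = 4
# What .sum()'s backward hands to its input: the scalar 1 expanded to shape (n,).
ones_view = torch.tensor(1.).expand(n)
print(ones_view.stride())      # (0,): all n elements share one stored value
print(torch.ones(n).stride())  # (1,): a materialized tensor with n stored values

# Hypothetical stand-ins: x plays the role of ypred, idx the role of labels.
torch.manual_seed(0)
x = torch.randn(n, 5, requires_grad=True)
idx = torch.tensor([0, 2, 4, 1])

def grad_of(run_backward):
    x.grad = None                     # reset so gradients don't accumulate between runs
    scores = x[torch.arange(n), idx]  # one score per row, shape (n,)
    run_backward(scores)
    return x.grad.clone()

g_sum = grad_of(lambda s: s.sum().backward())                       # implicit scalar 1
g_ones = grad_of(lambda s: s.backward(torch.ones(n)))               # explicit ones tensor
g_view = grad_of(lambda s: s.backward(torch.tensor(1.).expand(n)))  # expanded view
print(torch.equal(g_sum, g_ones), torch.equal(g_sum, g_view))       # True True
```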