I’m playing with image gradients and want to compute an image gradient with respect to the raw class score. I’ve seen this:

    ypred = model(batch_img)
    loss = ypred[range(len(labels)), labels].sum()
    loss.backward()

and I’ve seen this:

    ypred = model(batch_img)
    loss = ypred[range(len(labels)), labels]
    loss.backward(torch.ones(labels.shape))

I know the first implementation is correct, but I want to make sure the second one is equivalent. It seems like it should be correct too. I get the same image gradients, but I could just be lucky.
Yes, they are equivalent. If no gradient is passed to .backward() and the output is a scalar, the gradient is implicitly
Tensor(1.), and backprop through sum expands that into the same shape as the output before you reduced it, which recovers
torch.ones(labels.shape). Doing .sum() is slightly more efficient, though, because expand doesn’t materialize the entire ones tensor; it just creates a view. Or even better you could just do
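
For anyone who wants to check the equivalence numerically rather than trust a lucky run, here is a minimal sketch. The tiny linear model, the input shape, and the labels are made up for illustration and are not from the original post. The helper `image_grad` is also hypothetical. It runs both variants on the same input and compares the resulting image gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the poster's model and data (assumptions).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
base_img = torch.randn(4, 3, 8, 8)
labels = torch.tensor([1, 4, 7, 2])


def image_grad(use_sum):
    """Return d(class scores)/d(image) using one of the two variants."""
    # Fresh leaf tensor each time so gradients do not accumulate across calls.
    batch_img = base_img.clone().requires_grad_(True)
    ypred = model(batch_img)
    scores = ypred[range(len(labels)), labels]  # one raw score per sample
    if use_sum:
        # Variant 1: reduce to a scalar, implicit gradient of 1.
        scores.sum().backward()
    else:
        # Variant 2: keep the vector, pass an explicit gradient of ones.
        scores.backward(torch.ones(labels.shape))
    return batch_img.grad


g_sum = image_grad(use_sum=True)
g_ones = image_grad(use_sum=False)
print(torch.allclose(g_sum, g_ones))
```

Because each sample's score depends only on that sample's image, both variants give every image the gradient of its own class score, so the comparison should hold for any model that treats batch elements independently.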