PyTorch not computing the right gradient for division

I’m trying to normalize an array (tensor) so that its elements sum to 1.

I did the following computation, but the output gradient seems to be wrong. (Why are all the values the same? They should be different for each element in a.)

code:

```python
import torch

a = torch.rand(10, requires_grad=True)
print(a)
x = torch.sum(a)
y = a / x  # normalize so the elements of y sum to 1
y.backward(torch.ones_like(a))
print(a.grad)
```

output:

tensor([0.6022, 0.6065, 0.0329, 0.3639, 0.2053, 0.1006, 0.6993, 0.1843, 0.0400,
        0.2939], requires_grad=True)
tensor([-5.9605e-08, -5.9605e-08, -5.9605e-08, -5.9605e-08, -5.9605e-08,
        -5.9605e-08, -5.9605e-08, -5.9605e-08, -5.9605e-08, -5.9605e-08])

Why do you expect them to all be different? You call backward with a Tensor full of 1s.

Thanks for the quick reply!

Let’s say a = (a1, a2, a3)

Then y = (y1, y2, y3) = (a1, a2, a3) / (a1 + a2 + a3),
so d(y1)/d(a1) = (a2 + a3) / (a1 + a2 + a3)^2

and, similarly, d(y2)/d(a2) = (a1 + a3) / (a1 + a2 + a3)^2.
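These two derivatives can be checked with autograd on a small example. The sketch below uses hypothetical values a = (1, 2, 3), chosen only for illustration, and differentiates y1 and y2 separately with `torch.autograd.grad` instead of calling backward on the whole vector y.

```python
import torch

# Hypothetical concrete values for a = (a1, a2, a3)
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = a / a.sum()

# Gradient of the scalar y1 alone (retain_graph so y2 can reuse the graph)
(g1,) = torch.autograd.grad(y[0], a, retain_graph=True)
# Gradient of the scalar y2 alone
(g2,) = torch.autograd.grad(y[1], a)

a1, a2, a3 = a.detach()
s = a1 + a2 + a3
print(g1[0].item(), ((a2 + a3) / s**2).item())  # both 5/36 ≈ 0.1389
print(g2[1].item(), ((a1 + a3) / s**2).item())  # both 4/36 ≈ 0.1111
```

Taken one output at a time, the diagonal derivatives do differ per element, as the formulas predict.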

So shouldn't they be different rather than the same?

Yes, but that is not what you compute here.
Since you call backward with a Tensor full of 1s, the grad of a1 is 1*d(y1)/d(a1) + 1*d(y2)/d(a1) + 1*d(y3)/d(a1).
The same kind of sum appears for a2 and a3, and because y1 + y2 + y3 is always 1, each of these sums is 0. That is why every entry is identical: the -5.9605e-08 values are just floating-point rounding error around 0.
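For a general n-element a, writing S for the sum of its elements and δ for the Kronecker delta (notation introduced here), the full derivative and the quantity that backward with a vector of ones returns work out as:

$$
\frac{\partial y_i}{\partial a_j} = \frac{\delta_{ij}\, S - a_i}{S^2}, \qquad S = \sum_{k=1}^{n} a_k
$$

$$
\text{a.grad}_j = \sum_{i=1}^{n} 1 \cdot \frac{\partial y_i}{\partial a_j} = \frac{S - \sum_{i=1}^{n} a_i}{S^2} = 0
$$

The diagonal terms (i = j) are the (S - a_j)/S^2 derivatives from the question and do differ per element, but the off-diagonal -a_i/S^2 terms cancel them exactly once everything is summed.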

I see, backward is multiplying the Jacobian matrix by the vector of ones rather than giving me its diagonal… my bad.
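To make the distinction concrete, here is a sketch that builds the full Jacobian with `torch.autograd.functional.jacobian`, reads off the per-element derivatives from its diagonal, and checks that `backward(v)` returns the vector-Jacobian product v^T J. The helper name `normalize` and the fixed seed are illustrative assumptions, not part of the original thread.

```python
import torch
from torch.autograd.functional import jacobian


def normalize(a):
    # Hypothetical helper: divide each element by the total so outputs sum to 1
    return a / a.sum()


torch.manual_seed(0)
a = torch.rand(10)

# Full Jacobian: J[i, j] = d(y_i) / d(a_j), shape (10, 10)
J = jacobian(normalize, a)

# The per-element derivatives the question expected are on the diagonal
diag = torch.diagonal(J)
print(diag)

# y.backward(v) computes v^T J; with v = ones this is the column sums of J,
# which are all (numerically) zero
v = torch.ones_like(a)
vjp = v @ J
print(vjp)

# Same result via backward, for comparison
a_req = a.clone().requires_grad_(True)
normalize(a_req).backward(v)
print(a_req.grad)

# A one-hot v instead selects a single row of J: the gradient of y_0 alone
a_req.grad = None  # clear the accumulated gradient first
e0 = torch.zeros_like(a)
e0[0] = 1.0
normalize(a_req).backward(e0)
print(a_req.grad)  # matches J[0]
```

Building the full Jacobian costs one backward pass per output, so for large tensors it is usually cheaper to ask backward for just the vector-Jacobian product you actually need.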

Thanks for the help!
