I am trying to run out.backward(torch.Tensor([2.0])) and I get a shape-mismatch error (which makes sense), but when I run the same thing in PyTorch 1.0.2 it works: if I print the grad after this operation, the elements in the matrix get multiplied by 2.0.
out.backward(torch.tensor([2.0])) throws an error:
RuntimeError: Mismatch in shape: grad_output has a shape of torch.Size([1]) and output has a shape of torch.Size([]).
out.backward(torch.tensor([1.0])) doesn’t work either in the latest version.
0-dimensional tensors have been introduced to represent tensors that have no dimensions and contain a single value.
You can use
torch.tensor(2.0) to get a 0-dimensional tensor that contains the value 2.
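For example, a minimal sketch (the toy function below is just for illustration, not your exact code):

import torch

print(torch.tensor([2.0]).shape)  # torch.Size([1]) -> 1-dimensional tensor with one element
print(torch.tensor(2.0).shape)    # torch.Size([])  -> 0-dimensional scalar tensor

x = torch.ones(2, 2, requires_grad=True)
out = (x * 3).mean()              # out is a 0-dimensional tensor
out.backward(torch.tensor(2.0))   # grad_output now matches out's shape
print(x.grad)                     # each element is 3/4 * 2.0 = 1.5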
Thanks @albanD, it works now, but I get different output for x.grad depending on which call (and PyTorch version) I use:
Output 1: out.backward(torch.tensor([2.0])): a 2x2 matrix where each value is 6
Output 2: out.backward(torch.tensor(2.0)): a 2x2 matrix where each value is 9
Could you please explain the difference in results?
Values of x, y, and z are as follows:
x = torch.ones(2, 2, requires_grad=True)  # requires_grad so that x.grad gets populated
y = x+2
z = y * y * 3
out = z.mean()
OUTPUT - tensor([[9., 9.], [9., 9.]])
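For reference, a complete runnable version of the snippet above, with the backward call from Output 2 and a print of x.grad added:

import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
out.backward(torch.tensor(2.0))  # scale the gradient by 2.0
print(x.grad)                    # tensor([[9., 9.], [9., 9.]])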
Your full function here is:
f(x) = ( sum_i (x_i + 2)^2 * 3 ) / size(x) = ( sum_i (x_i + 2)^2 * 3 ) / 4
df(x)/dx_j = 2 * (x_j + 2) * 3 / 4
Given that you start with x_i = 1 for all i,
the gradient you expect with no scaling is: 2 * (1 + 2) * 3 / 4 = 4.5
So if you multiply the result by 2, you get 9 as expected.
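As a quick sanity check, a small sketch that reruns your snippet and compares autograd against the analytic formula above computed by hand:

import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
out.backward(torch.tensor(2.0))

# analytic gradient from the formula above, scaled by the grad_output of 2.0
analytic = 2.0 * (2 * (x.detach() + 2) * 3 / 4)
print(x.grad)                            # tensor([[9., 9.], [9., 9.]])
print(torch.allclose(x.grad, analytic))  # True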
I am not sure why you get 6 for your PyTorch 1.2 code, but I would say this is most likely a small bug in your test code. Do you start with torch.zeros(), for example, instead of torch.ones()?
Sorry @albanD, it was a mistake on my end. That should return 9.0 as expected; I was using a different function whose derivative is 3.0, and hence when multiplied by 2.0 it gives 6.0.
Thanks again for your help!