I am trying to run out.backward(torch.Tensor([2.0])) and I get a shape-mismatch error (which makes sense), but when I run the same thing in PyTorch 1.0.2 it works: if I print the grad after this operation, the elements in the matrix get multiplied by 2.0.
out.backward(torch.tensor([2.0])) throws an error:
RuntimeError: Mismatch in shape: grad_output has a shape of torch.Size([1]) and output has a shape of torch.Size([]).
out.backward(torch.tensor([1.0])) doesn’t work either in the latest version.
0-dimensional tensors have been introduced to represent tensors that have no dimensions and contain a single value.
You can use
torch.tensor(2.0) to get a 0-dimensional tensor that contains the value 2.
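For example, a minimal sketch (the toy function below is just for illustration, not your exact code):

import torch

print(torch.tensor([2.0]).shape)  # torch.Size([1]) -> 1-dimensional tensor with one element
print(torch.tensor(2.0).shape)    # torch.Size([])  -> 0-dimensional scalar tensor

x = torch.ones(2, 2, requires_grad=True)
out = (x * 3).mean()              # out is a 0-dimensional tensor
out.backward(torch.tensor(2.0))   # grad_output now matches out's shape
print(x.grad)                     # each element is 3/4 * 2.0 = 1.5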
Thanks @albanD, it works now, but I get different output for x.grad depending on which call (and PyTorch version) I use:
Output 1: out.backward(torch.tensor([2.0])): a 2x2 matrix where each value is 6
Output 2: out.backward(torch.tensor(2.0)): a 2x2 matrix where each value is 9
Could you please explain the difference in results?
Values of x, y, and z are as follows:
x = torch.ones(2, 2, requires_grad=True)  # requires_grad so that x.grad gets populated
y = x+2
z = y * y * 3
out = z.mean()
OUTPUT - tensor([[9., 9.], [9., 9.]])
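For reference, a complete runnable version of the snippet above, with the backward call from Output 2 and a print of x.grad added:

import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
out.backward(torch.tensor(2.0))  # scale the gradient by 2.0
print(x.grad)                    # tensor([[9., 9.], [9., 9.]])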
Your full function here is:
f(x) = ( sum_i (x_i + 2)^2 * 3 ) / size(x) = ( sum_i (x_i + 2)^2 * 3 ) / 4
df(x)/dx_j = 2 * (x_j + 2) * 3 / 4
Given that you start with x_i = 1 for all i,
the gradient you expect with no scaling is: 2 * (1 + 2) * 3 / 4 = 4.5
So if you multiply the result by 2, you get 9 as expected.
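As a quick sanity check, a small sketch that reruns your snippet and compares autograd against the analytic formula above computed by hand:

import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
out.backward(torch.tensor(2.0))

# analytic gradient from the formula above, scaled by the grad_output of 2.0
analytic = 2.0 * (2 * (x.detach() + 2) * 3 / 4)
print(x.grad)                            # tensor([[9., 9.], [9., 9.]])
print(torch.allclose(x.grad, analytic))  # True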
I am not sure why you get 6 for your PyTorch 1.2 code, but I would say this is most likely a small bug in your test code. Do you start with torch.zeros(), for example, instead of torch.ones()?
Sorry @albanD, it was a mistake on my end. That should return 9.0 as expected; I was using a different function whose derivative is 3.0, and hence when multiplied by 2.0 it gives 6.0.
Thanks again for your help!