Backward() on torch.cat not working as expected

from torch.autograd import Variable
import torch

x = Variable(torch.Tensor([2]), requires_grad=True)
y = x + 2
x1 = Variable(torch.Tensor([3]), requires_grad=True)
x2 = Variable(torch.Tensor([4]), volatile=True)
y.data = torch.cat((y.data, x1.data, x2.data), 0)
y.data = torch.cat((y.data, x1.data, x2.data), 0).contiguous()
z = y * y * 3
out = z.mean()
out.backward()
print(x, x.grad, x1.grad)

Output:
Variable containing:
[torch.FloatTensor of size 1]
Variable containing:
[torch.FloatTensor of size 5]

Ideally, x.grad (size 5) should be the same size as x (size 1). And if x is a non-leaf variable, backward() raises an error instead. Is there any workaround for using torch.cat here?

Firstly, x2 is volatile. This means that when PyTorch sees x2 used in any calculation, it does not record the computation graph for it. If any of your model’s inputs is volatile, PyTorch won’t be able to backpropagate.
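(Note that this thread predates PyTorch 0.4: Variable has since been merged into Tensor, and volatile=True was replaced by the torch.no_grad() context. A minimal sketch of the same “graph not recorded” behaviour on a current PyTorch install:)

```python
import torch

x = torch.tensor([2.0], requires_grad=True)

# Anything computed under no_grad() is detached from the graph,
# just like operations on a volatile Variable used to be.
with torch.no_grad():
    y = x * 3
print(y.requires_grad)   # False: no graph, backward() through y is impossible

# The same computation outside no_grad() is tracked normally.
z = (x * 3).mean()
z.backward()
print(x.grad)            # tensor([3.])
```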

Secondly, you can use torch.cat on Variables directly. If you operate on .data, PyTorch doesn’t track the operations and can’t backpropagate properly.

y = torch.cat((y, x1, x2), 0)
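Putting this together, the whole flow with cat applied to the Variables themselves looks like this (written here with the modern tensor API, where Variable is just Tensor; the gradient values shown are what this particular computation produces):

```python
import torch

x  = torch.tensor([2.0], requires_grad=True)
x1 = torch.tensor([3.0], requires_grad=True)
x2 = torch.tensor([4.0])              # no grad needed, like the old volatile input

y = x + 2
y = torch.cat((y, x1, x2), 0)         # cat on tensors directly: tracked by autograd
z = y * y * 3
out = z.mean()                        # out = mean(3 * y_i^2), so d out / d y_i = 2 * y_i
out.backward()

print(x.grad)    # tensor([8.])  (y[0] = 4, times 2)
print(x1.grad)   # tensor([6.])  (y[1] = 3, times 2)
```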

Another potential problem is the use of .resize_(), which you can replace with a simple slice.

y = y[:5]
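Slicing works here because indexing is an autograd-aware view: gradients flow back to the elements you kept, while the dropped elements simply get zero gradient. A small illustration:

```python
import torch

y = torch.arange(8.0, requires_grad=True)

# Keep the first five elements and use them in a computation.
out = (y[:5] * 2).sum()
out.backward()

print(y.grad)   # first five entries get gradient 2, the last three get 0
```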

Normally x and x.grad must contain the same number of elements, but assigning the result of torch.cat to .data confuses the autograd mechanism: you change the size of the underlying tensor without informing the computation graph of the change. It works fine with the above modifications.

The basic rule of backprop is to never use .data if you want to backpropagate. Don’t use volatile=True either, unless you are running in inference mode.
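A short demonstration of why the .data rule matters: an in-place change through .data is invisible to autograd, so the gradient silently ignores it rather than raising an error.

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
y = x * x            # dy/dx = 2x = 4
y.data.mul_(10)      # value becomes 40, but autograd never records this op
y.backward()

print(x.grad)        # tensor([4.]), not 40: the .data multiplication was skipped
```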
