from torch.autograd import Variable
import torch
x = Variable(torch.Tensor([2]), requires_grad = True)
y = x + 2
x1 = Variable(torch.Tensor([3]), requires_grad = True)
x2 = Variable(torch.Tensor([4]), volatile = True)
y.data = torch.cat((y.data , x1.data, x2.data), 0) #y.data = torch.cat((y.data , x1.data, x2.data), 0).contiguous()
y.data.resize_(5)
z = y * y *3
out = z.mean()
out.backward()
print(x, x.grad, x1.grad)

Output:
Variable containing:
2
[torch.FloatTensor of size 1]
Variable containing:
4.8000e+00
3.6000e+00
4.8000e+00
5.5023e-41
5.3249e-44
[torch.FloatTensor of size 5]

Ideally, x.grad should have the same size as x (size 1), but here it has size 5. If x is a non-leaf variable, backward() raises an error instead. Is there any workaround for using torch.cat?

First, x2 is volatile. This means that when PyTorch sees x2 used in any computation, it does not store the computation graph. If any of your model's inputs is volatile, PyTorch won't be able to backpropagate.
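volatile was removed in PyTorch 0.4 (the same release that merged Variable into Tensor), and torch.no_grad() is the closest modern replacement. A minimal sketch, assuming a recent PyTorch, of what "no graph is stored" means in practice: a result computed under no_grad has no grad_fn, so calling backward() on it fails.

```python
import torch

x = torch.tensor([2.0], requires_grad=True)

with torch.no_grad():
    y = x * 3  # computed without recording a graph, like the old volatile=True

print(y.requires_grad, y.grad_fn)  # False None

try:
    y.sum().backward()
    backward_failed = False
except RuntimeError:
    # autograd has no recorded graph to backpropagate through
    backward_failed = True

print("backward failed:", backward_failed)  # backward failed: True
```

Note that no_grad is a context rather than a per-tensor flag, so it is only meant to wrap inference code, not individual inputs.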

Second, you can use torch.cat on Variables directly. If you operate on .data, PyTorch doesn't track the operations and can't backpropagate properly.

y = torch.cat((y, x1, x2), 0)
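A minimal sketch of the corrected example, assuming a recent PyTorch where plain tensors with requires_grad=True take the place of Variable. Here x2 is an ordinary tensor that needs no gradient, standing in for the old volatile input (an assumption about what x2 was meant to be).

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
x1 = torch.tensor([3.0], requires_grad=True)
x2 = torch.tensor([4.0])  # constant input, no gradient needed

y = x + 2
y = torch.cat((y, x1, x2), 0)  # cat on the tensors themselves, so autograd records it
z = y * y * 3
out = z.mean()
out.backward()

print(x.grad, x1.grad, x2.grad)  # tensor([8.]) tensor([6.]) None
```

Because the cat and everything after it are recorded, x.grad has the same shape as x: y = [4, 3, 4], and d(out)/dy = 6y/3 = 2y gives 8 for x and 6 for x1.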

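Another potential problem is the use of .resize_(), which you can replace with a simple slice. Note that slicing can only shrink a tensor: after the cat, y has 3 elements, so y[:5] simply returns all 3. resize_(5), by contrast, grows y to 5 elements and leaves the two extra entries uninitialized, which is where the meaningless last two values of x.grad in the output come from.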

y = y[:5]
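A minimal sketch, assuming a recent PyTorch, that checks the slice behaviour and, assuming a fixed length of 5 was actually the goal, pads explicitly with zeros using torch.nn.functional.pad so the extra entries are tracked and well defined.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([2.0], requires_grad=True)
x1 = torch.tensor([3.0], requires_grad=True)
x2 = torch.tensor([4.0])

y = torch.cat((x + 2, x1, x2), 0)  # 3 elements: [4., 3., 4.]

# Slicing past the end does not grow the tensor.
print(y[:5].shape)  # torch.Size([3])

# Assumption: a length of 5 was wanted. (0, 2) means
# "add 0 elements before and 2 after" along the last dimension.
y5 = F.pad(y, (0, 5 - y.numel()))  # [4., 3., 4., 0., 0.], recorded by autograd

out = (y5 * y5 * 3).mean()
out.backward()
print(x.grad, x1.grad)  # approximately 4.8 and 3.6
```

With zero padding, the gradients for x and x1 match the first entries of the question's output, but they now have the right shapes and nothing depends on uninitialized memory.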

x and x.grad should always contain the same number of elements. Assigning the result of torch.cat to y.data breaks this, because it changes the size of the underlying tensor without informing the computation graph: the 5-element gradient computed for y is then passed straight back to x. With the modifications above, it works as expected.
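The first three numbers in the question's output can be reproduced by hand. A short derivation, assuming y holds [4, 3, 4] followed by the two uninitialized entries y_4 and y_5 left by resize_(5):

```latex
\begin{aligned}
\text{out} &= \frac{1}{5}\sum_{i=1}^{5} 3y_i^2
  &&\Longrightarrow\quad \frac{\partial\,\text{out}}{\partial y_i} = \frac{6y_i}{5} \\
y &= (4,\ 3,\ 4,\ y_4,\ y_5)
  &&\Longrightarrow\quad \frac{\partial\,\text{out}}{\partial y} = (4.8,\ 3.6,\ 4.8,\ 1.2\,y_4,\ 1.2\,y_5)
\end{aligned}
```

Since y = x + 2 has derivative 1 with respect to x, autograd (in the PyTorch version used in the question) hands this whole 5-element vector to x unchanged, which is the x.grad printed above.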

The basic rule of backprop is never to use .data if you want to backpropagate. Don't use volatile=True either, unless you are running in inference mode.
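A minimal sketch, assuming a recent PyTorch, of how .data silently breaks backprop: an in-place change made through .data alters the value of y, but autograd never records it, so the gradient ignores it.

```python
import torch

# Untracked: scale y through .data
x = torch.tensor([2.0], requires_grad=True)
y = x * 3
y.data.mul_(10)  # y becomes 60, but autograd never sees this multiplication
y.sum().backward()
print(y.item(), x.grad.item())  # 60.0 3.0

# Tracked: the same scaling done as a normal operation
x_t = torch.tensor([2.0], requires_grad=True)
y_t = (x_t * 3) * 10  # both multiplications are recorded
y_t.sum().backward()
print(y_t.item(), x_t.grad.item())  # 60.0 30.0
```

Both versions produce the same forward value, but only the tracked one yields the correct gradient of 30; no error is raised in the .data case, which is what makes it easy to miss.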