[solved] Backward error when using "expand"

I use “expand” to repeat an N x H tensor to N x H x h x w in the module, and the following is a simplified version of the code:

import torch
from torch.autograd import Variable
class TwoLayerNet(torch.nn.Module):
  def __init__(self, D_in, H, D_out, h, w):
    super(TwoLayerNet, self).__init__()
    self.linear1 = torch.nn.Linear(D_in, H)
    self.linear2 = torch.nn.Linear(H * h * w, D_out)
  def forward(self, x):
    h_relu = self.linear1(x).clamp(min=0)
    h_relu = torch.unsqueeze(torch.unsqueeze(h_relu, 2), 3) # -> N x H x 1 x 1
    h_expand = h_relu.expand([64, H, h, w]).contiguous().view(64, -1) # expand to N x H x h x w, then flatten to N x (H*h*w)
    y_pred = self.linear2(h_expand) # -> N x D_out
    return y_pred

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out, h, w = 64, 1000, 100, 10, 6, 6

x = Variable(torch.randn(N, D_in), requires_grad=True)
y = Variable(torch.randn(N, D_out), requires_grad=False)

model = TwoLayerNet(D_in, H, D_out, h, w)

criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
  y_pred = model(x)
  loss = criterion(y_pred, y)
  print(t, loss.data[0])
  optimizer.zero_grad()
  loss.backward()
  optimizer.step()

The forward pass runs fine and prints: (0, 667.63525390625)

But I get the error:
Traceback (most recent call last):
  File "script_test.py", line 36, in <module>
    loss.backward()
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 151, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/function.py", line 90, in apply
    return self._forward_cls.backward(self, *args)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/_functions/pointwise.py", line 286, in backward
    return grad_output * mask, None
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 789, in __mul__
    return self.mul(other)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 310, in mul
    return Mul.apply(self, other)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/_functions/basic_ops.py", line 50, in forward
    return a.mul(b)
RuntimeError: inconsistent tensor size at ~/pytorch/torch/lib/TH/generic/THTensorMath.c:875

Can anyone help me figure out the problem?

What PyTorch version are you running?

I compiled from the master branch recently. Can you run the code without errors?

I encountered a similar error where the forward pass succeeded but backward failed. I compiled the latest version and can reproduce your error. My traceback, which differs slightly from yours, is as follows:

Traceback (most recent call last):
  File "try.py", line 31, in <module>
    loss.backward()
  File "/home/qizhe/tool/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 152, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/qizhe/tool/anaconda2/lib/python2.7/site-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
  File "/home/qizhe/tool/anaconda2/lib/python2.7/site-packages/torch/autograd/function.py", line 91, in apply
    return self._forward_cls.backward(self, *args)
  File "/home/qizhe/tool/anaconda2/lib/python2.7/site-packages/torch/autograd/_functions/pointwise.py", line 289, in backward
    return grad_output * mask, None
  File "/home/qizhe/tool/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 802, in __mul__
    return self.mul(other)
  File "/home/qizhe/tool/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 311, in mul
    return Mul.apply(self, other)
  File "/home/qizhe/tool/anaconda2/lib/python2.7/site-packages/torch/autograd/_functions/basic_ops.py", line 48, in forward
    return a.mul(b)
RuntimeError: inconsistent tensor size, expected r [1 x 100 x 6 x 6], t [1 x 100 x 6 x 6] and src [64 x 100] to have the same number of elements, but got 3600, 3600 and 6400 elements respectively at /home/qizhe/tool/pytorch/torch/lib/TH/generic/THTensorMath.c:875

Thanks, @Qizhe_Xie. Your build prints a more detailed error message, but I still don't understand why it appears. What confuses me is whether any version can run this code successfully.

Me neither. The stable version torch-0.1.12.post2-cp27-n also gives the same error.

I have figured out the error… The size arguments for tensor.expand shouldn’t be passed within a list, i.e., you should use h_expand = h_relu.expand(64, H, h, w).contiguous().view(64, -1) instead of h_expand = h_relu.expand([64, H, h, w]).contiguous().view(64, -1).
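For reference, here is the full corrected forward (a minimal sketch of the fix above; H, h, and w are the same module-level values as in the original script):

def forward(self, x):
  h_relu = self.linear1(x).clamp(min=0)                    # N x H
  h_relu = torch.unsqueeze(torch.unsqueeze(h_relu, 2), 3)  # N x H x 1 x 1
  # pass the target sizes as separate integer arguments, not as one list
  h_expand = h_relu.expand(64, H, h, w).contiguous().view(64, -1)  # N x (H*h*w)
  y_pred = self.linear2(h_expand)                          # N x D_out
  return y_pred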

It’s strange that forward doesn’t throw an exception…
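As a side note, if the target sizes are already collected in a list, standard Python argument unpacking with * does the right thing (this is plain Python, nothing specific to expand):

sizes = [64, H, h, w]
h_expand = h_relu.expand(*sizes).contiguous().view(64, -1)  # equivalent to expand(64, H, h, w)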

Oh yes… the forward pass did run through. Anyway, thanks a lot, @Qizhe_Xie.