Torch.cat bug: Forward is fine, but backward fails

I'm running into the following issue: when I use torch.cat, the forward pass of my model runs fine, but the backward pass raises the error posted below. If I instead hack around this and use torch.stack to achieve the same result, the code works correctly in both the forward and backward passes.

def forward(self):
    ...
    # text_embeddings is a Variable(FloatTensor): batch x embed_dim
    # other_embeddings is a Variable(FloatTensor): batch x embed_dim

    # Using torch.cat yields the RuntimeError below during backward.
    concatenated = torch.cat((text_embeddings, other_embeddings), dim=1)

    # Using torch.stack instead works fine in both passes.
    # (Note: stacking on dim=2 and flattening interleaves the columns of the
    # two embeddings rather than placing them side by side as cat(dim=1)
    # does, but the resulting shape, batch x 2*embed_dim, is the same.)
    # concatenated = torch.stack((text_embeddings, other_embeddings), dim=2)
    # concatenated = concatenated.view(-1, 2 * embed_dim)
    return concatenated
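
For reference, here is the pattern reduced to a standalone sketch on dummy tensors (the sizes are placeholders). As noted below, this kind of small case runs cleanly for me, backward included, which is exactly why the bug has been hard to pin down:

import torch
from torch.autograd import Variable

batch, embed_dim = 4, 8  # placeholder sizes
text_embeddings = Variable(torch.randn(batch, embed_dim), requires_grad=True)
other_embeddings = Variable(torch.randn(batch, embed_dim), requires_grad=True)

# cat along dim=1: each row is [text | other], shape batch x 2*embed_dim
cat_out = torch.cat((text_embeddings, other_embeddings), dim=1)

# Backward through cat works fine in this isolated case.
cat_out.sum().backward()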

Error:

Traceback (most recent call last):
  File "main.py", line 60, in <module>
    run.train()
  File "code.py", line 99, in train
    self._take_grad_step(train_state, loss)
  File "code.py", line 51, in _take_grad_step
    loss.backward()
  File "/usr/local/lib/python2.7/site-packages/torch/autograd/variable.py", line 146, in backward
    self._execution_engine.run_backward((self,), (gradient,), retain_variables)
  File "/usr/local/lib/python2.7/site-packages/torch/autograd/_functions/tensor.py", line 314, in backward
    in zip(self.input_sizes, _accumulate(self.input_sizes)))
  File "/usr/local/lib/python2.7/site-packages/torch/autograd/_functions/tensor.py", line 313, in <genexpr>
    return tuple(grad_output.narrow(self.dim, end - size, size) for size, end
RuntimeError: out of range at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/TH/generic/THTensor.c:386
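
For what it's worth, the bottom frames show that cat's backward slices grad_output back into one piece per input with narrow(), and "out of range" fires when the requested slice doesn't fit inside grad_output. Here is a rough, simplified sketch of that splitting logic (the function name is hypothetical; the real code lives in torch/autograd/_functions/tensor.py):

def concat_backward(grad_output, dim, input_sizes):
    # Split grad_output along dim into chunks matching each input's
    # recorded size, using a running offset (like _accumulate above).
    grads = []
    offset = 0
    for size in input_sizes:
        grads.append(grad_output.narrow(dim, offset, size))
        offset += size
    return tuple(grads)

So the error suggests the gradient arriving at the cat node is smaller along dim than the forward output was, rather than cat's forward itself being wrong.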

I've tried to create a minimal failing example, but unfortunately I haven't been able to find a small case that reproduces the bug. I'm running torch version 0.1.12.post2.

Update: I've since put together a minimal example and posted it here: https://github.com/pytorch/pytorch/issues/2208