RuntimeError calling backward

I’m getting the following error. I have no concrete idea what might be triggering it. Any suggestions on what to look for?

    loss.backward()
  File "anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 146, in backward
    self._execution_engine.run_backward((self,), (gradient,), retain_variables)
  File "anaconda2/lib/python2.7/site-packages/torch/autograd/_functions/tensor.py", line 22, in backward
    grad_input._set_index(self.index, grad_output)
RuntimeError: tensor must have one dimension at /py/conda-bld/pytorch_1493669264383/work/torch/lib/TH/generic/THTensor.c:814

Is grad_output a zero-dimensional tensor for some reason?

No, it seems to have the right dimensions. I will explain in words what I’m doing; I am also working on a minimal example, but I haven’t been able to get it to fail in the exact same way yet.

I’m learning a model for sequence tagging that predicts tags sequentially. The model uses word embeddings, plus tag embeddings for the previous predictions. The key aspect is that the scores used for prediction are computed cumulatively, and the training losses are computed from these scores. The scores for the first and second predictions are $s_1 = s(x, \emptyset; w)$ and $s_2 = s(x, \hat{y}_1; w) + s_1$, respectively, where $x$ is the sentence and $\hat{y}_1$ is the prediction induced by $s_1$. The loss $\ell$ is computed from the scores (and some gold data that I’m omitting because it is not important here). Calling backward on $\ell(s_1)$ works, but backward on $\ell(s_2)$ fails. I do a sequence of backward calls and take an optimizer step only at the end of the sequence.

I tried various things, but still got the same error: updating with each predicted tag; using retain_variables=True; accumulating all losses and doing backward only at the end (sketched below). I suppose this is due to me not fully understanding what you can and cannot do with these computational graphs. What is the best way of training with cumulative scores in PyTorch?
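Schematically, the accumulate-all-losses variant looks like this; it is a simplified sketch in which scorer, loss_fn, x, and gold are placeholders standing in for my actual model and data.

import torch
import torch.nn as nn
from torch.autograd import Variable

scorer = nn.Linear(4, 5)                 # placeholder for the real scoring model s(.; w)
loss_fn = nn.CrossEntropyLoss()

x = Variable(torch.randn(3, 4))          # one feature vector per position (placeholder input)
gold = Variable(torch.LongTensor([0, 2, 1]))

total_loss = 0
score = None
for t in range(x.size(0)):
    step_score = scorer(x[t:t + 1])                               # stands in for s(x, y_hat_{t-1}; w)
    score = step_score if score is None else score + step_score   # cumulative score s_t
    total_loss = total_loss + loss_fn(score, gold[t:t + 1])
total_loss.backward()                    # single backward at the end of the sequence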

I’m running on the CPU and using version 0.1.12_2.

Thanks a lot for looking into this.

The minimal example that I came up with fails in a different way.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
    
(n, h, m) = (3, 4, 5)
fc1 = nn.Linear(n, h)
fc2 = nn.Linear(h, m)
emb = nn.Embedding(3, n)

# sum the embeddings of two adjacent tokens, then score them with a two-layer network
f = lambda z, i: fc2( F.relu( fc1( emb( z[i:i + 2] ).sum(0).view(1, -1) ) ) )

x = torch.LongTensor([0, 1, 2])
y1 = torch.LongTensor([0])
y2 = torch.LongTensor([1])

y1_p = f( Variable( x ), 0 )
l1 = nn.CrossEntropyLoss()(y1_p, Variable( y1 ) )
l1.backward()

y2_p = y1_p + f(Variable( x ), 1)
l2 = nn.CrossEntropyLoss()(y2_p, Variable( y2 ) )
l2.backward()

This yields RuntimeError: Trying to backward through the graph second time, but the buffers have already been freed. Please specify retain_variables=True when calling backward for the first time., which is solved by doing l1.backward(retain_variables=True) rather than l1.backward().
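Concretely, the fix is a one-argument change to the first backward call; the rest of the example stays as written (later PyTorch versions rename the flag to retain_graph).

l1.backward(retain_variables=True)   # keep the graph so the part shared with l2 is not freed
l2.backward()                        # now succeeds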

This error is expected: after you call l1.backward(), the graph is freed, and you cannot run backward through parts of it again (they have already been freed/deleted).

Here’s correct code:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

(n, h, m) = (3, 4, 5)
fc1 = nn.Linear(n, h)
fc2 = nn.Linear(h, m)
emb = nn.Embedding(3, n)

# sum the embeddings of two adjacent tokens, then score them with a two-layer network
f = lambda z, i: fc2( F.relu( fc1( emb( z[i:i + 2] ).sum(0).view(1, -1) ) ) )

x = torch.LongTensor([0, 1, 2])
y1 = torch.LongTensor([0])
y2 = torch.LongTensor([1])

y1_p = f( Variable( x ), 0 )
l1 = nn.CrossEntropyLoss()(y1_p, Variable( y1 ) )

y2_p = y1_p + f(Variable( x ), 1)
l2 = nn.CrossEntropyLoss()(y2_p, Variable( y2 ) )
(l1+l2).backward()
# alternatively
# torch.autograd.backward([l1, l2], [l1.data.new([1]), l2.data.new([1])])

Thanks. It did not work. I had tried that before, and tried it again now, and still got the same error: RuntimeError: tensor must have one dimension at /py/conda-bld/pytorch_1493669264383/work/torch/lib/TH/generic/THTensor.c:814. I still could not get a minimal example to reproduce this error. The dimensions all seem to agree and the score computations work until the end of the sequence; it fails only when calling backward. Any tips? It must be a different issue.

I think this was a bug that is now fixed in master and will be in the next release, which is a week or two away.

If you want an immediate fix, do consider building from master: https://github.com/pytorch/pytorch#from-source

We really apologize for having to make you build from source :frowning: for the short-term fix.

p.s.: I ran the script (my corrected version) and it works on master.

Thanks for all the help. I just installed from source. I still have the same issue. I will keep looking. Any suggestions are appreciated.

  File "anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 151, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "anaconda2/lib/python2.7/site-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
  File "anaconda2/lib/python2.7/site-packages/torch/autograd/function.py", line 90, in apply
    return self._forward_cls.backward(self, *args)
  File "anaconda2/lib/python2.7/site-packages/torch/autograd/function.py", line 183, in wrapper
    outputs = fn(ctx, *tensor_args)
  File "anaconda2/lib/python2.7/site-packages/torch/autograd/_functions/tensor.py", line 276, in backward
    grad_tensor.index_add_(ctx.dim, index, grad_output)
RuntimeError: tensor must have one dimension at pytorch/torch/lib/TH/generic/THTensor.c:814

Version: 0.1.12+4eb448a; installed for CPU, i.e., no GPU support.

For some reason grad_output has two dimensions, i.e., len(grad_output.size()) is two. Any ideas on why this is the case? The loss Variable on which I’m calling backward has a single dimension with one element.

Okay, so the repro script you gave me is not the same as the script you are running. I understand if it’s part of a larger codebase that you can’t share. Step through in pdb and see what the size of grad_output is at tensor.py:276, as shown in the stack trace; one way to do that is sketched below.
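For example, something like this post-mortem session drops you into the frame that raised the error, where you can inspect grad_output (here loss is a placeholder for the Variable whose backward fails).

import pdb
import traceback

try:
    loss.backward()         # the failing call from your training loop (placeholder)
except RuntimeError:
    traceback.print_exc()
    pdb.post_mortem()       # starts at the failing frame; try `p grad_output.size()` there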

Yeah, the research codebase I’m working on is larger and it is hard to extract a single example; the minimal example tried to capture my understanding of what might have been wrong. I discovered a few bugs in my code related to dimensions, and it was a combination of them that led to the issue. One of the most striking was using sum along an axis followed by repeat: I forgot to account for the fact that the summed-over axis disappears. Most of the errors were in parts of the code I wrote a while ago and didn’t think much about. I figured things out with a combination of line debugging and print statements. The first error message that I got was definitely opaque. Thanks for all the help.
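For anyone hitting the same thing, here is a small illustration of the sum-then-repeat pitfall (with current PyTorch semantics, where reductions drop the reduced axis by default and keepdim=True keeps it; the shapes are illustrative).

import torch

a = torch.randn(4, 5)

b = a.sum(0)                   # the summed-over axis disappears: shape (5,)
c = a.sum(0, keepdim=True)     # keeping the axis makes later repeats/views explicit: shape (1, 5)

print(b.size(), c.size())      # torch.Size([5]) torch.Size([1, 5])
print(c.repeat(4, 1).size())   # torch.Size([4, 5])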