Function TBackward returned an invalid gradient at index 0 - got [1, 3] but expected shape compatible with [1, 2]

I’m working on a neural network that can expand in size from one iteration to the next, according to changes in the inputs. While developing the training loop, I ran into a problem during back-propagation. I wrote a minimal example that reproduces it:

import torch
from torch import nn

l1 = nn.Linear(2, 1)

x = torch.tensor([1.0, 3.0])

# with torch.no_grad():
y_pred = l1(x)

x = torch.tensor([1.0, 3.0, -2.0])
y = torch.tensor([2.0])

l1.weight.data = torch.cat([l1.weight.data, torch.randn((1, 1))], dim=1)
l1.in_features += 1

y_pred = l1(x)
loss = (y - y_pred)**2
loss.backward()


This gives the following error:

Traceback (most recent call last):
  File "/Users/pedrocoelho/gnn-poc/", line 25, in <module>
  File "/usr/local/lib/python3.9/site-packages/torch/", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.9/site-packages/torch/autograd/", line 147, in backward
RuntimeError: Function TBackward returned an invalid gradient at index 0 - got [1, 3] but expected shape compatible with [1, 2]

What could be causing this? I know it has to do with an incompatibility in the shapes of the gradients, but I find that odd, since the gradients are not calculated until loss.backward() is called. It might also be related to the intermediate results that are stored in the context during the forward pass. How do I access/clear those intermediate results after each forward+backward iteration? Also, what exactly is the TBackward function? I couldn’t find much information about it.

Calling the first y_pred = l1(x) inside the with torch.no_grad() block fixes the problem, but it isn’t an option because I need to do a backward pass at every iteration, with different network sizes.
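For reference, that workaround is just uncommenting the line in the snippet above, so the first forward pass is never recorded by autograd:

with torch.no_grad():
    y_pred = l1(x)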

The computation graph for the loss looks like this: [graph screenshot omitted]
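One way to print the same structure is to walk loss.grad_fn with a small helper (print_graph here is just a sketch, not a torch utility). As far as I understand, TBackward is the backward node for the weight transpose inside nn.Linear, whose forward computes x @ weight.T + bias:

def print_graph(fn, depth=0):
    # Recursively print every autograd node reachable from the loss;
    # TBackward appears under the linear node because nn.Linear
    # transposes its weight in the forward pass.
    if fn is None:
        return
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        print_graph(next_fn, depth + 1)

print_graph(loss.grad_fn)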

You are currently manipulating the .data attribute, which is not supported and can yield unwanted side effects, most likely including this one.
If you want to replace the .weight parameter of the linear layer, wrap the assignment in a with torch.no_grad() block and assign a new nn.Parameter to it.


Thank you!
As you said, the following way of expanding the weights works:

with torch.no_grad():
    l1.weight = nn.Parameter(
        torch.cat([l1.weight, torch.randn((1, 1))], dim=1)
    )
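
For completeness, here is a self-contained sketch of the whole flow with this fix; the dim=1 in the concatenation is my assumption, chosen to grow the weight from [1, 2] to [1, 3]:

import torch
from torch import nn

l1 = nn.Linear(2, 1)

# First iteration: two input features.
x = torch.tensor([1.0, 3.0])
y = torch.tensor([2.0])
loss = (y - l1(x)) ** 2
loss.backward()

# Grow the layer by replacing the weight with a fresh nn.Parameter
# instead of mutating .data, so autograd records the new [1, 3] shape.
with torch.no_grad():
    l1.weight = nn.Parameter(
        torch.cat([l1.weight, torch.randn((1, 1))], dim=1)
    )
    l1.in_features += 1

# Second iteration: three input features; backward now succeeds.
x = torch.tensor([1.0, 3.0, -2.0])
loss = (y - l1(x)) ** 2
loss.backward()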