I’ve stumbled on a RuntimeError during the backward pass of my model. It looks like a bug in autograd, since the forward pass works alright and I did not override any .backward(). Here are the relevant bits of the stack trace:
Traceback (most recent call last):
  ...
  File "/home/carl/projets/smallnet/trainable.py", line 182, in train_epoch
    loss.backward()
  File "/home/carl/vpriv/lib/python3.5/site-packages/torch/autograd/variable.py", line 156, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/carl/vpriv/lib/python3.5/site-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
RuntimeError: invalid argument 3: sizes do not match at /pytorch/torch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:217
Are you aware of common pitfalls that can trigger that kind of error?
It seems I cannot debug the backward pass; it’s computed with the C library.
This kind of error during the backward pass typically comes from modifying a tensor in place during the forward pass when that tensor is needed for backpropagation. As stated by @SpandanMadan, we need more context to help you in this case.
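For reference, here is a minimal sketch of that in-place pitfall. It assumes a recent PyTorch release, where tensors carry requires_grad directly (0.2.0 used the Variable wrapper instead), and the names x and y are illustrative only:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x.exp()   # exp() keeps its output y around to compute the gradient later
y.add_(1)     # in-place update overwrites the value autograd saved

try:
    y.sum().backward()
    inplace_error = None
except RuntimeError as err:
    # Autograd notices that a tensor needed for backward was changed in place
    inplace_error = err

print(inplace_error is not None)
```

Replacing y.add_(1) with the out-of-place y = y + 1 should avoid the error in this sketch, since the saved output is then left untouched.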
My code is spread across many files, so I could not include it, and it was hard to build a minimal reproducible example.
I’m using version 0.2.0.
Thank you for suggesting this. It helped me track down the following operation. Here, I just want to modify a parameter inside a module:
self.my_param.data = other_tensor # where my_param is a nn.Parameter
# other_tensor has a different size than self.my_param.data
Then I do a forward and a backward pass, without any problem. Then, I modify the parameter again. I do a forward pass, then a backward pass, and there I have the RuntimeError. The reason is that .grad is allocated on the first backward pass. The second time I change .data, the size of .data becomes different from the size of .grad.
Thus the error: RuntimeError: invalid argument 3: sizes do not match
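The mismatch described above can be observed directly. This is a minimal sketch assuming a recent PyTorch release rather than 0.2.0, with a standalone my_param instead of one living inside a module. It stops before the second backward pass, because the exact error raised there depends on the version:

```python
import torch
import torch.nn as nn

my_param = nn.Parameter(torch.zeros(3))

# The first backward pass allocates my_param.grad with the current size.
(my_param * 2).sum().backward()

# Assigning a tensor of a different size to .data leaves .grad untouched.
my_param.data = torch.zeros(5)

data_size = tuple(my_param.data.shape)
grad_size = tuple(my_param.grad.shape)
print(data_size, grad_size)
```

Comparing the two sizes like this is a quick way to confirm the diagnosis before the backward pass fails.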
My solution to this was to replace the parameter completely instead of assigning to .data:
self.my_param = nn.Parameter(other_tensor)
Now everything works.
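A minimal sketch of that workaround, assuming a recent PyTorch release. Holder is a hypothetical module written only for this illustration:

```python
import torch
import torch.nn as nn

class Holder(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.my_param = nn.Parameter(torch.zeros(3))

    def forward(self):
        return (self.my_param * 2).sum()

module = Holder()
module().backward()                            # .grad is allocated with size 3

other_tensor = torch.zeros(5)
module.my_param = nn.Parameter(other_tensor)   # fresh parameter, no stale .grad
grad_before = module.my_param.grad             # None until the next backward pass

module().backward()                            # .grad is allocated with the new size
grad_size = tuple(module.my_param.grad.shape)
print(grad_before, grad_size)
```

One caveat: an optimizer created before the replacement still holds a reference to the old parameter object, so it would likely need to be recreated afterwards.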
Should this be prevented?
If we want to prevent this from happening, there are many options.
1. Should it be forbidden to assign a new tensor to my_param.data?
2. Should it be forbidden to assign a tensor of a different size to my_param.data?
3. Should my_param.grad be reassigned automatically when the size of .data changes?
4. Should the sizes of all .grad be matched to their .data when calling optimizer.zero_grad()?
Options 2 and 4 make sense to me and do not seem too complicated to implement. In any case, either one would add some overhead.
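As a rough user-side approximation of the last option (matching .grad to .data when gradients are cleared), one could drop any stale .grad before the next backward pass. The helper drop_mismatched_grads below is hypothetical, not part of torch, and the sketch assumes a recent PyTorch release:

```python
import torch
import torch.nn as nn

def drop_mismatched_grads(parameters):
    """Set .grad to None wherever its size differs from the parameter's size.

    Hypothetical helper, not part of torch. Returns the number of grads dropped.
    """
    dropped = 0
    for p in parameters:
        if p.grad is not None and p.grad.shape != p.shape:
            p.grad = None   # lets a later backward pass allocate a fresh .grad
            dropped += 1
    return dropped

my_param = nn.Parameter(torch.zeros(3))
(my_param * 2).sum().backward()     # .grad is allocated with size 3
my_param.data = torch.zeros(5)      # .grad keeps its old size

dropped = drop_mismatched_grads([my_param])
print(dropped, my_param.grad)
```

The extra cost is one shape comparison per parameter per call, which gives a sense of the overhead mentioned above.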