I’m trying to remove some filters from a convolutional layer during training; however, loss.backward() in the second iteration raises an error due to a shape mismatch. The error is the following:
Traceback (most recent call last):
  File "fixingbackward.py", line 44, in <module>
    loss.backward()  # Error here
  File "/home/user/data/envs/def/lib/python3.8/site-packages/torch/tensor.py", line 222, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/user/data/envs/def/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Function CudnnConvolutionBackward returned an invalid gradient at index 1 - got [1, 7, 3, 3] but expected shape compatible with [1, 8, 3, 3]
The following code reproduces the error:
import torch


class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = torch.nn.Conv2d(1, 8, 3, padding=1)
        self.conv2 = torch.nn.Conv2d(8, 1, 3, padding=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        return x


def loss_fun(x):
    return torch.sum(x)


# Initialization
model = Model().cuda()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
im = torch.rand((1, 1, 256, 256)).cuda()

# Iteration 1
output = model(im)
loss = loss_fun(output)
opt.zero_grad()
loss.backward()
opt.step()

# Remove the first output channel of conv1 (and the matching input channel of conv2)
model.conv1.weight.data = model.conv1.weight.data[1:]        # size = (7, 1, 3, 3)
model.conv1.bias.data = model.conv1.bias.data[1:]            # size = (7,)
model.conv2.weight.data = model.conv2.weight.data[:, 1:]     # size = (1, 7, 3, 3)

#### Alternatively, this seems to work, but it increases memory usage until it explodes
# model.conv1.weight = torch.nn.Parameter(model.conv1.weight.data[1:])       # (7, 1, 3, 3)
# model.conv1.bias = torch.nn.Parameter(model.conv1.bias.data[1:])           # (7,)
# model.conv2.weight = torch.nn.Parameter(model.conv2.weight.data[:, 1:])    # (1, 7, 3, 3)

# Iteration 2
output = model(im)
loss = loss_fun(output)
opt.zero_grad()
loss.backward()  # Error here
opt.step()
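In case it matters, here is how I would write the Parameter-based variant more defensively. The detach().clone() (so the new parameters don’t keep a view into the old, larger storage) and the fresh optimizer (so no references to the old parameters or their Adam state survive) are untested guesses at avoiding the memory growth, not something I have verified; it assumes the model/opt/im from the script above:

# Untested guess at a safer version of the Parameter-based variant
with torch.no_grad():
    model.conv1.weight = torch.nn.Parameter(model.conv1.weight[1:].detach().clone())
    model.conv1.bias = torch.nn.Parameter(model.conv1.bias[1:].detach().clone())
    model.conv2.weight = torch.nn.Parameter(model.conv2.weight[:, 1:].detach().clone())

# Rebuild the optimizer over the new parameters so it holds no stale references
opt = torch.optim.Adam(model.parameters(), lr=1e-4)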
Following this post, I also tried 1) updating weight.grad.data in the same way, and 2) even re-initializing the optimizer right before Iteration 2. Neither approach worked.
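Concretely, those two attempts looked roughly like this (a sketch from memory; the exact code I ran may have differed slightly):

# 1) Slice the stored gradients to match the new parameter shapes
if model.conv1.weight.grad is not None:
    model.conv1.weight.grad.data = model.conv1.weight.grad.data[1:]       # (7, 1, 3, 3)
    model.conv1.bias.grad.data = model.conv1.bias.grad.data[1:]           # (7,)
    model.conv2.weight.grad.data = model.conv2.weight.grad.data[:, 1:]    # (1, 7, 3, 3)

# 2) As a separate attempt, a fresh optimizer right before Iteration 2
opt = torch.optim.Adam(model.parameters(), lr=1e-4)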
After debugging for a long time I’m stuck, since I don’t know where the information about the old shape comes from. In fact, I don’t understand why this problem arises at all. I would expect that when I call output = model(im) in Iteration 2, autograd builds a fresh graph with the new shapes and ignores whatever it knew from Iteration 1. It seems that something retains information from the previous iteration, but I have no clue what or where it could be.
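For what it’s worth, the sliced parameter shapes look right and the forward pass itself runs fine with them; only backward() fails, which is what makes me suspect something is cached between iterations. A quick sanity check (assuming the setup above):

# Shapes after slicing look correct, and forward accepts them
print(model.conv1.weight.shape)  # torch.Size([7, 1, 3, 3])
print(model.conv2.weight.shape)  # torch.Size([1, 7, 3, 3])
out = model(im)                  # runs fine, shape (1, 1, 256, 256)
loss_fun(out).backward()         # RuntimeError: ... got [1, 7, 3, 3] but expected ... [1, 8, 3, 3]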