RuntimeError: The size of tensor a must match the size of tensor b after pruning

Given a trained object detection network, the goal of my program is to prune it by deleting the least-contributing filters together with their dependent feature maps and biases. After pruning, the network is retrained to regain the accuracy it has lost. Pruning and retraining once works, but retraining the network a second time after that first prune-and-retrain cycle results in the following error:

Traceback (most recent call last):
  File "", line 212, in <module>
  File "", line 52, in __call__
  File "", line 80, in pruneLoop
  File "/home/bullseye/.local/lib/python3.6/site-packages/lightnet/engine/", line 124, in __call__
  File "/home/bullseye/Documents/git/Master-Thesis-Pruning-for-Object-Detection/docker-omgeving/Code/", line 34, in process_batch
  File "/home/bullseye/.local/lib/python3.6/site-packages/torch/", line 166, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/bullseye/.local/lib/python3.6/site-packages/torch/autograd/", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: The size of tensor a (1010) must match the size of tensor b (798) at non-singleton dimension 1
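The message itself is PyTorch's generic shape-mismatch error for elementwise operations: it is raised whenever two tensors are combined and their sizes differ in a dimension where neither size is 1. A minimal sketch that produces the same wording outside of any model, with the sizes copied from the traceback purely for illustration:

```python
import torch

# Two tensors whose dimension 1 disagrees (sizes mirror the traceback).
a = torch.zeros(1, 1010)
b = torch.zeros(1, 798)

message = None
try:
    a + b  # elementwise add needs the shapes to be broadcast-compatible
except RuntimeError as err:
    message = str(err)

print(message)
```

During a backward pass, one way to hit this error is a freshly computed gradient being added to a tensor that still has an older shape.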

Since the error occurs in the backward pass, it suggests that the problem is introduced during the first retraining. My first thought was that the gradients created by the first retraining were the issue, but the problem remained even after I deleted the gradients after pruning.
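For context on why a gradient left over from an earlier training run can matter: `backward()` does not overwrite a parameter's `.grad`, it adds to it. A small sketch (layer and input sizes are arbitrary):

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)
x = torch.randn(3, 4)

# First backward pass: .grad is created.
layer(x).sum().backward()
first = layer.weight.grad.clone()

# Second backward pass on the same input: the new gradient is added to the old one.
layer(x).sum().backward()
second = layer.weight.grad.clone()

print(torch.allclose(second, 2 * first))
```

If a parameter has been pruned in between, that accumulation combines a new gradient with one of the old size. Setting `.grad` to `None`, rather than zeroing it in place, discards the old tensor entirely.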

What is causing this error?
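For readers without the original code, the pruning step described above might look roughly like the following. This is a hypothetical helper, not the poster's implementation: it assumes two consecutive `Conv2d` layers with nothing in between, uses the L1 norm of each filter as an assumed measure of how much it contributes, and builds fresh `nn.Parameter` objects for the pruned tensors.

```python
import torch
import torch.nn as nn


def prune_conv_pair(conv, next_conv, n_prune):
    """Hypothetical helper: drop the n_prune filters of `conv` with the
    smallest L1 norm, their bias entries, and the matching input channels
    of `next_conv` (the layer that consumes those feature maps)."""
    with torch.no_grad():
        # Assumed ranking criterion: L1 norm of each filter.
        scores = conv.weight.abs().sum(dim=(1, 2, 3))
        n_keep = conv.out_channels - n_prune
        keep = torch.argsort(scores, descending=True)[:n_keep]
        keep, _ = torch.sort(keep)  # keep surviving filters in their original order

        # Fresh, smaller Parameters; their .grad starts out as None.
        conv.weight = nn.Parameter(conv.weight[keep].clone())
        if conv.bias is not None:
            conv.bias = nn.Parameter(conv.bias[keep].clone())
        conv.out_channels = n_keep

        # The next layer loses the input channels fed by the removed feature maps.
        next_conv.weight = nn.Parameter(next_conv.weight[:, keep].clone())
        next_conv.in_channels = n_keep
    return keep


conv1 = nn.Conv2d(3, 8, 3, padding=1)
conv2 = nn.Conv2d(8, 4, 3, padding=1)
keep = prune_conv_pair(conv1, conv2, n_prune=3)

y = conv2(conv1(torch.randn(1, 3, 16, 16)))
print(keep.tolist(), y.shape)  # y keeps 4 output channels; conv1 now has 5 filters
```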

I’m not completely sure how your pruning works, but could it be that you are removing some parameters after the computation graph was already created (e.g. after a new forward pass)?

@ptrblck Yes, I delete some filters, their biases and the dependent feature maps during pruning. However, the pruning code is wrapped in:

with torch.no_grad():

I suspect that training generates this graph in its forward pass. Should I delete this graph? And if so, how do I access it?
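On the graph question, there is usually nothing to delete by hand. A minimal sketch, assuming a single bias-free `Conv2d` layer: the graph is reachable through the output's `grad_fn`, `backward()` frees its saved tensors (unless `retain_graph=True`), and the next forward pass after pruning builds a new graph from the current parameters.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 10, 3, 1, 1, bias=False)
x = torch.randn(1, 3, 8, 8)

out = conv(x)
# The graph is reachable from the output: grad_fn is its last backward node.
print(out.grad_fn)  # the node's exact name depends on the PyTorch version

# backward() frees the graph's saved tensors unless retain_graph=True ...
out.sum().backward()
graph_freed = False
try:
    out.sum().backward()  # ... so a second pass through the same graph fails
except RuntimeError:
    graph_freed = True

# After pruning, the next forward pass simply builds a new graph.
with torch.no_grad():
    conv.weight = nn.Parameter(conv.weight[:5].clone())
out = conv(x)
out.sum().backward()
print(graph_freed, out.shape, conv.weight.grad.shape)
```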

Could you post a minimal code snippet to reproduce this issue?
This dummy approach seems to work fine:

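import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv = nn.Conv2d(3, 10, 3, 1, 1, bias=False)

    def forward(self, x):
        x = self.conv(x)
        return x

model = MyModel()
x = torch.randn(1, 3, 3, 3)
output = model(x)  # forward pass before pruning: 10 output channels

# prune: keep the first 5 filters by wrapping the slice in a new Parameter
with torch.no_grad():
    model.conv.weight = nn.Parameter(model.conv.weight[:5])

output = model(x)  # new forward pass after pruning: 5 output channels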

I am not sure how to provide a minimal snippet that reproduces the error, since my program is several hundred lines long. However, I have found the solution to my problem.
As I first suspected, the grad attributes of the pruned tensors were the problem: after pruning, these gradient tensors still had the old size, which no longer matched the size of the weights themselves. Setting them to None solved the issue.
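A minimal sketch of the stale-gradient problem and the fix described above. The poster's pruning code is not shown, so this assumes the weights are pruned by reassigning `.data`, which keeps the same Parameter object and therefore its old `.grad`; keeping the first five filters is an arbitrary choice.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 10, 3, 1, 1, bias=False)
x = torch.randn(1, 3, 8, 8)

# First training step: conv.weight.grad now has shape (10, 3, 3, 3).
conv(x).sum().backward()

# Assumed pruning style: shrink the tensor behind the existing Parameter.
keep = torch.arange(5)
conv.weight.data = conv.weight.data[keep].clone()

# The weight is smaller now, but its gradient still has the old shape.
stale_shape = tuple(conv.weight.grad.shape)
print(tuple(conv.weight.shape), stale_shape)

# The fix: drop the stale gradient so the next backward creates a new one.
conv.weight.grad = None
conv(x).sum().backward()
print(tuple(conv.weight.grad.shape))
```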
The first time I attempted this solution I forgot that the running mean and running var tensors of batch normalization layers also contain gradients.
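When batch normalization follows a pruned convolution, its per-channel tensors have to be cut down with the same indices. Note that in PyTorch only the BN `weight` and `bias` are parameters with a `.grad`; `running_mean` and `running_var` are buffers that never receive gradients, although they still have to be resized. A sketch, with `prune_batchnorm` as a hypothetical helper and `.data` reassignment as the assumed pruning style:

```python
import torch
import torch.nn as nn


def prune_batchnorm(bn, keep):
    """Hypothetical helper: keep only the channels listed in `keep`."""
    with torch.no_grad():
        # Learnable affine parameters: slice them and drop their stale gradients.
        for name in ("weight", "bias"):
            param = getattr(bn, name)
            param.data = param.data[keep].clone()
            param.grad = None
        # Running statistics are buffers (no .grad), but their size must match too.
        bn.running_mean = bn.running_mean[keep].clone()
        bn.running_var = bn.running_var[keep].clone()
    bn.num_features = len(keep)


bn = nn.BatchNorm2d(10)
bn(torch.randn(4, 10, 8, 8)).sum().backward()  # populates bn.weight.grad and bn.bias.grad
print(bn.running_mean.requires_grad, bn.running_mean.grad)  # buffers: no gradient

keep = torch.tensor([0, 2, 4, 6, 8])
prune_batchnorm(bn, keep)
out = bn(torch.randn(4, 5, 8, 8))
out.sum().backward()
print(tuple(bn.weight.grad.shape))
```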
@ptrblck Thank you for your help; your dummy code helped me understand and track down the problem. However, I am still unsure why your dummy code doesn’t run into the same issue.
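A likely answer, assuming the real program prunes through `.data`: the dummy code never calls `backward()` before pruning, and it wraps the slice in a brand-new `nn.Parameter`, whose `.grad` starts as `None`, so there is no stale gradient to collide with. Replacing the Parameter object has a different pitfall, though: an optimizer built before pruning keeps a reference to the old Parameter. The sketch below (SGD with momentum is an arbitrary choice) shows both points.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 10, 3, 1, 1, bias=False)
optimizer = torch.optim.SGD(conv.parameters(), lr=0.1, momentum=0.9)
x = torch.randn(1, 3, 8, 8)

# One training step before pruning, unlike the dummy code.
optimizer.zero_grad()
conv(x).sum().backward()
optimizer.step()

# Prune the way the dummy code does: a brand-new Parameter without a .grad.
with torch.no_grad():
    conv.weight = nn.Parameter(conv.weight[:5].clone())
print(conv.weight.grad)  # None: nothing stale to accumulate into

# The optimizer still points at the old 10-filter Parameter ...
tracked = optimizer.param_groups[0]["params"][0]
still_same = tracked is conv.weight
tracked_shape = tuple(tracked.shape)
print(still_same, tracked_shape)

# ... so rebuild it after pruning so that it updates the pruned weights.
optimizer = torch.optim.SGD(conv.parameters(), lr=0.1, momentum=0.9)
optimizer.zero_grad()
conv(x).sum().backward()
optimizer.step()
```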
