Given a trained object detection network, the goal of my program is to prune it: delete the least contributing filters together with their dependent feature maps and biases. After pruning, the network is retrained to regain the accuracy it lost. Retraining the network a second time, after it has already been pruned and retrained once, results in the following error:
Traceback (most recent call last):
  File "Pruning.py", line 212, in <module>
    prune()
  File "Pruning.py", line 52, in __call__
    self.pruneLoop(prune)
  File "Pruning.py", line 80, in pruneLoop
    traineng()
  File "/home/bullseye/.local/lib/python3.6/site-packages/lightnet/engine/_engine.py", line 124, in __call__
    self.process_batch(data)
  File "/home/bullseye/Documents/git/Master-Thesis-Pruning-for-Object-Detection/docker-omgeving/Code/trainengine.py", line 34, in process_batch
    loss.backward()
  File "/home/bullseye/.local/lib/python3.6/site-packages/torch/tensor.py", line 166, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/bullseye/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: The size of tensor a (1010) must match the size of tensor b (798) at non-singleton dimension 1
Since the error occurs in the backward function, it hints that the problem is created during the first round of retraining. My first thought was that the gradients created by that first round of training were the issue, but the problem remained even after deleting the gradients after pruning.
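For readers hitting the same error: a minimal standalone sketch (not the original code, which is not shown in this thread) that reproduces this class of failure. A `.grad` tensor left over from before pruning no longer matches the resized parameter, so the next backward pass cannot accumulate into it:

```python
import torch

# A single "layer" with 16 weights, standing in for a conv filter bank.
w = torch.randn(16, requires_grad=True)

# First training pass: backward() populates w.grad with shape (16,).
(w * 2).sum().backward()
assert w.grad.shape == (16,)

# "Prune" in place by keeping only the first 12 weights. Modifying
# .data resizes the parameter but leaves the stale w.grad untouched.
with torch.no_grad():
    w.data = w.data[:12]

# Second training pass: the new gradient has shape (12,) and cannot be
# accumulated into the stale (16,) w.grad -> size-mismatch RuntimeError.
try:
    (w * 2).sum().backward()
except RuntimeError as e:
    print(e)

# Clearing the stale gradient fixes it: the next backward() allocates
# a fresh grad with the pruned shape.
w.grad = None
(w * 2).sum().backward()
assert w.grad.shape == (12,)
```

The forward pass itself succeeds after pruning; the mismatch only surfaces inside `backward()`, which matches the traceback above.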
I’m not completely sure how your pruning works, but could it be that you are removing some parameters after the computation graph was already created (e.g. after a new forward pass)?
@ptrblck Yes, I delete some filters, their biases, and the dependent feature maps (for example in model.weight.data) during pruning. Although the pruning is surrounded by a
with torch.no_grad():
block, I suspect that training would generate this graph in its forward pass. Should I delete this graph? And if so, how do I access it?
I am unsure how to provide a minimal code snippet that reproduces the error, since my program is several hundred lines long. However, I have found the solution to my problem.
As I first suspected, the grad attributes of the tensors were the problem. After pruning, the sizes of these tensors no longer matched the sizes of the weights themselves. Setting them to None solved the issue.
The first time I attempted this solution, I forgot about the batch normalization layers: their parameters also carry stale gradients, and their running mean and running var tensors have to be resized to match the pruned channels as well. @ptrblck Thank you for your help; your dummy code helped me understand and find the problem. I am, however, still unsure why your dummy code doesn’t face the same issues.
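To summarize the fix described above in runnable form, here is an illustrative sketch with an assumed toy model (the original detection network and layer names are not shown in this thread): prune a conv layer in place under no_grad, shrink the dependent batch-norm state to match, and then clear the stale gradient on every parameter before retraining:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for one pruned block of the detection network.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)

# One training step so every learnable parameter has a .grad tensor.
model(torch.randn(2, 3, 8, 8)).sum().backward()

keep = 12  # keep the first 12 of 16 filters
with torch.no_grad():
    conv, bn = model[0], model[1]
    conv.weight.data = conv.weight.data[:keep].clone()
    conv.bias.data = conv.bias.data[:keep].clone()
    # The batch-norm layer depends on the conv's output channels, so its
    # parameters AND its running statistics shrink along with it.
    bn.weight.data = bn.weight.data[:keep].clone()
    bn.bias.data = bn.bias.data[:keep].clone()
    bn.running_mean = bn.running_mean[:keep].clone()
    bn.running_var = bn.running_var[:keep].clone()
    bn.num_features = keep

# The stale gradients still have the pre-pruning shapes; clear all of
# them (conv and batch norm alike) so the next backward() starts fresh.
for p in model.parameters():
    p.grad = None

# A retraining step now runs without the size-mismatch error.
model(torch.randn(2, 3, 8, 8)).sum().backward()
assert model[0].weight.grad.shape[0] == keep
```

Looping over `model.parameters()` is what makes it hard to miss a layer; clearing only the conv gradients by hand is exactly how the batch-norm state gets forgotten.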