Pruning: inference time and memory usage

I understand from this discussion that the pruning API does not provide any memory benefits or improve inference time by itself.

I am using the following code to prune my network:

import torch.nn as nn
import torch.nn.utils.prune as prune

for module in model.modules():
    if isinstance(module, nn.Conv2d):
        prune.l1_unstructured(module, name='weight', amount=0.5)
        prune.remove(module, 'weight')
    elif isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name='weight', amount=0.5)
        prune.remove(module, 'weight')
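To illustrate what I mean, here is a minimal sketch with a throwaway two-layer model (not my real network): after prune.remove, the weight tensors keep their original dense shape, and the pruned entries are simply set to zero.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Throwaway model just for illustration, not my real network.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.Linear(8, 4))

for module in model.modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        prune.l1_unstructured(module, name='weight', amount=0.5)
        prune.remove(module, 'weight')

for name, p in model.named_parameters():
    if name.endswith('weight'):
        total = p.numel()
        zeros = int((p == 0).sum())
        # The tensor keeps its original shape; about half the entries are now zero.
        print(name, tuple(p.shape), f'{zeros}/{total} zeros')
```

So the dense matmuls and convolutions still run over tensors of the same size, which seems to be why there is no speedup.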

Is there anything that can be done to actually reduce the memory footprint and inference time, rather than just pruning for the sake of pruning?
PS: I can see that a tar archive of my model does indeed have a smaller disk size, and the number of nonzero parameters is smaller. But this is useless if the computation cost stays the same, no?
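For reference, this sketch is roughly how I checked the size difference (using gzip and a throwaway layer instead of my real model and tar): the pruned state_dict compresses much better because half of the weights are exact zeros, even though the uncompressed tensor has the same shape and dtype as before.

```python
import gzip
import io
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def gz_size(module):
    # Serialize the state_dict and measure its gzip-compressed size in bytes.
    buf = io.BytesIO()
    torch.save(module.state_dict(), buf)
    return len(gzip.compress(buf.getvalue()))

dense = nn.Linear(256, 256)  # throwaway layer, not my real model
size_before = gz_size(dense)

prune.l1_unstructured(dense, name='weight', amount=0.5)
prune.remove(dense, 'weight')
size_after = gz_size(dense)

# The pruned copy compresses better because of the zero runs,
# but the in-memory tensor is exactly the same size.
print(size_before, size_after)
```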