I have been using the Lottery Ticket Hypothesis (LTH) to iteratively prune ResNet models, with the goal of improving inference speed and memory usage. LTH was implemented using the `torch.nn.utils.prune` package. I have noticed that even with 80% of the weights removed, memory usage did not improve, and inference speed improved only by a small margin (1-1.5 seconds). I have tried this in several ways:
- Global pruning, keeping `weight_orig` and `weight_mask`
- Global pruning, then removing `weight_orig` and `weight_mask` with `prune.remove`
- Removing the above parameters and converting the tensors to sparse format (which raised an error during inference)
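For reference, this is roughly the workflow I followed (a minimal sketch with a hypothetical toy model standing in for the ResNet; layer sizes and the 80% pruning amount are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Tiny stand-in model; the same steps apply to a ResNet.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 10),
)

# Collect all Conv2d/Linear weights for global magnitude pruning.
parameters_to_prune = [
    (m, "weight") for m in model.modules()
    if isinstance(m, (nn.Conv2d, nn.Linear))
]

# Globally prune 80% of the weights by L1 magnitude.
prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.8,
)
# Each pruned module now carries weight_orig and weight_mask, and
# `weight` is recomputed as weight_orig * weight_mask on every forward.

# Make the pruning permanent: fold the mask into `weight` and drop
# weight_orig / weight_mask.
for module, name in parameters_to_prune:
    prune.remove(module, name)

# The weights are now ~80% zeros but still stored as dense tensors.
zeros = sum(int((m.weight == 0).sum()) for m, _ in parameters_to_prune)
total = sum(m.weight.numel() for m, _ in parameters_to_prune)
print(f"Global sparsity: {zeros / total:.1%}")
```

Checking the sparsity this way confirms the weights really are zeroed out, even though the tensors remain dense.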
There was no improvement in memory usage in any of these cases. However, compressing the models after removing the extra parameters did reduce the size on disk by about 75%.
Is this behavior expected?