I have been using the Lottery Ticket Hypothesis (LTH) to iteratively prune ResNet models, with the goal of improving inference speed and memory usage. LTH was implemented using the `torch.nn.utils.prune` package. I have noticed that even with 80% of the weights removed, memory usage did not improve, and inference speed improved only by a small margin (1-1.5 seconds). I have tried this in several ways:
- Global pruning, keeping `weight_orig` and `weight_mask`
- Global pruning, then removing `weight_orig` and `weight_mask` with `prune.remove`
- Removing the above parameters and converting the tensors to sparse format (which raised an error during inference)
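For reference, this is roughly the workflow I followed (a minimal sketch with a hypothetical toy model standing in for the ResNet; layer sizes and the 80% pruning amount are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Tiny stand-in model; the same steps apply to a ResNet.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 10),
)

# Collect all Conv2d/Linear weights for global magnitude pruning.
parameters_to_prune = [
    (m, "weight") for m in model.modules()
    if isinstance(m, (nn.Conv2d, nn.Linear))
]

# Globally prune 80% of the weights by L1 magnitude.
prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.8,
)
# Each pruned module now carries weight_orig and weight_mask, and
# `weight` is recomputed as weight_orig * weight_mask on every forward.

# Make the pruning permanent: fold the mask into `weight` and drop
# weight_orig / weight_mask.
for module, name in parameters_to_prune:
    prune.remove(module, name)

# The weights are now ~80% zeros but still stored as dense tensors.
zeros = sum(int((m.weight == 0).sum()) for m, _ in parameters_to_prune)
total = sum(m.weight.numel() for m, _ in parameters_to_prune)
print(f"Global sparsity: {zeros / total:.1%}")
```

Checking the sparsity this way confirms the weights really are zeroed out, even though the tensors remain dense.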
There was no improvement in memory usage in any of these cases. However, compressing the models after removing the extra parameters did reduce the size on disk by about 75%.
Is this behavior expected?