Compressing a Pruned `nn.Module`

Hey everyone,

I understand that the pruning methods in PyTorch create masks over the network’s weights, and that calling `prune.remove` removes that reparametrization, leaving the same number of weights with the pruned ones set to zero.
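
To make sure I have this right, here is a minimal sketch of the behaviour I mean, using `torch.nn.utils.prune` on a single `nn.Linear`. The layer size, seed, and pruning amount are arbitrary choices for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
layer = nn.Linear(8, 4)  # toy layer, sizes chosen only for illustration

# Mask the 50% of weights with the smallest magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.5)

# While the reparametrization is active, the original values live in the
# parameter `weight_orig` and the binary mask in the buffer `weight_mask`.
names_during = sorted(n for n, _ in layer.named_parameters())
has_mask = hasattr(layer, "weight_mask")

# Make the pruning permanent: `weight` becomes a plain parameter again.
prune.remove(layer, "weight")

names_after = sorted(n for n, _ in layer.named_parameters())
num_total = layer.weight.numel()
num_zero = int((layer.weight == 0).sum())

print(names_during, names_after)
print(f"{num_zero} of {num_total} weights are zero, shape is still {tuple(layer.weight.shape)}")
```

So the tensor keeps its dense shape after `prune.remove`; the zeros are stored like any other value.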

I was wondering how I might then be able to reparametrize the network to remove those zeroed weights, resulting in a smaller model for devices with limited memory. Can anyone provide some thoughts on what that would require?
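
To make the goal concrete, here is a rough sketch of what I imagine for the structured case, where whole output units of an `nn.Linear` have been zeroed. Note that `drop_zero_rows` is a hypothetical helper I made up for this post, not part of PyTorch, and the sketch assumes row-wise (`dim=0`) structured pruning; I don't think it helps with unstructured sparsity.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


def drop_zero_rows(layer: nn.Linear):
    """Hypothetical helper: build a smaller nn.Linear that keeps only the
    output units whose weight row is not entirely zero.
    Returns the new layer and the indices of the kept units."""
    keep = layer.weight.detach().abs().sum(dim=1) != 0
    idx = torch.nonzero(keep).flatten()
    small = nn.Linear(layer.in_features, idx.numel(), bias=layer.bias is not None)
    with torch.no_grad():
        small.weight.copy_(layer.weight[idx])
        if layer.bias is not None:
            small.bias.copy_(layer.bias[idx])
    return small, idx


torch.manual_seed(0)
layer = nn.Linear(8, 4)  # toy layer, sizes chosen only for illustration

# Zero out half of the output units (rows of the weight matrix) by L2 norm.
prune.ln_structured(layer, name="weight", amount=0.5, n=2, dim=0)
prune.remove(layer, "weight")

small, kept = drop_zero_rows(layer)
x = torch.randn(3, 8)
print(tuple(layer.weight.shape), "->", tuple(small.weight.shape))
```

The smaller layer reproduces the outputs of the kept units, but two things are still open in this sketch: the dropped units would still have emitted their bias, and whatever layer consumes this output would need its input columns sliced with the same `kept` indices.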