Weight Pruning on BERT

The point of PyTorch pruning, at the moment, is not necessarily to guarantee inference time speedups or memory savings. It’s more of an experimental feature to enable pruning research.

In any case, answers to questions similar to yours were given here and here.

TL;DR: You can save space by calling .to_sparse() on the pruned tensor, which converts it from a dense tensor full of zeros into a sparse coordinate (COO) representation. You cannot expect any inference speedup unless you use a custom sparse matrix algebra library to power your computation; torch.sparse is still a work in progress. Otherwise you’ll just be doing the same number of operations as you did before pruning, only now with a bunch of entries equal to zero in your tensors.
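
To make the storage point concrete, here is a minimal sketch, not a definitive recipe. It assumes a single 768x768 nn.Linear as a stand-in for one BERT layer and a 90% pruning amount; both numbers are illustrative choices, not something from the question. It prunes with torch.nn.utils.prune, makes the pruning permanent, and compares the dense and COO byte counts.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# Stand-in for one BERT linear layer; the 768x768 size is an assumption.
layer = nn.Linear(768, 768)

# Zero out the 90% of weights with the smallest magnitude (illustrative amount).
prune.l1_unstructured(layer, name="weight", amount=0.9)

# While the pruning reparametrization is active, the module holds both
# weight_orig (a parameter) and weight_mask (a buffer), so it takes more
# memory than before pruning, not less.
param_names = [n for n, _ in layer.named_parameters()]
buffer_names = [n for n, _ in layer.named_buffers()]

# Make the pruning permanent: weight is again a plain dense parameter,
# now with zeros where the mask was 0.
prune.remove(layer, "weight")

dense = layer.weight.detach()
sparsity = (dense == 0).float().mean().item()

# COO format stores only the nonzero values plus their (row, col) indices.
sparse = dense.to_sparse().coalesce()

dense_bytes = dense.numel() * dense.element_size()
sparse_bytes = (
    sparse.values().numel() * sparse.values().element_size()
    + sparse.indices().numel() * sparse.indices().element_size()
)

print(f"sparsity:     {sparsity:.3f}")
print(f"dense bytes:  {dense_bytes}")
print(f"sparse bytes: {sparse_bytes}")

# The forward pass still runs a dense matmul over all 768x768 entries,
# zeros included, so there is no speedup from pruning alone.
out = layer(torch.randn(4, 768))
```

Note that COO indices are stored as int64, so each surviving float32 weight in a 2-D tensor costs 4 bytes for the value plus 16 bytes for its two indices. The sparse form therefore only wins once well over 80% of the entries are zero; at lower sparsity levels .to_sparse() can make the tensor larger.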
