Sparsifying neural networks

I’m seeing a lot of literature on the sparsification of neural networks, but I’m confused about the benefit of these methods. Once we sparsify a network (say, through various pruning methods), how do we actually save space in the final model? I see a sparse flag in the Linear layers, so do you set that before or after pruning?

Also related to the questions above, what does it mean to “prune” a network? Do you zero out the weights (but they still exist)? Or do you just not count them in the final parameter count?

The idea behind this is that neural networks are (in general) overparameterized. In theory, the more parameters a DNN has, the easier it converges.

The idea is that once a DNN has been trained, you can get rid of these non-useful parameters. I don’t know how it is implemented in practice; I guess it will use some kind of sparse tensor.
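
To make the mechanics concrete, here is a minimal sketch using PyTorch’s torch.nn.utils.prune (the layer size and pruning amount are arbitrary, just for illustration). Pruning only zeroes weights through a mask, so the parameter count is unchanged; any real space saving would come from storing the result in a sparse format afterwards:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy layer to prune; the shape is arbitrary.
layer = nn.Linear(100, 50)

# L1 unstructured pruning: zero out the 90% of weights with smallest magnitude.
# This does NOT shrink the tensor -- it adds a weight_mask buffer and recomputes
# layer.weight as weight_orig * weight_mask, so the parameter count stays the same.
prune.l1_unstructured(layer, name="weight", amount=0.9)
print(layer.weight.count_nonzero().item(), "nonzero out of", layer.weight.numel())  # ~500 out of 5000

# Make the pruning permanent (removes the mask and keeps the zeros baked in).
prune.remove(layer, "weight")

# To actually save storage, the now mostly-zero tensor can be converted to a
# sparse (COO) tensor, which only stores the nonzero entries and their indices.
sparse_weight = layer.weight.detach().to_sparse()
print(sparse_weight)
```

So, to the original question: zeroing the weights alone does not save space; it is the separate step of exporting them to a sparse representation (or specialized sparse kernels/hardware) that turns the zeros into an actual memory or speed benefit.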