I’m seeing a lot of literature on the sparsification of neural networks, but I’m confused about the benefit of these methods. Once we sparsify a network (say, through one of the various pruning methods), how do we actually save space in the final model? I see a sparse flag in the Linear layers, so do you set that before or after pruning?
Also, related to the questions above, what does it mean to “prune” a network? Do you zero out the weights (but they still exist)? Or not count them in the final parameter count?
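To make my confusion concrete, here is a minimal sketch of what I’m currently picturing, using torch.nn.utils.prune (assuming that’s the intended tool for this). The layer sizes and pruning amount are just placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A tiny layer just to illustrate my question
linear = nn.Linear(8, 4)
params_before = sum(p.numel() for p in linear.parameters())

# L1 unstructured pruning: zero out the 50% smallest-magnitude weights
prune.l1_unstructured(linear, name="weight", amount=0.5)

# The weight tensor now has zeros where the mask removed entries,
# but it is still stored densely
print(linear.weight)

# The parameter count looks unchanged to me (weight_orig + bias),
# since the mask lives in a buffer rather than a parameter
params_after = sum(p.numel() for p in linear.parameters())
print(params_before, params_after)

# Is the idea that you then convert to a sparse representation yourself
# to actually save storage?
sparse_weight = linear.weight.detach().to_sparse()
print(sparse_weight)
```

If this is roughly right, then pruning by itself only masks weights to zero, and the space savings only show up once you export or store the weights in some sparse format. Is that the correct mental model, or am I missing a step?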