Global structured pruning

It’s nice to see the new torch.nn.utils.prune.* module in 1.4.0, which is going to be very helpful!
But only the "global unstructured" method is implemented in the module. I think that for real applications it would be better to also have "global structured" pruning, because it reduces computational complexity along with the parameter count and avoids manually tuning a pruning ratio for each layer.
So should we expect torch.nn.utils.prune.global_structured later?
Is it possible to implement a custom global_structured, or is it impossible for some reason?
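For context, here is roughly how the existing global unstructured call from the tutorial is used (the toy model and layer choices below are just placeholders):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy model just for illustration; any model with Conv2d/Linear layers works.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))

parameters_to_prune = [
    (module, "weight") for module in model.modules()
    if isinstance(module, (nn.Conv2d, nn.Linear))
]

# Prune 20% of all weights globally, ranked by absolute value across layers.
prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.2,
)
```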

Thanks!


“Since global structured pruning doesn’t make much sense unless the norm is normalized by the size of the parameter, we now limit the scope of global pruning to unstructured methods.”
This is written in the Pruning tutorial. It says that the norm used for global pruning does not take the size of the parameter into account, so it would simply remove whichever structures have the smallest norms.
Check this https://github.com/JJGO/shrinkbench. Maybe they have global structured pruning.

You can absolutely implement it (and even contribute it back to PyTorch, if you wish). We didn’t implement it in the first release for the exact reason that @Jayant_Parashar mentioned, as explained in the tutorial: since most models have layers of different sizes (with different numbers of units or channels), global structured pruning across layers does not have a straightforward definition, logically speaking. Does it make sense to compare the norm of a channel with N kernels of size LxL in one layer against the norm of a channel with n << N kernels of size lxl (with l << L) in another layer?
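As a toy illustration of that point (made-up shapes and scales, not from the tutorial), a channel with many small weights can easily have a larger raw L1 norm than a channel with a few larger weights:

```python
import torch

torch.manual_seed(0)
big_channel = 0.01 * torch.randn(512, 3, 3)   # many small weights
small_channel = 0.5 * torch.randn(16, 1, 1)   # few larger weights

# Raw L1 norms: the big channel "wins" simply because it has more weights,
# roughly 37 vs. 6 in expectation here, even though its weights are tiny.
print(big_channel.abs().sum().item())
print(small_channel.abs().sum().item())
```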

I mean, the comparison is valid, and you can definitely perform it and implement it as a pruning technique, but is it really a good proxy for the importance of those channels? Or is it just a measure of the size of their associated weight tensors? It might make more sense to normalize these norms by the total number of parameters that go into the computation of each norm, or something like that. Perhaps if you find a good paper that implements global structured pruning, we can see how they do it there and implement their version of it.
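To make the idea concrete, a rough sketch of size-normalized global structured pruning over Conv2d output channels could look like the code below. This is not an existing torch.nn.utils.prune API; the helper name global_structured_l1 and the per-channel normalization are just one possible choice:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


def global_structured_l1(model, amount=0.2):
    """Prune `amount` of all Conv2d output channels globally, ranking each
    channel by its L1 norm divided by the number of weights in the channel."""
    scores, entries = [], []
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            w = module.weight.detach()
            # Per-channel L1 norm, normalized by in_channels * kH * kW.
            norms = w.abs().sum(dim=(1, 2, 3)) / w[0].numel()
            for ch, score in enumerate(norms.tolist()):
                scores.append(score)
                entries.append((module, ch))
    k = int(amount * len(scores))
    if k == 0:
        return
    # Global threshold: the k-th smallest normalized norm across all layers.
    threshold = sorted(scores)[k - 1]
    masks = {}
    for (module, ch), score in zip(entries, scores):
        mask = masks.setdefault(module, torch.ones_like(module.weight))
        if score <= threshold:
            mask[ch] = 0.0  # zero out the whole output channel
    for module, mask in masks.items():
        prune.custom_from_mask(module, name="weight", mask=mask)
```

Calling something like global_structured_l1(model, amount=0.3) would then compare channels across all conv layers on a roughly equal footing, which is closer in spirit to what global unstructured pruning does for individual weights.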

but is it really a good proxy for importance of those channels?

You are correct; however, I think the same argument could be put forward in the unstructured pruning case, couldn’t it?

In unstructured global pruning we compare weights in entirely different layers and positions within the network. A small weight in one of the first layers might be much more important than a larger one in one of the last layers. Or am I missing something here?