Pruning could also be used as a sort of neural architecture search method.
Say, I have a two-layer convolutional neural network.
conv1 = torch.nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3)
conv2 = torch.nn.Conv2d(in_channels=12, out_channels=8, kernel_size=3)
Say I use structured pruning and the last three output channels of conv1 (equivalently, the last three input channels that conv2 consumes) are pruned to 0. Then the effective architecture of the pruned model should be
conv1 = torch.nn.Conv2d(in_channels=3, out_channels=9, kernel_size=3)
conv2 = torch.nn.Conv2d(in_channels=9, out_channels=8, kernel_size=3)
This architecture, together with the surviving parameters, could be saved as a new model. The new model should take less storage and run inference faster than the unpruned one, since the zeroed channels are actually removed rather than just masked.
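For what it's worth, this can already be done by hand today. Below is a minimal sketch: it uses `torch.nn.utils.prune.ln_structured` for the structured step, but the channel-selection and weight-copying logic is my own workaround, not a built-in PyTorch interface. I set `bias=False` on conv1 so the pruned channels output exact zeros and can be dropped without changing the result.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Original two-layer model (bias=False on conv1 so zeroed output
# channels really produce zeros and can be dropped safely).
conv1 = nn.Conv2d(3, 12, kernel_size=3, bias=False)
conv2 = nn.Conv2d(12, 8, kernel_size=3)

# Structured pruning: zero the 3 output channels of conv1 with the
# smallest L2 norm (dim=0 indexes output channels).
prune.ln_structured(conv1, name="weight", amount=3, n=2, dim=0)
prune.remove(conv1, "weight")  # bake the mask into the weight tensor

# Indices of the surviving output channels of conv1.
kept = conv1.weight.abs().sum(dim=(1, 2, 3)).nonzero().squeeze(1)

# Build the compact model and copy over the surviving parameters:
# conv1 keeps only the surviving output channels, conv2 keeps only
# the matching input channels.
small_conv1 = nn.Conv2d(3, kept.numel(), kernel_size=3, bias=False)
small_conv2 = nn.Conv2d(kept.numel(), 8, kernel_size=3)
with torch.no_grad():
    small_conv1.weight.copy_(conv1.weight[kept])
    small_conv2.weight.copy_(conv2.weight[:, kept])
    small_conv2.bias.copy_(conv2.bias)

# The compact model computes the same function as the masked one.
x = torch.randn(1, 3, 16, 16)
print(torch.allclose(conv2(conv1(x)), small_conv2(small_conv1(x)), atol=1e-5))
```

The `allclose` check should print `True`, since the removed channels contributed exactly zero. What I'm asking for is essentially this, but as a supported API instead of manual slicing.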
I wonder whether PyTorch is planning to add such an interface.