I am working with the newly released pruning functionality in `torch.nn.utils.prune`, and I am extending this implementation of the MS-D network: a network of densely connected 3x3 convolutions followed by a final layer of 1x1 convolutions. Some simplified code:
```python
import msd_pytorch
import torch.nn.utils.prune as prune

def pytorch_L1_prune(model, depth, perc):
    # Prune every 3x3 weight of the MSD block, then the 1x1 final layer.
    for k in range(depth):
        wname = "weight" + str(k)
        prune.l1_unstructured(model.msd.msd_block, name=wname, amount=perc)
    prune.l1_unstructured(model.msd.final_layer.linear, name='weight', amount=perc)
    return model

d = 100
model = msd_pytorch.MSDSegmentationModel(c_in, num_labels, d, width,
                                         dilations=dilations)
# Load model here
model = pytorch_L1_prune(model, d, 0.3)
```
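For context, this is my understanding of what `l1_unstructured` does, sketched on a standalone conv (not the MS-D code):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(1, 1, kernel_size=3)
prune.l1_unstructured(conv, name="weight", amount=0.3)

# "weight" is no longer a Parameter: it is recomputed as
# weight_orig * weight_mask by a forward_pre_hook on each call.
print([n for n, _ in conv.named_parameters()])  # ['bias', 'weight_orig']
print([n for n, _ in conv.named_buffers()])     # ['weight_mask']
print(len(conv._forward_pre_hooks))             # 1 (the pruning hook)
```

So, as far as I understand, the pruned weights only take effect if the module is invoked as `module(input)`, so that the pre-hook can recompute `weight` from `weight_orig` and `weight_mask`.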
What I have noticed:
The network runs fine, but I have noticed that the pruning has no effect on the 3x3 convolutions, while it does affect the 1x1 final layer. My hypothesis is that the `forward_pre_hooks` are not applied during the `forward()` pass of the 3x3 convolutions.
I found the following note in the docs (in the docstring of `torch.nn.Module.forward`):

> Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
I don’t really understand what this note means or how I should act on it, which is why I thought it might be good to clear it up here.
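To check my reading of that note, here is a toy example (a standalone `nn.Linear`, nothing MS-D specific):

```python
import torch
import torch.nn as nn

lin = nn.Linear(2, 2)
lin.register_forward_pre_hook(lambda module, inp: print("pre-hook ran"))

x = torch.randn(1, 2)
lin(x)          # prints "pre-hook ran"
lin.forward(x)  # prints nothing: hooks are silently ignored
```

If that reading is right, then any code path that calls `forward()` directly, or otherwise bypasses `Module.__call__`, would silently skip the pruning hooks.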
The network has an `MSDModule2d`, which contains two types of modules: `MSDBlock2d` for the 3x3 convolutions and `MSDFinalLayer` at the end. The forward pass is as follows:
```python
class MSDModule2d(torch.nn.Module):
    def __init__(self, c_in, c_out, depth, width,
                 dilations=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]):
        super(MSDModule2d, self).__init__()
        ...  # initialize dilations etc.
        self.msd_block = MSDBlock2d(self.c_in, self.dilations, self.width)
        self.final_layer = MSDFinalLayer(c_in=c_in + width * depth, c_out=c_out)
        ...

    def forward(self, input):
        output = self.msd_block(input)
        output = self.final_layer(output)
        return output
```
The latter basically uses `nn.Conv1d()`, so the note above probably does not apply, and it would therefore make sense that the `forward_pre_hooks` are applied there. The `MSDBlock2d` module, however, implements a somewhat complicated custom forward pass that I do not want to post in full here:
```python
class MSDBlockImpl2d(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, dilations, bias, *weights):
        ...  # complicated

    @staticmethod
    def backward(ctx, grad_output):
        ...  # complicated


class MSDBlock2d(torch.nn.Module):
    def __init__(self, in_channels, dilations, width=1):
        super().__init__()
        ...  # initialize weights etc.

    def forward(self, input):
        bias, *weights = self.parameters()
        return MSDBlockImpl2d.apply(input, self.dilations, bias, *weights)
```
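One detail I am unsure about: after pruning, the actual `Parameter`s are renamed to `weight<k>_orig`, so I suspect that `bias, *weights = self.parameters()` above picks up the unmasked `_orig` tensors rather than the masked `weight<k>` attributes that the hook recomputes. A toy version of what I mean (hypothetical `Toy` module, not the MS-D code):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight0 = nn.Parameter(torch.randn(3, 3))

    def forward(self, x):
        (w,) = self.parameters()  # same pattern as MSDBlock2d
        return x @ w

toy = Toy()
prune.l1_unstructured(toy, name="weight0", amount=0.5)

(w,) = toy.parameters()             # this is now weight0_orig
print(torch.equal(w, toy.weight0))  # False: toy.weight0 is the masked tensor
```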
Any ideas what is going on? Is the above hypothesis correct, and if so, what does it mean and what should I add to the `MSDBlock2d` forward pass to make the pruning take effect?
EDIT: CHECK ON PRUNING METHODS
I checked whether the pruning method does its job. The model does have the masks etc. Furthermore, the check sketched below shows that the hooks are initialized; they are just not applied in the `msd_block` case, it seems.
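Roughly the check I ran (a sketch; names assume the pruned model from the code above):

```python
block = model.msd.msd_block

# The masks and the reparametrized originals are present ...
print([n for n, _ in block.named_buffers()][:3])
# e.g. ['weight0_mask', 'weight1_mask', 'weight2_mask']
print(any(n == "weight0_orig" for n, _ in block.named_parameters()))  # True

# ... and one pruning pre-hook per pruned weight is registered.
print(len(block._forward_pre_hooks))  # depth (100 here)
```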