Hello,
I am working with the newly released pruning functionality in torch.nn.utils.prune,
and I am extending this implementation of the MS-D network.
This is a network of densely connected 3x3 convolutions followed by a final layer of 1x1 convolutions. Some simplified code:
import msd_pytorch
import torch.nn.utils.prune as prune

def pytorch_L1_prune(model, depth, perc):
    # Prune each 3x3 convolution weight in the MSD block.
    for k in range(depth):
        wname = "weight" + str(k)
        prune.l1_unstructured(model.msd.msd_block, name=wname, amount=perc)
    # Prune the weight of the final 1x1 layer.
    prune.l1_unstructured(model.msd.final_layer.linear, name='weight', amount=perc)
    return model

d = 100
model = msd_pytorch.MSDSegmentationModel(c_in, num_labels, d, width, dilations=dilations)
# Load model here
model = pytorch_L1_prune(model, d, 0.3)
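For reference, this is how I understand what l1_unstructured does to a module, sketched on a plain nn.Conv2d rather than the actual MS-D code (please correct me if this picture is wrong):

import torch
import torch.nn.utils.prune as prune

conv = torch.nn.Conv2d(1, 1, kernel_size=3)
prune.l1_unstructured(conv, name='weight', amount=0.3)

print([n for n, _ in conv.named_parameters()])  # ['bias', 'weight_orig']
print([n for n, _ in conv.named_buffers()])     # ['weight_mask']
# 'weight' is now a plain attribute that a forward_pre_hook recomputes as
# weight_orig * weight_mask each time the module instance is called.
print(len(conv._forward_pre_hooks))             # 1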
What I have noticed:
The network runs fine, but the pruning has no effect on the 3x3 convolutions, while it does affect the 1x1 final layer. My hypothesis is that the forward_pre_hooks are not applied during the forward() pass of the 3x3 convolutions.
Hypothesis:
I found the following note in the docs (torch.nn.modules.module):
Although the recipe for forward pass needs to be defined within this function, one should call the :class:`Module` instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
I don’t really understand what this note means or how I should act on it, which is why I thought it would be good to clear it up here.
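If I read it correctly, the note is about the difference between these two ways of invoking a module (a toy example, not the MS-D code):

m = torch.nn.Linear(3, 3)
prune.l1_unstructured(m, name='weight', amount=0.5)
x = torch.randn(1, 3)

y1 = m(x)          # __call__ runs the registered forward_pre_hooks first,
                   # so m.weight is refreshed to weight_orig * weight_mask
y2 = m.forward(x)  # bypasses the hooks; m.weight is used as-is

But the MS-D code below does call the module instances, as far as I can tell.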
The network has an MSDModule2d, which contains two types of modules: MSDBlock2d for the 3x3 convolutions and MSDFinalLayer at the end. The forward pass is as follows:
class MSDModule2d(torch.nn.Module):
    def __init__(self, c_in, c_out, depth, width,
                 dilations=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]):
        super(MSDModule2d, self).__init__()
        ...  # initialize dilations etc.
        self.msd_block = MSDBlock2d(self.c_in, self.dilations, self.width)
        self.final_layer = MSDFinalLayer(c_in=c_in + width * depth, c_out=c_out)
        ...

    def forward(self, input):
        output = self.msd_block(input)
        output = self.final_layer(output)
        return output
The latter basically wraps an nn.Conv1d, so the above note probably does not apply to it, and it would therefore make sense that its forward_pre_hooks are applied. The MSDBlock2d module implements a somewhat complicated forward pass that I do not want to post in full here:
class MSDBlockImpl2d(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, dilations, bias, *weights):
        ...  # complicated

    @staticmethod
    def backward(ctx, grad_output):
        ...  # complicated

class MSDBlock2d(torch.nn.Module):
    def __init__(self, in_channels, dilations, width=1):
        super().__init__()
        ...  # initialize weights etc.

    def forward(self, input):
        bias, *weights = self.parameters()
        return MSDBlockImpl2d.apply(input, self.dilations, bias, *weights)
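One thing I notice while writing this up: this forward pass fetches its tensors via self.parameters() rather than via attribute access. If I understand the pruning reparameterization correctly, after pruning the parameters are the weight0_orig, weight1_orig, ... tensors, while the masked weight0, weight1, ... tensors are plain attributes set by the hook. I am not sure whether this is relevant, but it can be checked like this:

blk = model.msd.msd_block
print([n for n, _ in blk.named_parameters()][:3])
# e.g. ['bias', 'weight0_orig', 'weight1_orig'] after pruning
print(type(blk.weight0))  # the masked tensor is a plain (non-Parameter) attribute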
Any ideas what is going on? Is the above hypothesis correct, and if so, what does it mean and what should I add to the MSDBlock2d module?
Kind regards,
Richard
EDIT: CHECK ON PRUNING METHODS
I checked whether the pruning method does its job.
print(list(model.msd.final_layer.linear.named_buffers()))
print(list(model.msd.msd_block.named_buffers()))
The model does have the masks and weight_orig tensors. Furthermore, the following
print(model.msd.final_layer.linear._forward_pre_hooks)
print(model.msd.msd_block._forward_pre_hooks)
shows that the hooks are registered. They just do not seem to be applied in the msd_block case.
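As a further check, I suppose I could verify whether the hooks actually fire during a forward pass with a dummy hook like this (the input shape is just an example):

def spy_hook(module, inputs):
    print("pre-hook fired on", type(module).__name__)

handle = model.msd.msd_block.register_forward_pre_hook(spy_hook)
_ = model.msd(torch.randn(1, c_in, 64, 64))  # dummy input of the right shape
handle.remove()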