Network pruning error

Hello,

I am very new to this topic, but I am trying to prune the model I am working with. For reference, I am using this page. The model is quite big, containing different encoders, ResNet modules, and decoders. So I’m guessing that I have to prune each network individually (I couldn’t find a reference where the whole model is pruned together, but please share links if this has been done). The parameter names look like this:

module.model_enc1.1.weight
module.model_enc1.1.bias
module.model_enc1.2.weight
module.model_enc1.2.bias
module.model_enc1.4.weight
module.model_enc1.4.bias
module.model_enc1.5.weight
.
.
.

To start, I’m taking only the model_enc1 submodule (the one that owns module.model_enc1.1.weight) using the following code:

test = netM.module.model_enc1

where netM is the model, wrapped in DataParallel (<class 'torch.nn.parallel.data_parallel.DataParallel'>).

So test contains the following model:

Sequential(
  (0): ReflectionPad2d((3, 3, 3, 3))
  (1): Conv2d(3, 64, kernel_size=(7, 7), stride=(1, 1))
  (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (3): ReLU(inplace=True)
  (4): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (5): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (6): ReLU(inplace=True)
)

When I run PyTorch’s pruning method, prune.random_unstructured(test, name='1.weight', amount=0.3), I get the following error:


AttributeError Traceback (most recent call last)
in
----> 1 prune.random_unstructured(test, name='1.weight', amount=0.3)

/usr/local/lib/python3.6/dist-packages/torch/nn/utils/prune.py in random_unstructured(module, name, amount)
    851
    852     """
--> 853     RandomUnstructured.apply(module, name, amount)
    854     return module
    855

/usr/local/lib/python3.6/dist-packages/torch/nn/utils/prune.py in apply(cls, module, name, amount)
    473         """
    474         return super(RandomUnstructured, cls).apply(
--> 475             module, name, amount=amount
    476         )
    477

/usr/local/lib/python3.6/dist-packages/torch/nn/utils/prune.py in apply(cls, module, name, *args, **kwargs)
    155         # starting from the state it is found in prior to this iteration of
    156         # pruning
--> 157         orig = getattr(module, name)
    158
    159         # If this is the first time pruning is applied, take care of moving

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __getattr__(self, name)
    592                 return modules[name]
    593         raise AttributeError("'{}' object has no attribute '{}'".format(
--> 594             type(self).__name__, name))
    595
    596     def __setattr__(self, name, value):

AttributeError: 'Sequential' object has no attribute '1.weight'

How do I fix this? Is there any better way to prune these networks?

Print out test and look at the structure of the module. You’ll need to index into test and use name='weight'. '1.weight' is not a valid parameter name on the Sequential container itself.

Try prune.random_unstructured(test[0], name='weight', amount=0.3) or any other index in place of 0.

@Michela could you take a look?

Hi. I got the name 1.weight by running print(list(netM.module.model_enc1.named_parameters())). The output is:

[('1.weight', Parameter containing:
tensor([[[[ 2.5521e-02,  5.2238e-02,  4.7848e-04,  ...,  5.6985e-02,
            5.1901e-02,  5.1235e-02],
.
.
.
.

And then we have values for 1.bias, 2.weight, 2.bias as mentioned above.
This is the error I got for the code you mentioned:

AttributeError: 'ReflectionPad2d' object has no attribute 'weight'

Your first Conv layer in the Sequential module is at index 1. Try prune.random_unstructured(test[1], name='weight', amount=0.3). I think that should work. Let me know if it doesn’t.
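
For anyone landing here later, a minimal sketch of the fix above. The Sequential is a shortened stand-in for netM.module.model_enc1, rebuilt from the printout in the question rather than taken from the original model, so treat the names as illustrative.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in for netM.module.model_enc1 (first four layers of the printout).
test = nn.Sequential(
    nn.ReflectionPad2d(3),
    nn.Conv2d(3, 64, kernel_size=7),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)

# Index into the Sequential to reach the layer that owns the parameter,
# then refer to the parameter by its local name.
conv = test[1]
prune.random_unstructured(conv, name="weight", amount=0.3)

# After pruning, the layer holds 'weight_orig' (parameter) and
# 'weight_mask' (buffer); 'weight' is recomputed from the two.
param_names = [n for n, _ in conv.named_parameters()]
buffer_names = [n for n, _ in conv.named_buffers()]
sparsity = float((conv.weight == 0).sum()) / conv.weight.nelement()
print(param_names, buffer_names, round(sparsity, 2))
```

The key point is that the pruning functions expect the module that directly owns the parameter, plus the parameter's name within that module.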

Hi. This worked. Thank you so much. Can you also tell me if there’s some way I can prune such a big network in one go? Or do I need to iterate through every layer and prune it individually?

Try this https://pytorch.org/tutorials/intermediate/pruning_tutorial.html#global-pruning

Thank you very much. I’ll check this out.

@Flock1 the issue was related to how you were accessing the parameter. Either do dict(netM.module.model_enc1.named_parameters())['1.weight'] or do netM.module.model_enc1[1].weight.
@ani0075’s solution is correct (and related to the second option in the previous line of this answer) for when you want to refer to that module/parameter combination for the sake of pruning.
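
A small sketch of the two access paths mentioned above, on a stand-in Sequential (not the original model). The helper split_param_name is hypothetical, written here only to show how a dotted name from named_parameters() could be turned into the (module, name) pair that the pruning functions expect.

```python
from functools import reduce

import torch.nn as nn

# Stand-in for netM.module.model_enc1.
enc = nn.Sequential(
    nn.ReflectionPad2d(3),
    nn.Conv2d(3, 64, kernel_size=7),
)

# Both routes reach the same Parameter object.
by_name = dict(enc.named_parameters())["1.weight"]
by_index = enc[1].weight
print(by_name is by_index)


def split_param_name(root, dotted_name):
    """Hypothetical helper: map '1.weight' to (root[1], 'weight')."""
    *path, param_name = dotted_name.split(".")
    # Walk the submodule path one attribute at a time.
    owner = reduce(getattr, path, root)
    return owner, param_name


owner, param_name = split_param_name(enc, "1.weight")
print(type(owner).__name__, param_name)
```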

Beyond that, what do you mean by pruning “in one go”? Global pruning prunes the network by pooling all parameters together and comparing them with each other when deciding which ones to prune. That is different from pruning every layer individually (just done efficiently, in a single pass), where weights are never compared across layers. Which one of the two were you interested in?
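
To make the distinction concrete, here is a rough sketch of both options on a toy two-layer model. The model, the choice of L1 magnitude pruning, and the 30% amount are assumptions for illustration, not something taken from the original network.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune


def make_model():
    # Toy stand-in; the real model has many more layers.
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(8, 16, kernel_size=3, padding=1),
    )


def conv_layers(model):
    return [m for m in model.modules() if isinstance(m, nn.Conv2d)]


def zero_fraction(conv):
    return float((conv.weight == 0).sum()) / conv.weight.nelement()


# Option 1: every layer on its own; each layer loses 30% of its weights.
local_model = make_model()
for conv in conv_layers(local_model):
    prune.l1_unstructured(conv, name="weight", amount=0.3)

# Option 2: global; 30% of all pooled weights are removed, so the
# per-layer fractions are free to differ.
global_model = make_model()
prune.global_unstructured(
    [(conv, "weight") for conv in conv_layers(global_model)],
    pruning_method=prune.L1Unstructured,
    amount=0.3,
)

local_sparsity = [zero_fraction(c) for c in conv_layers(local_model)]
global_sparsity = [zero_fraction(c) for c in conv_layers(global_model)]
total_zeros = sum(int((c.weight == 0).sum()) for c in conv_layers(global_model))
total_weights = sum(c.weight.nelement() for c in conv_layers(global_model))
global_total = total_zeros / total_weights
print(local_sparsity, global_sparsity, global_total)
```

With the per-layer loop, each entry of local_sparsity sits near 0.3. With global pruning only the overall fraction is pinned to roughly 0.3.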

Hi @Michela. Firstly, big fan of your work when it comes to network pruning and ML applications in physics. I work on ML applications for quantum computing and astrophysics. I am so glad you replied.

I think I am looking for Global pruning. I was kinda thinking that pruning each layer might not be an effective way compared to global pruning. I was trying one layer just to see how to go about it since it was my first time. But I want to prune the whole network. Moreover, do you recommend pruning the BatchNorm layer?

Global pruning is generally more flexible and has empirically been shown to have better performance – though be careful not to let it prune entire layers thus disconnecting your network!
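
One way to watch for the disconnected-layer problem is to inspect the masks after global pruning. The sketch below uses a deliberately contrived model in which one layer has much smaller weights than the other, so global magnitude pruning removes that layer completely. The helper pruned_fractions is a hypothetical name, not part of torch.nn.utils.prune.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 4))
with torch.no_grad():
    # Contrived weights: layer 0 is tiny compared with layer 2.
    model[0].weight.fill_(1e-3)
    model[2].weight.fill_(1.0)

# Remove the 50% smallest-magnitude weights across both layers.
prune.global_unstructured(
    [(model[0], "weight"), (model[2], "weight")],
    pruning_method=prune.L1Unstructured,
    amount=0.5,
)


def pruned_fractions(model):
    """Hypothetical helper: fraction of masked weights per pruned module."""
    report = {}
    for name, module in model.named_modules():
        mask = getattr(module, "weight_mask", None)
        if mask is not None:
            report[name] = 1.0 - float(mask.mean())
    return report


report = pruned_fractions(model)
dead_layers = [name for name, frac in report.items() if frac == 1.0]
print(report, dead_layers)
```

Here layer "0" ends up fully masked, so nothing from the input reaches the second layer through its weights. A check like this after pruning makes it easy to lower the amount or exclude small layers if that happens.
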
Re: batch norm: that’s a more complicated issue, and it depends on what you want to achieve. Pruning batch norm parameters won’t significantly reduce the number of parameters in the network. But if you prune an entire output, does it make any sense to keep its corresponding batch norm parameters? On the other hand, some methods use the batch norm parameters directly to decide which channels to prune in the respective layer. I’d recommend checking out the literature for this.

Thank you. I have one question. Can you elaborate on “entire output”? Do you mean the final layer output or something else? Please let me know.

I mean an entire output dimension in any of the hidden layers. Batch norm layers compute y = γx + β with parameters γ,β for each normalized x at that layer.
If you pruned the previous layer such that a specific x is now always 0, does it make sense to keep its corresponding γ,β around?
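
A small sketch of that last point, under a few assumptions: a toy Conv2d without bias, one output channel removed with structured L1 pruning, and a BatchNorm2d in eval mode with its default running statistics. In that setting the pruned channel is always 0 before batch norm, so after batch norm it is just the constant β for that channel.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
conv = nn.Conv2d(3, 4, kernel_size=3, padding=1, bias=False)
bn = nn.BatchNorm2d(4)
with torch.no_grad():
    # Give each channel a distinct beta so the effect is visible.
    bn.bias.copy_(torch.tensor([0.1, 0.2, 0.3, 0.4]))

# Prune one whole output channel (dim=0), chosen by smallest L1 norm.
prune.ln_structured(conv, name="weight", amount=1, n=1, dim=0)
dead = int((conv.weight_mask.sum(dim=(1, 2, 3)) == 0).nonzero()[0])

bn.eval()
with torch.no_grad():
    y = bn(conv(torch.randn(2, 3, 8, 8)))

# The pruned channel carries no information: it equals beta everywhere.
is_constant = torch.allclose(
    y[:, dead], torch.full_like(y[:, dead], float(bn.bias[dead]))
)
print(dead, is_constant)
```

So γ for that channel no longer has any effect, and β only contributes a fixed offset to the next layer.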