Pruning and model size

Hi all,

This is my first question, so I will try not to be too clumsy. Besides, I am more or less a beginner with PyTorch, so I hope my questions are not too obvious. Thanks in advance for all your help!

Well, I just trained a simple model with 3 linear layers and everything went smoothly. Now I want to replicate this model in order to allow adversarial training of a few input samples at the same time, and for that I will be using the hack Goodfellow proposed here.

The problem is that the model may end up being quite big. But since many weights (all the off-diagonal blocks) are going to be equal to zero, I thought: ok, let’s prune them, I don’t need them for anything. Then the size of this repeated model should scale linearly with the number of replicas, and not quadratically. However, I just put together a small example and this does not seem to work as expected:

import numpy as np
import torch
import torch.nn.utils.prune
from torch import nn

# Original small model
enc_small = torch.nn.Sequential()
enc_small.add_module("dense_0", nn.Linear(2000, 4000))
torch.save(enc_small, 'orig_model.pth')

n_rep = 3

# Repeated model
enc_rep = torch.nn.Sequential()
enc_rep.add_module("dense_0", nn.Linear(2000 * n_rep, 4000 * n_rep))
torch.save(enc_rep, 'model_repeated.pth')

# Repeated model with pruning
enc_pruned = torch.nn.Sequential()
enc_pruned.add_module("dense_0", nn.Linear(2000 * n_rep, 4000 * n_rep))
# This mask zeroes out every element outside the diagonal blocks of size 4000x2000
mask = np.zeros(enc_pruned.dense_0.weight.shape).astype(int)
for ir in range(n_rep):
    mask[(ir * 4000):((ir + 1) * 4000),
         (ir * 2000):((ir + 1) * 2000)] = 1

torch.nn.utils.prune.custom_from_mask(enc_pruned.dense_0, name='weight', mask=torch.tensor(mask))
torch.save(enc_pruned, 'pruned_model.pth')

And here is how the models look once saved:

25629256 -rw-r--r-- 1 users 1.4G Aug 16 09:47 pruned_model.pth
25629254 -rw-r--r-- 1 users 275M Aug 16 09:46 model_repeated.pth
25629253 -rw-r--r-- 1 users  31M Aug 16 09:46 orig_model.pth

I do believe I am missing something obvious, but I don’t know what… It might be that I do not understand the behaviour of torch.nn.utils.prune.custom_from_mask… Hence, I would really appreciate it if you could offer me some help. Thank you so much!!!

P.S.: In parallel, I am experimenting with assigning the parameters as sparse matrices, but so far to no avail…

I’m unfortunately not familiar enough with the pruning utilities, but @Michela might know whether the file size is expected.

PS: I would also generally recommend storing and loading the state_dict instead of the complete model, as described here.
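For readers following along, here is a minimal sketch of what saving and restoring via the state_dict can look like. The layer sizes are scaled down and an in-memory buffer stands in for a file path; both are assumptions made only to keep the example small.

```python
import io

import torch
from torch import nn

# Scaled-down stand-in for the model in the question (sizes are illustrative)
model = nn.Sequential()
model.add_module("dense_0", nn.Linear(20, 40))

# Save only the parameters; a file path such as 'orig_model.pth' works the same way
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)

# To restore, rebuild the same architecture first, then load the parameters into it
restored = nn.Sequential()
restored.add_module("dense_0", nn.Linear(20, 40))
buffer.seek(0)
restored.load_state_dict(torch.load(buffer))

weights_match = torch.equal(model.dense_0.weight, restored.dense_0.weight)
```

The saved object is then a plain dictionary of tensors rather than a pickled module, which is why the architecture has to be rebuilt in code before loading.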

Thank you so much @ptrblck! Well, I believe @Michela will know :) Regarding what you mention, I will then try using state_dict. But it was already surprising to me that the size was changing that much, since apart from the weight_mask I cannot think of any other overhead created on the pruned model.

Ok, here I come with updated results. They are already better with state_dict(), but the pruned model still takes more disk space. I just changed the save call to:
torch.save(enc_pruned.state_dict(), 'pruned_model.pth')
and did the same for all the models being saved.

And this is the output:

25629256 -rw-r--r-- 1 users 550M Aug 18 08:43 pruned_model.pth
25629254 -rw-r--r-- 1 users 275M Aug 18 08:43 model_repeated.pth
25629253 -rw-r--r-- 1 users  31M Aug 18 08:43 orig_model.pth

So, since the size is precisely twice the size of the original repeated model, might it come from the mask? Is the pruning then working correctly?
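A back-of-the-envelope check of the listed file sizes can be sketched as follows. It assumes float32 weights (4 bytes each), ignores the biases and file overhead, and assumes the mask ends up stored with the same dtype as the weight; that last point is an assumption that is consistent with the doubling, not something confirmed in this thread.

```python
import math

BYTES_PER_FLOAT32 = 4
MIB = 2 ** 20

def weight_mib(n_in, n_out):
    """Size in MiB of a float32 weight matrix of shape (n_out, n_in)."""
    return n_in * n_out * BYTES_PER_FLOAT32 / MIB

n_rep = 3
small_mib = weight_mib(2000, 4000)                     # about 30.5 MiB
repeated_mib = weight_mib(2000 * n_rep, 4000 * n_rep)  # about 274.7 MiB
# Assumption: one full-size weight tensor plus one full-size mask of the same dtype
pruned_mib = 2 * repeated_mib                          # about 549.3 MiB
```

Rounded up, these values line up with the 31M, 275M and 550M shown in the listing.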

Thanks again for all your help!!

Hi, I’m new to pruning and not really very familiar with this practice, but let me share my thoughts.

According to the pruning tutorial in PyTorch, applying a pruning routine like torch.nn.utils.prune.custom_from_mask only creates the masks, which increases the number of tensors stored in the model and thus its size. To make the pruning permanent, one then has to remove the pruning reparametrization by running prune.remove(module, "weight") or similar.
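A small sketch of this behaviour, on a tiny layer whose sizes and mask are purely illustrative: it lists the state_dict keys before pruning, after custom_from_mask, and after prune.remove.

```python
import torch
from torch import nn
from torch.nn.utils import prune

layer = nn.Linear(4, 8)  # tiny illustrative layer; weight has shape (8, 4)
keys_before = sorted(layer.state_dict().keys())

# Keep only the top-left 4x2 block, zero out everything else
mask = torch.zeros_like(layer.weight)
mask[:4, :2] = 1
prune.custom_from_mask(layer, name="weight", mask=mask)
# The original values and the mask are now stored as two separate tensors
keys_pruned = sorted(layer.state_dict().keys())

# Make the pruning permanent: the mask is dropped and a single weight tensor remains
prune.remove(layer, "weight")
keys_removed = sorted(layer.state_dict().keys())
n_zero = int((layer.weight == 0).sum())
```

While the mask is attached, the state_dict holds weight_orig and weight_mask instead of weight, which would explain a saved file roughly twice as large. After prune.remove the layer is back to a single dense weight tensor with zeros in the pruned positions.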

In that case, saving the model leads to stored files of the same size as the unpruned one, because the zeroed weights are still stored densely. However, following this notebook (which is explained in this tutorial), if you compress the model, the pruned model becomes lighter than the original one.
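The effect of compression can be sketched as below. This is not the notebook's method: gzip on the serialized state_dict is used here as a stand-in compressor, the layer sizes are scaled down, and serialized_sizes is a hypothetical helper written for this example.

```python
import gzip
import io

import torch
from torch import nn
from torch.nn.utils import prune

def serialized_sizes(module):
    """Return (raw, gzip-compressed) byte counts of the module's state_dict."""
    buffer = io.BytesIO()
    torch.save(module.state_dict(), buffer)
    raw = buffer.getvalue()
    return len(raw), len(gzip.compress(raw))

n_rep, n_in, n_out = 3, 200, 400  # scaled-down stand-ins for 2000 and 4000

# Dense repeated layer, no pruning
dense = nn.Linear(n_in * n_rep, n_out * n_rep)

# Same layer with a block-diagonal mask, made permanent with prune.remove
pruned = nn.Linear(n_in * n_rep, n_out * n_rep)
mask = torch.block_diag(*[torch.ones(n_out, n_in)] * n_rep)
prune.custom_from_mask(pruned, name="weight", mask=mask)
prune.remove(pruned, "weight")

raw_dense, gz_dense = serialized_sizes(dense)
raw_pruned, gz_pruned = serialized_sizes(pruned)
```

The raw sizes should come out essentially equal, since both layers store a full dense weight tensor, while the compressed pruned layer should be noticeably smaller because long runs of zeros compress well.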

Hope it helps!