How is weight pruning different from dropout?
Dropout drops certain activations stochastically, i.e. a new random subset of them for any data passing through the model. Typically this is undone after training (although there is a whole theory about test-time dropout).
Pruning drops certain weights permanently, i.e. it removes parts of the model deemed “uninteresting”.
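To make the contrast concrete, here is a minimal sketch (the module sizes and the 50% rates are arbitrary choices for the demo, not anything from the thread):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
x = torch.ones(1, 8)

# Dropout zeroes a fresh random subset of *activations* on each forward
# pass in training mode, and is a no-op in eval mode.
drop = nn.Dropout(p=0.5)
drop.train()
train_out = drop(x)   # some entries zeroed, survivors scaled by 1/(1-p)
drop.eval()
eval_out = drop(x)    # identity at test time

# Pruning zeroes a fixed subset of *weights*; the same mask applies to
# every input until you prune again (or remove the reparametrization).
lin = nn.Linear(8, 8)
prune.l1_unstructured(lin, name="weight", amount=0.5)
pruned_frac = (lin.weight == 0).float().mean().item()  # ~0.5
```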
Thank you for the quick response. I applied global pruning to my model and then retrained it, but during testing the weight parameter is not there. I think those parameters are renamed. How can I test the pruned model?
What is your intuition about why (and when) dropout works?
I’m a Bayesian at heart, so to me the Gal and Ghahramani interpretation seems to be the thing.
In a later paper they show that for RNNs it makes sense to keep the dropout mask fixed between the timesteps to use this interpretation, and Gal’s thesis has a lot of material on the theme (including, I think, why we cannot just optimize the dropout probability somewhere along the way). More generally, it would seem that the hypothetical variational model must make sense, as well as the approximation.
What is your take? I always love to hear your thoughts on the more theory-grounded topics.
Thanks for the Gal and Ghahramani link. I can’t say that I’ve absorbed much of it yet.
I’m not well-studied on dropout, and I just don’t have any intuition about it that feels “right” to me, so I don’t really have a take.
Can anyone please help with this?
I applied global pruning to my model and then retrained it, but during testing the weight parameter is not there. I think those parameters are renamed. How can I test the pruned model?
Sorry for diverting your thread.
Yes, pruning renames the weights (appending
_orig is probably the most common convention); you can get the current set of parameter names using
print([n for n, p in module.named_parameters()])
There is an equivalent
named_buffers for non-parameters, which holds the masks.
Also, the section on removing the reparametrization in the pruning tutorial might be of use.
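As a quick illustration of the renaming (using a plain Linear module as a stand-in for your model):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

module = nn.Linear(4, 4)
prune.l1_unstructured(module, name="weight", amount=0.5)

# "weight" is no longer a parameter; the original values now live in
# "weight_orig", and the binary mask is the buffer "weight_mask".
param_names = [n for n, p in module.named_parameters()]
buffer_names = [n for n, b in module.named_buffers()]
print(param_names)   # contains 'weight_orig' but no longer 'weight'
print(buffer_names)  # contains 'weight_mask'
```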
Thank you, Thomas, for providing the relevant content. I will try this and share the result.
I tried to apply global pruning to my model, which has one LSTM layer. I am getting
UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
Then I put flatten_parameters() in the forward of the module, but I am still getting the same warning. Can you please help?
I think this is inherent in the way pruning is implemented using reparametrization.
One way to overcome the warning could be to have a workflow where you do the pruning and then remove the reparametrization.
The other option is to make it work with the non-flattened weights, i.e. either
- to silence or avoid the warning using Python’s warnings machinery,
- to disable CuDNN, if the flattening and Python warning overhead is larger than the savings from CuDNN optimizations, or
- to roll your own JIT-fused RNN, though sadly that has never become as much of a standard as one might have hoped.
I can see how none of these options is terribly attractive, but it seems that things are not working together as well as they should.
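The first workflow (prune, then remove the reparametrization) could look roughly like this; the layer sizes and the 50% amount are made up for the sketch:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

lstm = nn.LSTM(input_size=4, hidden_size=4)

# prune one of the LSTM's flat weight tensors
prune.l1_unstructured(lstm, name="weight_ih_l0", amount=0.5)

# bake the mask into the tensor and drop weight_ih_l0_orig / _mask, so
# weight_ih_l0 is an ordinary (sparse-ish) parameter again ...
prune.remove(lstm, "weight_ih_l0")
# ... and can be made contiguous for CuDNN without the warning
lstm.flatten_parameters()

out, _ = lstm(torch.randn(3, 1, 4))  # (seq_len, batch, input_size)
```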
OK. Theoretically, removing the reparametrization is appealing, so I am trying that and will report the results when done.
Removing the reparametrization worked well for me. Thanks, Thomas.
The next issue I am facing: if I globally prune the model by 90% in one shot, it is pruned approximately 90%, with different pruning rates in different layers. But iteratively pruning the model with a factor of 0.1 for 10 iterations, the model only ends up approximately 65.6% pruned. What is the reason behind this? I thought it would be pruned approximately 99%.
Are you randomly pruning? If so, I would suspect that the pruning considers all weights (including previously pruned ones) in the computation: if I randomly set 0.1 of the weights to 0 for 10 iterations, I expect (in lieu of LaTeX support in the forum)
sum([0.1 * (0.9**i) for i in range(10)]) ~ 65.1% of the weights to be set to 0.
This appears to be consistent with the documentation in Pruning Tutorial — PyTorch Tutorials 1.9.1+cu102 documentation. If you have some other criterion, maybe it is applied to the original weights; I have to admit that I don’t know exactly how it works off the top of my head.
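A quick simulation of random pruning over all positions (pure Python, numbers chosen just for the demo) shows the saturation:

```python
import random

random.seed(0)
n = 100_000
zeroed = [False] * n  # track which "weights" have been pruned

# each of the 10 rounds independently picks 10% of *all* positions,
# so already-pruned positions can be picked again and the rounds overlap
for _ in range(10):
    for i in random.sample(range(n), k=n // 10):
        zeroed[i] = True

frac = sum(zeroed) / n
print(frac)  # close to 1 - 0.9**10 ≈ 0.6513, not ~1.0
```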
I am pruning the model weights using
for 10 iterations. Your explanation makes sense: it must be considering the pruned weights as well. How can I make the pruning not consider those again in later iterations? One more question: how can I calculate the inference time of the model?
One thing that might be worthwhile to try is progressively increasing the amount, if that matches what you want to achieve.
I don’t have a good answer, but two thoughts:
- I don’t think unstructured sparsity is something that lends itself easily to performance gains. (Though at 90% or so it might just work, but I would not know how to do it.)
- I would probably measure time.
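For the timing, a minimal sketch using time.perf_counter; the helper name and the lambda stand-in for model(inputs) are made up for illustration:

```python
import time

def average_inference_time(fn, *args, warmup=3, reps=10):
    # rough wall-clock timing of a callable such as model.__call__;
    # on GPU you would also call torch.cuda.synchronize() before each
    # perf_counter() read, since CUDA kernel launches are asynchronous
    for _ in range(warmup):      # warm up caches/allocators first
        fn(*args)
    start = time.perf_counter()
    for _ in range(reps):
        fn(*args)
    return (time.perf_counter() - start) / reps

# hypothetical stand-in for model(inputs)
avg = average_inference_time(lambda n: sum(i * i for i in range(n)), 10_000)
print(f"{avg * 1e6:.1f} microseconds per call")
```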
Thank you, Thomas. I started experimenting with these ideas, which is why it took me so long to reply. But today I got the results: progressively increasing the amount until I reach the threshold worked well, and I am able to time the model by measuring the time before and after the inference call. My aim was not performance gains but just to reduce the model size, which I was able to achieve. Thanks!