Pruning vs Dropout

How is weight pruning different from dropout?

Dropout drops certain activations stochastically (i.e. a new random subset of them for every input passing through the model). Typically it is switched off after training (in PyTorch, model.eval() does this), although there is a whole theory about test-time dropout.
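A minimal PyTorch sketch of one common form of test-time dropout, often called Monte Carlo dropout: keep the dropout layers active at inference and average several stochastic forward passes. The helper name `mc_dropout_predict`, the toy model and the sample count are illustrative assumptions, not a PyTorch API.

```python
import torch
import torch.nn as nn


def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 20):
    """Mean and std over n_samples forward passes with dropout left active."""
    model.eval()
    # Re-enable only the dropout layers; everything else stays in eval mode.
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(16, 1))
mean, std = mc_dropout_predict(model, torch.randn(3, 4))
```

Switching on only the nn.Dropout modules, rather than calling model.train(), keeps layers such as BatchNorm in inference mode.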

Pruning drops certain weights, i.e. permanently drops some parts deemed “uninteresting”.
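To make the contrast concrete, here is a small PyTorch sketch under toy assumptions (the layer sizes and p=0.5 are arbitrary): nn.Dropout zeroes a random subset of activations in training mode and becomes the identity in eval mode, while torch.nn.utils.prune zeroes a fixed set of weights that stays the same across forward passes.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
x = torch.ones(1, 8)

# Dropout acts on activations: a fresh random mask on every training-mode forward pass.
dropout = nn.Dropout(p=0.5)
dropout.train()
train_out = dropout(x)  # each entry is 0 or 1 / (1 - p) = 2.0
dropout.eval()
eval_out = dropout(x)   # identity: dropout is switched off after training

# Pruning acts on weights: a fixed mask that persists across forward passes.
linear = nn.Linear(8, 4)
prune.l1_unstructured(linear, name="weight", amount=0.5)  # zero the 16 smallest-magnitude weights
zeros_before = linear.weight == 0
_ = linear(x)           # the forward pass reapplies the same mask; nothing is resampled
zeros_after = linear.weight == 0
```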


Thank you for the quick response. I applied global pruning to my model and then retrained it, but at test time the weight parameter is not there. I think the weights have been renamed. How can I test the pruned model?

Hi Tom!

What is your intuition about why (and when) dropout works?


K. Frank

I’m a Bayesian at heart, so to me the Gal and Ghahramani interpretation (dropout as approximate Bayesian inference) makes a lot of sense.
In a later paper they show that, for RNNs, the dropout mask should be kept fixed across timesteps for this interpretation to apply, and Gal’s thesis has a lot more on the theme (including, somewhere, I think, why we cannot optimize the dropout probability). More generally, it would seem that the hypothetical variational model must make sense, not only the approximation.
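The fixed-mask idea for RNNs can be sketched as follows, assuming inputs shaped (seq_len, batch, features) as in nn.LSTM's default layout. `variational_dropout` is a hypothetical helper, not a PyTorch function, and it only covers the layer inputs; the recurrent (hidden-to-hidden) connections inside nn.LSTM would need a custom cell to get the same treatment.

```python
import torch


def variational_dropout(x: torch.Tensor, p: float, training: bool = True) -> torch.Tensor:
    """Dropout with one mask per sequence, shared by all timesteps (hypothetical helper).

    x is assumed to have shape (seq_len, batch, features).
    """
    if not training or p == 0.0:
        return x
    # Sample a single (1, batch, features) keep-mask and broadcast it over the time axis.
    keep = torch.empty(1, x.size(1), x.size(2), device=x.device, dtype=x.dtype).bernoulli_(1 - p)
    return x * keep / (1 - p)
```

Ordinary nn.Dropout applied to the same tensor would instead draw an independent mask for every timestep.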

What is your take? I always love to hear your thoughts on the more theory-grounded topics.

Best regards



Hi Thomas!

Thanks for the Gal and Ghahramani link. I can’t say that I’ve absorbed
much of it yet.

I’m not well-studied on dropout, and I just don’t have any intuition about
it that feels “right” to me, so I don’t really have a take.


K. Frank

Can anyone please help with this?
I applied global pruning to my model and then retrained it, but at test time the weight parameter is not there. I think the weights have been renamed. How can I test the pruned model?

Sorry for diverting your thread.
Yes, pruning renames the weights: it appends _orig to the parameter name (so weight becomes weight_orig). You can get the current set of parameter names using

```python
# module is the pruned model (or any of its submodules)
print([n for n, _ in module.named_parameters()])
```

There is an equivalent named_buffers() for non-parameter tensors, which is where the pruning masks (e.g. weight_mask) live.
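A small sketch of what the renaming looks like, assuming a toy two-layer model pruned with prune.global_unstructured (the model and the 50% amount are illustrative). The forward-facing `weight` attribute still exists as a plain tensor, so the pruned model itself can be evaluated as usual; what changes is the set of parameter and buffer names, and therefore the state_dict keys (`weight_orig`, `weight_mask`). A state_dict saved in this state will not load into a freshly constructed, unpruned model, which may be where the "missing weight" at test time comes from.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
prune.global_unstructured(
    [(model[0], "weight"), (model[2], "weight")],
    pruning_method=prune.L1Unstructured,
    amount=0.5,
)

param_names = [n for n, _ in model.named_parameters()]
buffer_names = [n for n, _ in model.named_buffers()]
print(param_names)   # the weights now appear as "0.weight_orig" and "2.weight_orig"
print(buffer_names)  # the masks: "0.weight_mask" and "2.weight_mask"

# model[0].weight still exists, but as a plain tensor recomputed as
# weight_orig * weight_mask before each forward pass, so the pruned model runs as usual.
```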

Also, the section on removing the reparametrization in the PyTorch pruning tutorial might be of use.
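Along the lines of that tutorial section, here is a sketch of a prune, retrain, then remove workflow (toy model and 50% amount again assumed): prune.remove folds each mask into `weight`, after which the state_dict has the usual keys and loads into an ordinary model for testing. The pruned entries stay exactly zero.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
to_prune = [(model[0], "weight"), (model[2], "weight")]
prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured, amount=0.5)

# ... retrain here if desired: the masks keep the pruned entries of `weight` at zero ...

# Remove the reparametrization: fold each mask into `weight`, drop weight_orig/weight_mask.
for module, name in to_prune:
    prune.remove(module, name)
print(sorted(model.state_dict().keys()))  # ['0.bias', '0.weight', '2.bias', '2.weight']

# An ordinary, never-pruned model with the same architecture can now load it for testing.
fresh = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
fresh.load_state_dict(model.state_dict())
fresh.eval()
with torch.no_grad():
    out = fresh(torch.randn(3, 4))
```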

Best regards



Tom, thank you for the relevant pointers. I will try this and share the result.

I tried to apply global pruning to my model, which has one LSTM layer, and I am getting this warning:
UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().

I then called flatten_parameters() in the module's forward, but I still get the same warning. Can you please help?

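I think this is inherent in the way pruning is implemented via reparametrization: before every forward pass the pruned weight is recomputed as weight_orig * weight_mask, which produces a new tensor outside the single contiguous weight buffer.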

One way to get rid of the warning could be a workflow where you prune first and then remove the reparametrization.
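A sketch of that workflow for a single-layer LSTM, assuming all weight matrices (but not the biases) are pruned globally by 50%; the sizes are arbitrary. After prune.remove, each weight is an ordinary Parameter again, and calling flatten_parameters() once lets cuDNN re-pack them into one contiguous buffer (the PyTorch docs describe the call as a no-op unless the module is on the GPU with cuDNN enabled).

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=1)
weight_names = [n for n, _ in lstm.named_parameters() if n.startswith("weight")]

# 1) Prune all LSTM weight matrices together (biases are left alone).
prune.global_unstructured(
    [(lstm, n) for n in weight_names],
    pruning_method=prune.L1Unstructured,
    amount=0.5,
)

# 2) ... optional fine-tuning with the masks in place ...

# 3) Make pruning permanent: each weight becomes a plain Parameter again.
for n in weight_names:
    prune.remove(lstm, n)

# 4) Re-pack the weights into one contiguous buffer (no-op unless on GPU with cuDNN).
lstm.flatten_parameters()

out, (h, c) = lstm(torch.randn(5, 2, 8))
```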

The other option is to make it work with non-flattened weights, i.e. one of:

  • silence or avoid the warning using Python's warnings mechanism,
  • disable cuDNN, if the flattening and Python warning overhead is larger than the savings from cuDNN's optimizations, or
  • roll your own JIT-fused RNN, although sadly that has never become as much of a standard as one might have hoped.
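For the first bullet, a stdlib-only sketch that silences just this warning inside a block. The helper name is hypothetical, and the message pattern is copied from the warning text quoted above (warnings.filterwarnings matches it against the start of the message). The second bullet corresponds to setting torch.backends.cudnn.enabled = False.

```python
import contextlib
import warnings

# Start of the warning text quoted earlier in the thread (used as a regex prefix).
RNN_CONTIGUITY_MSG = r"RNN module weights are not part of single contiguous chunk of memory"


@contextlib.contextmanager
def suppress_rnn_contiguity_warning():
    """Ignore only the non-contiguous RNN weights UserWarning inside this block."""
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", message=RNN_CONTIGUITY_MSG, category=UserWarning)
        yield

# Usage (model and batch are placeholders):
# with suppress_rnn_contiguity_warning():
#     output = model(batch)
```

Scoping the filter to a context manager, rather than installing it globally, keeps every other warning visible.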

I can see how none of these options is terribly attractive, but it seems that things are not working together as well as they should.

Best regards


OK. Removing the reparametrization is appealing in principle, so I am trying that and will report the results when done.

Removing the reparametrization worked well for me. Thanks, Thomas.

The next issue I am facing: if I globally prune the model by 90% in one shot, it ends up approximately 90% pruned, with different pruning rates in different layers. But if I prune iteratively with a factor of 0.1 for 10 iterations, the model ends up only about 65.6% pruned. What is the reason for this? I expected it to be approximately 99% pruned.
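One plausible explanation, assuming each pruning call with amount=0.1 removes 10% of the weights that are still unpruned (which, as far as I know, is how torch.nn.utils.prune combines repeated float amounts), is compounding: after 10 rounds a fraction 0.9^10 ≈ 0.349 of the weights remains, i.e. about 65.1% is pruned, roughly matching the observed 65.6%. The sketch below computes this, plus the per-round amount that would instead reach 90% in 10 rounds; both function names are illustrative.

```python
def pruned_fraction(step_amount: float, rounds: int) -> float:
    """Total fraction pruned if each round prunes `step_amount` of the remaining weights."""
    return 1.0 - (1.0 - step_amount) ** rounds


def step_amount_for(target: float, rounds: int) -> float:
    """Per-round amount needed so that `rounds` compounding rounds reach `target` sparsity."""
    return 1.0 - (1.0 - target) ** (1.0 / rounds)


print(f"{pruned_fraction(0.1, 10):.4f}")  # 0.6513
print(f"{step_amount_for(0.9, 10):.4f}")  # 0.2057
```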