
What is the difference between `RandomUnstructured` pruning and dropout?

Why would one not want to prune weights in order of decreasing L1 norm, i.e. zero out the weights with the *highest* L1 norm? (The current L1 structured pruning does the opposite: it zeroes the weights with the lowest L1 norm.)

What should the distribution of weights look like after applying pruning?

Dropout stochastically drops connections at every forward pass during training, then reinstates them at the following pass while dropping a new random subset. It's a regularization technique: after training, at evaluation time, all connections are present again. Random pruning (structured or unstructured), by contrast, permanently removes units or connections from the network by fixing them at zero.
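The contrast is easy to see in code. Here's a minimal sketch using `nn.Dropout` and `torch.nn.utils.prune.random_unstructured` (the layer sizes and dropout probability are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# Dropout: zeroes a *different* random subset of activations on each
# forward pass during training, and is the identity at eval time.
drop = nn.Dropout(p=0.5)
x = torch.ones(8)
drop.train()
a = drop(x)   # some entries zeroed, survivors scaled by 1/(1-p)
b = drop(x)   # a fresh random subset zeroed
drop.eval()
c = drop(x)   # identity: all connections are "back"

# Random unstructured pruning: picks one random subset of weights and
# zeroes it permanently via a fixed binary mask stored on the module.
lin = nn.Linear(4, 4)
prune.random_unstructured(lin, name="weight", amount=0.5)
mask = lin.weight_mask.clone()
lin(torch.ones(4))  # forward passes do not change the mask
assert torch.equal(lin.weight_mask, mask)
```

Note that pruning reparameterizes the module: the original values live in `lin.weight_orig`, and `lin.weight` is recomputed as `weight_orig * weight_mask`, so the zeroed connections stay zeroed.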

You can, if you want to. The usual "low magnitude = less important" convention is used because it makes more intuitive sense (a near-zero weight contributes little to the layer's output, so removing it perturbs the learned function the least) and has been shown to have solid empirical and some theoretical foundations. But people have tried pruning high-magnitude weights too, so you can definitely give it a shot if you think it makes sense for your problem.
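If you do want to try it, `torch.nn.utils.prune` lets you define custom criteria by subclassing `BasePruningMethod` and overriding `compute_mask`. The sketch below implements the reversed convention in the unstructured case; the class name `HighL1Unstructured` and the `amount` handling are my own, not part of the library:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

class HighL1Unstructured(prune.BasePruningMethod):
    """Hypothetical method: prunes the *largest*-magnitude weights,
    the opposite of the library's l1_unstructured convention."""
    PRUNING_TYPE = "unstructured"

    def __init__(self, amount):
        self.amount = amount  # fraction of weights to prune

    def compute_mask(self, t, default_mask):
        mask = default_mask.clone()
        n = round(self.amount * t.nelement())
        if n > 0:
            # topk on |t| finds the highest-L1-norm entries; zero them.
            idx = torch.topk(t.abs().view(-1), k=n).indices
            mask.view(-1)[idx] = 0
        return mask

torch.manual_seed(0)
lin = nn.Linear(4, 4)
HighL1Unstructured.apply(lin, "weight", amount=0.25)
```

After `apply`, a quarter of the weights, specifically the largest in absolute value, are masked to zero, while `lin.weight_orig` still holds the original values.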

It depends on the pruning technique you use, the sparsity level you achieve, the initialization distribution you started from, the task you're solving, the optimization strategy, and more, so there is no single answer. Of course, if you prune all connections with low synaptic weight in absolute value, then you should expect only weights with high absolute magnitude to remain, i.e. a distribution with a gap around zero.
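You can check that last point directly: after `l1_unstructured` pruning, every surviving weight is at least as large in magnitude as every pruned one, so the remaining nonzero weights sit in the tails of the original distribution. A minimal sketch (layer size and 90% sparsity are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
lin = nn.Linear(100, 100)  # default init is roughly uniform around 0
prune.l1_unstructured(lin, name="weight", amount=0.9)

kept = lin.weight[lin.weight_mask.bool()]           # surviving weights
cut = lin.weight_orig[~lin.weight_mask.bool()]      # original values of pruned ones

# Magnitude pruning carves a gap around zero: the smallest kept weight
# is no smaller in absolute value than the largest pruned one.
print(kept.abs().min().item() >= cut.abs().max().item())
```

Inspecting a histogram of `kept` would show two lobes, one negative and one positive, with the low-magnitude middle removed.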