anand.saha (Anand Saha)
Hi,
In the dropout paper, p stands for the probability of retaining a unit, i.e. p=1 means keep all activations.
In PyTorch, it's the opposite: p stands for the probability of an element being zeroed, i.e. p=1 means all activations are switched off.
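A quick way to see the PyTorch convention in action (a minimal sketch; the seed and tensor size are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

x = torch.ones(8)
drop = nn.Dropout(p=0.25)  # p is the probability of zeroing, not of keeping
drop.train()               # dropout is only active in training mode

y = drop(x)
print(y)  # roughly 25% of entries are zeroed; survivors are scaled by 1/(1-p)
```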
Why the difference?
Defining p as the drop probability seems to be the way to go for a lot of frameworks:
Keras dropout
Lasagne dropout
Mxnet dropout
TensorFlow's tf.nn.dropout, however, defines it as the keep probability (keep_prob).
So it varies a bit across frameworks, but mostly p is defined as the probability of zeroing out the input units.
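Converting between the two conventions is just p_drop = 1 - p_keep; a minimal sketch (the values are illustrative):

```python
import torch.nn as nn

# Paper / tf.nn.dropout (TF1) convention: probability of *keeping* a unit
p_keep = 0.8

# PyTorch / Keras / Lasagne / MXNet convention: probability of *zeroing* a unit
p_drop = 1.0 - p_keep

layer = nn.Dropout(p=p_drop)  # matches the paper's "retain with probability 0.8"
```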