anand.saha (Anand Saha) #1
Hi,

In the dropout paper, the probability *p* stands for the *probability of retention*, i.e. p=1 means keep all activations.

In PyTorch, it’s the opposite. *p* stands for *probability of an element to be zeroed.* i.e. p=1 means switch off all activations.
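To make the PyTorch convention concrete, here's a minimal sketch (it also notes the 1/(1-p) training-time scaling, i.e. inverted dropout, which is what PyTorch implements):

```python
import torch
import torch.nn as nn

x = torch.ones(5)

# nn.Module defaults to training mode, where dropout is active.
drop_all = nn.Dropout(p=1.0)  # p=1: zero out every element
keep_all = nn.Dropout(p=0.0)  # p=0: keep every element

print(drop_all(x))  # tensor([0., 0., 0., 0., 0.])
print(keep_all(x))  # tensor([1., 1., 1., 1., 1.])

# For 0 < p < 1, surviving elements are scaled by 1/(1-p)
# (inverted dropout), so kept entries appear as 2.0 when p=0.5.
torch.manual_seed(0)
half = nn.Dropout(p=0.5)
print(half(x))  # e.g. tensor([2., 0., 2., 2., 0.]); the mask is random
```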

Why the difference?

ptrblck #2
Zeroing out seems to be the convention in a lot of frameworks:

- Keras dropout
- Lasagne dropout
- MXNet dropout

TensorFlow's `tf.nn.dropout`, however, seems to define it as the keep probability (`keep_prob`).

So it varies a bit, but mostly *p* is defined as the probability to zero out the input units.
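If you're porting a hyperparameter from the paper's convention (or from `tf.nn`'s keep probability), the conversion is a one-liner; a minimal sketch, where `p_keep = 0.8` is just an illustrative value:

```python
import torch.nn as nn

# Retention probability in the paper's / tf.nn's convention
# (0.8 is a hypothetical example value, not a recommendation).
p_keep = 0.8

# PyTorch expects the drop probability, so flip it:
dropout = nn.Dropout(p=1.0 - p_keep)  # zeroes ~20% of elements
```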
