Questions about dropout (2)

Hello,
I am studying the dropout algorithm,
and I am implementing it myself in Python or Julia.

If I have two hidden layers with dropout ratios of 0.5 and 0.3, respectively,
by what factor should I scale the output when evaluating?

Also, should (or may) I scale the output by some factor (such as 1/(1-p) for an appropriate p) when training?

  1. When using dropout in PyTorch, I have never scaled the output by any factor when evaluating. Does PyTorch apply such a factor automatically when evaluating?

Thank you in advance

PyTorch scales the activations during training by the inverse of the keep probability, 1/(1-p), as seen here.
This inverse scaling is mentioned in the original dropout paper (if I’m not mistaken) and allows you to simply disable the dropout layer during evaluation, so no extra factor is needed at evaluation time.
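
For a hand-written implementation, the same idea can be sketched in plain Python. This is only an illustration, not PyTorch's actual code: the `dropout` function, its `training` flag, and the constant input are assumptions made for the example, and the two "layers" are just dropout stacked on a vector of ones, with no weights in between.

```python
import random


def dropout(values, p, training, rng=random):
    """Inverted dropout (illustrative helper, not a library API).

    During training each value is zeroed with probability p and the
    survivors are scaled by 1 / (1 - p). During evaluation the input is
    returned unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)
    scale = 1.0 / (1.0 - p)
    # rng.random() is uniform on [0, 1), so a value is kept with probability 1 - p.
    return [v * scale if rng.random() >= p else 0.0 for v in values]


rng = random.Random(0)      # fixed seed so the run is repeatable
x = [1.0] * 100_000         # constant input makes the mean easy to read

# Training: dropout ratios 0.5 and 0.3 applied one after the other.
h1 = dropout(x, 0.5, training=True, rng=rng)
h2 = dropout(h1, 0.3, training=True, rng=rng)
train_mean = sum(h2) / len(h2)

# Evaluation: both dropout steps are the identity, with no extra factor.
eval_out = dropout(dropout(x, 0.5, training=False), 0.3, training=False)
eval_mean = sum(eval_out) / len(eval_out)

print(train_mean, eval_mean)
```

Because each layer rescales its own survivors during training, the average activation in training stays close to the evaluation value, so the ratios 0.5 and 0.3 never need to be combined into an evaluation-time factor. In PyTorch, calling `model.eval()` is what switches `nn.Dropout` to this identity behaviour.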

Thank you for your answer.
It helped me a lot.
Have a nice day!
