Does PyTorch scale the output by 2x in dropout during training?

As we know, classical dropout has to compensate at test time by multiplying the weights by the dropout probability (e.g. p=0.5). Does PyTorch do this itself?

Almost. According to the source, it multiplies the signal during training by 1/(1 - p), where p is the dropout probability (so-called inverted dropout). During eval/test time it simply doesn't scale the outputs. Users don't have to fiddle with this beyond putting the model in eval() mode.
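To make the scaling concrete, here is a minimal plain-Python sketch of the inverted-dropout behavior described above (not PyTorch's actual implementation, just the same mechanic): during training, surviving activations are multiplied by 1/(1 - p), and at eval time the function is an identity.

```python
import random

def inverted_dropout(x, p, training):
    """Inverted dropout: zero each element with probability p during
    training, scaling survivors by 1/(1 - p); identity at eval time."""
    if not training:
        return list(x)  # eval mode: no masking, no scaling
    scale = 1.0 / (1.0 - p)
    return [0.0 if random.random() < p else v * scale for v in x]

random.seed(0)
x = [1.0] * 8

# Training: survivors are doubled, since 1/(1 - 0.5) = 2
out = inverted_dropout(x, p=0.5, training=True)
print(out)  # each element is either 0.0 or 2.0

# Eval: identical to the input, no rescaling needed
print(inverted_dropout(x, p=0.5, training=False))
```

Because the expected value of each activation is preserved during training (p of the time it's 0, 1-p of the time it's v/(1-p)), no weight rescaling is needed at test time.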
