As we know, with dropout one has to compensate at test time by scaling the weights by the dropout probability p=0.5. Does PyTorch do this itself?
Almost. According to the source, PyTorch uses "inverted dropout": during training it multiplies the signal by 1/(1 - p), where p is the dropout probability. At eval/test time it simply doesn't scale the outputs at all. Users don't have to do anything beyond putting the model in eval() mode.
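This inverted-dropout behavior is easy to verify directly with `nn.Dropout`; a minimal sketch:

```python
import torch
import torch.nn as nn

# Inverted dropout: during training, surviving activations are scaled
# by 1/(1 - p); in eval mode the layer acts as the identity.
drop = nn.Dropout(p=0.5)
x = torch.ones(10)

drop.train()
y_train = drop(x)
# Each element is either 0.0 (dropped) or 1/(1 - 0.5) = 2.0 (kept, rescaled)
assert set(y_train.tolist()) <= {0.0, 2.0}

drop.eval()
y_eval = drop(x)
# In eval mode the input passes through unchanged -- no rescaling needed
assert torch.equal(y_eval, x)
```

Because the rescaling happens during training, the expected activation magnitude is already correct at test time, which is why no weight adjustment is needed.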