As Hinton et al. mentioned in their paper Improving neural networks by preventing co-adaptation of feature detectors:
At test time, we use the “mean network” that contains all of the hidden units but with their outgoing weights halved to compensate for the fact that twice as many of them are active.
I wonder whether it matters to perform this halving operation, and how does
torch.nn.functional.dropout deal with this?
I’m going to assume that when you say ‘perform the half operation’, you mean scaling the activations by 0.5 at test time. The main idea is that the output of the layer at test time should be the expected value of the output during training, given some dropout probability.
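To make the expected-value idea concrete, here is a small simulation (plain Python, no PyTorch needed): with keep probability 1 - p, a unit's expected activation is (1 - p) * x, which for p = 0.5 is exactly the halving Hinton describes. The variable names here are just for illustration.

```python
import random

random.seed(0)

x = 1.0      # activation of a single unit
p = 0.5      # dropout probability
n = 100_000  # number of simulated training passes

# Each pass, the unit survives with probability 1 - p; otherwise it outputs 0.
samples = [x if random.random() > p else 0.0 for _ in range(n)]
mean = sum(samples) / n

# The empirical mean is close to (1 - p) * x = 0.5,
# which is why the test-time network halves the weights.
print(mean)
```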
In the case of
nn.functional.dropout, you have to use it like this:
nn.functional.dropout(input, p=0.5, training=False)
If you were using the module version,
nn.Dropout, you can call the
eval function on the module, or on the parent class which contains the dropout module.
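For instance, a minimal check that eval mode turns dropout into an identity function (tensor values chosen just for illustration):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(5)

# Switch the module (or a parent model containing it) to eval mode;
# in eval mode, nn.Dropout passes the input through unchanged.
drop.eval()
out = drop(x)
print(out)  # identical to x
```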
I have got the idea. The ‘half operation’ comes from “inverted dropout” — I got the answer from here. I think PyTorch’s implementation of dropout uses this method: it scales the surviving activations by 1/(1-p) during training, so no scaling is needed at test time.
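A quick way to see the inverted-dropout scaling in PyTorch itself: with training=True and p=0.5, every surviving unit is multiplied by 1/(1-p) = 2, so the expected output still matches the input.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(10)

# With training=True, dropped units become 0 and survivors
# are scaled by 1/(1-p) = 2.0 (inverted dropout).
out = F.dropout(x, p=0.5, training=True)
print(out)  # entries are either 0.0 or 2.0
```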