As Hinton et al. mentioned in their paper Improving neural networks by preventing co-adaptation of feature detectors:
At test time, we use the “mean network” that contains all of the hidden units but with their outgoing weights halved to compensate for the fact that twice as many of them are active.
I wonder whether it matters to perform this halving operation, and how does
torch.nn.functional.dropout deal with this?
I’m going to assume that when you say ‘perform the half operation’, you mean scaling the activations by 0.5 at test time. The main idea is that the output of the layer at test time should be the expected value of the output during training, given some dropout probability.
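To make the expected-value idea concrete, here is a small simulation (plain Python, no PyTorch needed): with keep probability 1 - p, a unit's expected activation is (1 - p) * x, which for p = 0.5 is exactly the halving Hinton describes. The variable names here are just for illustration.

```python
import random

random.seed(0)

x = 1.0      # activation of a single unit
p = 0.5      # dropout probability
n = 100_000  # number of simulated training passes

# Each pass, the unit survives with probability 1 - p; otherwise it outputs 0.
samples = [x if random.random() > p else 0.0 for _ in range(n)]
mean = sum(samples) / n

# The empirical mean is close to (1 - p) * x = 0.5,
# which is why the test-time network halves the weights.
print(mean)
```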
In the case of
nn.functional.dropout, you have to use it like this:
nn.functional.dropout(input, p=0.5, training=False)
If you were using the module version,
nn.Dropout, you can call the
eval function on the module, or on the parent class which contains the dropout module.
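For instance, a minimal check that eval mode turns dropout into an identity function (tensor values chosen just for illustration):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(5)

# Switch the module (or a parent model containing it) to eval mode;
# in eval mode, nn.Dropout passes the input through unchanged.
drop.eval()
out = drop(x)
print(out)  # identical to x
```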
I have got the idea. The ‘half operation’ comes from “inverted dropout” — I got the answer from here. I think PyTorch’s implementation of dropout uses this method: it scales the surviving activations by 1/(1-p) during training, so no scaling is needed at test time.
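A quick way to see the inverted-dropout scaling in PyTorch itself: with training=True and p=0.5, every surviving unit is multiplied by 1/(1-p) = 2, so the expected output still matches the input.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(10)

# With training=True, dropped units become 0 and survivors
# are scaled by 1/(1-p) = 2.0 (inverted dropout).
out = F.dropout(x, p=0.5, training=True)
print(out)  # entries are either 0.0 or 2.0
```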