Interpreting results of Dropout layer where p=1


(Lina) #1

I was running a hyperparameter search on a network and realized that at some point I had set p=1 for my dropout layer, yet the network was still able to learn MNIST to about 85% accuracy. It of course does not do as well as networks with a better p (0.3 in this case), but I don’t understand how the network can learn at all.

Wouldn’t p=1 mean every activation is zeroed out on each forward pass? Also, dropout scales the surviving activations by 1/(1-p), which with p=1 should cause a division by zero, yet I never see any error. Is this behaviour even possible, or should it be raising an error as well as failing to learn, which would mean I’m implementing dropout wrong?
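
For context, here is a stripped-down sketch of roughly how I’m applying dropout in the forward pass (the layer sizes and names are placeholders, not my actual model):

```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, p):
        super().__init__()
        # Placeholder MNIST-sized layers; the real sizes come from the hyperparameter search
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)
        self.p = p  # here p = 1.0

    def forward(self, x):
        x = F.relu(self.fc1(x))
        # The dropout call in question, using the functional API
        x = F.dropout(x, p=self.p)
        return self.fc2(x)
```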

Thanks.


(Lina) #2

So it turns out that when calling torch.nn.functional.dropout(layer, p=1), the training argument defaults to False, so dropout was never actually applied, which explains why the network could still learn.

I’ve now changed the call to torch.nn.functional.dropout(layer, p=1, training=self.training), and the output of this call is a fully zeroed-out tensor, so the network no longer learns anything.
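
A quick check that illustrates both behaviours (the input tensor here is just a random placeholder):

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 8)

# What my original call was effectively doing: with training=False,
# dropout is a no-op, so the network could still learn.
no_op = F.dropout(x, p=1.0, training=False)
print(torch.equal(no_op, x))   # True

# The corrected call: in training mode with p=1 every activation is dropped.
# PyTorch special-cases the 1/(1-p) scaling here, so there is no
# division-by-zero error, just an all-zero output.
zeroed = F.dropout(x, p=1.0, training=True)
print(zeroed.sum())            # tensor(0.)
```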

Sorry for the misunderstanding! I had assumed it worked like the torch.nn.Dropout module, which follows self.training automatically.
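
For anyone else who runs into this, here is the module behaviour I had been assuming (sizes are arbitrary):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=1.0)
x = torch.randn(4, 8)

drop.train()                      # the module tracks its own training flag
print(drop(x).sum())              # tensor(0.) -- everything is dropped

drop.eval()                       # in eval mode the module is the identity
print(torch.equal(drop(x), x))    # True
```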