Interpreting results of Dropout layer where p=1

I’m running a hyperparameter search on a network and realized that at some point I had set p=1 for my dropout layer, yet the network was still able to learn MNIST to about 85% accuracy. It of course does not do as well as networks with a more sensible p (the best value in my search was 0.3), but I don’t understand how the network can learn at all.

Wouldn’t p=1 mean every activation is zeroed out on every forward pass? Also, dropout scales the surviving activations by 1/(1-p), which with p=1 should give a division by zero, yet I never see any error. Is this behavior even possible, or should it be raising an error as well as not learning, meaning I’m implementing dropout wrong?

Thanks.

So it turns out that when calling torch.nn.functional.dropout(layer, p=1), the training argument defaults to False, so dropout was never actually applied, which explains why the network could still learn.
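For anyone else who hits this, here is a minimal sketch of the difference the training flag makes (the tensor shape is arbitrary):

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 5)

# With training=False, dropout is a no-op and the input passes through unchanged.
out_eval = F.dropout(x, p=1.0, training=False)
print(torch.equal(out_eval, x))   # True

# With training=True and p=1, every activation is dropped, so the output is all zeros.
out_train = F.dropout(x, p=1.0, training=True)
print(out_train.abs().sum())      # tensor(0.)
```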

I’ve now changed the call to torch.nn.functional.dropout(layer, p=1, training=self.training), and the output of this call is a fully zeroed-out tensor, so the network no longer learns, as expected.
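In case it helps, this is roughly how the forward pass looks now; the layer sizes and names are just placeholders, not my actual network:

```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, p=0.3):
        super().__init__()
        # Placeholder MNIST-sized layers
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 10)
        self.p = p

    def forward(self, x):
        x = F.relu(self.fc1(x))
        # Passing training=self.training ties the functional dropout call
        # to model.train() / model.eval(), so it is only active during training.
        x = F.dropout(x, p=self.p, training=self.training)
        return self.fc2(x)
```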

Sorry for the misunderstanding! I assumed it worked like the torch.nn.Dropout module, which follows self.training automatically.
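For comparison, a sketch of the module version, which keeps track of the training mode on its own (again with placeholder layer sizes):

```python
import torch.nn as nn
import torch.nn.functional as F

class NetWithModuleDropout(nn.Module):
    def __init__(self, p=0.3):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.drop = nn.Dropout(p=p)   # respects model.train() / model.eval() automatically
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.drop(x)              # no training argument needed
        return self.fc2(x)
```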
