Training time when using Dropout with p=0.0

vdw · November 26, 2022, 9:51am

I want to keep my model configurable, including the ability to set the Dropout probability to different values. The easiest way would be to create all my Dropout layers like this:

def __init__(self, ..., dropout_prob=0.0, ...):
    ...
    self.dropout_prob = dropout_prob
    self.dropout1 = nn.Dropout(p=dropout_prob)
    ...

def forward(self, X):
    ...
    out = self.dropout1(out)
    ...

I have no doubt this works fine. I only wonder whether I sacrifice any noteworthy performance when the dropout probability is 0.0, which makes all Dropout layers essentially identity functions. In principle, I could do something like this:

def forward(self, X):
    ...
    if self.dropout_prob is not None and self.dropout_prob > 0.0:
        out = self.dropout1(out)
    ...

Would this have any measurable advantage in practice?
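One way to check this empirically is a small CPU micro-benchmark. The sketch below uses a hypothetical two-layer `MLP` (the class name, the layer sizes, and the `skip_zero_dropout` flag are placeholders, not part of the original model) and times the forward pass in training mode with the `nn.Dropout(p=0.0)` call versus with the call skipped. Absolute numbers depend on hardware and PyTorch version, so treat any difference as indicative only.

```python
import time

import torch
import torch.nn as nn


class MLP(nn.Module):
    """Hypothetical two-layer MLP; layer sizes are arbitrary placeholders."""

    def __init__(self, in_dim=256, hidden_dim=256, out_dim=10,
                 dropout_prob=0.0, skip_zero_dropout=False):
        super().__init__()
        self.dropout_prob = dropout_prob
        # Hypothetical flag: when True, the Dropout call is skipped for p == 0.
        self.skip_zero_dropout = skip_zero_dropout
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.dropout1 = nn.Dropout(p=dropout_prob)
        self.fc2 = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        out = torch.relu(self.fc1(x))
        # Call Dropout unless we were asked to skip it and p is exactly 0.
        if not (self.skip_zero_dropout and self.dropout_prob == 0.0):
            out = self.dropout1(out)
        return self.fc2(out)


def time_forward(model, x, iters=200, warmup=20):
    """Return the average forward-pass time in seconds (CPU wall clock)."""
    model.train()  # dropout is only active in training mode
    with torch.no_grad():
        for _ in range(warmup):
            model(x)
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        elapsed = time.perf_counter() - start
    return elapsed / iters


torch.manual_seed(0)
x = torch.randn(64, 256)
always_call = MLP(dropout_prob=0.0, skip_zero_dropout=False)
skip_call = MLP(dropout_prob=0.0, skip_zero_dropout=True)
skip_call.load_state_dict(always_call.state_dict())  # identical weights

t_call = time_forward(always_call, x)
t_skip = time_forward(skip_call, x)
print(f"nn.Dropout(p=0.0) called: {t_call * 1e6:.1f} us/iter")
print(f"Dropout call skipped:     {t_skip * 1e6:.1f} us/iter")
```

On a GPU you would also need `torch.cuda.synchronize()` before reading the timer, since CUDA kernels launch asynchronously; `torch.utils.benchmark` handles such details if you want more careful measurements.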

jondapper · November 26, 2022, 3:51pm

Internally, dropout with p=0 behaves as the identity and returns its input unchanged:

github.com/pytorch/pytorch/blob/1d705b4b075e32540293d1717b582442a66ffcce/aten/src/ATen/native/Dropout.cpp#L44
template<bool inplace>
Tensor multiply(const Tensor& input, const Tensor& noise) {
  static_assert(!inplace, "Wrong multiply overload triggered in Dropout.cpp");
  return input.mul(noise);
}

template<bool feature_dropout, bool alpha_dropout, bool inplace, typename T>
Ctype<inplace> _dropout_impl(T& input, double p, bool train) {
  TORCH_CHECK(p >= 0 && p <= 1, "dropout probability has to be between 0 and 1, but got ", p);
  if (p == 0 || !train || input.numel() == 0) {
    return input;
  }

  if (p == 1) {
    return multiply<inplace>(input, at::zeros({}, input.options()));
  }

  at::Tensor b; // used for alpha_dropout only
  auto noise = feature_dropout ? make_feature_noise(input) : at::empty_like(input);
  noise.bernoulli_(1 - p);

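So there should be no need for the if statement. The `p == 0` check at the top of `_dropout_impl` returns the input before any noise tensor is allocated, so the only remaining cost is the call itself (the Python module call and operator dispatch), which is typically negligible compared to the rest of the forward pass.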
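A quick way to confirm the identity behavior from Python is to call `nn.Dropout(p=0.0)` in training mode and compare the output with the input. The sketch also shows one possible alternative to branching in `forward`: choosing `nn.Identity` at construction time through a helper, `make_dropout`, which is hypothetical and not part of PyTorch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 8)

dropout = nn.Dropout(p=0.0)
dropout.train()  # training mode: a nonzero p would zero elements here
y = dropout(x)

# With p=0 no element is dropped and the 1/(1-p) rescaling factor is 1,
# so the output equals the input exactly.
print(torch.equal(y, x))  # True


def make_dropout(p):
    """Hypothetical helper: pick the layer once at construction time
    instead of branching on p inside forward()."""
    return nn.Dropout(p=p) if p > 0.0 else nn.Identity()


print(type(make_dropout(0.0)).__name__)  # Identity
print(type(make_dropout(0.1)).__name__)  # Dropout
```

Since neither `nn.Dropout` nor `nn.Identity` has parameters or buffers, swapping one for the other does not change the model's `state_dict` keys, so checkpoints stay compatible across dropout settings.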