But one thing to consider is whether alpha is that descriptive a name for the standard deviation and whether it is a good parameter convention.
PyTorch’s standard dropout with Bernoulli takes the rate p. The multiplier will have mean 1 and standard deviation (p * (1-p))**0.5 / (1-p) = (p/(1-p))**0.5 (the numerator (p*(1-p))**0.5 is the standard deviation of the Bernoulli variable and the denominator 1-p comes from the 1/(1-p) scaling).
So if you want to more closely match what (Bernoulli) Dropout does in terms of mean and std, you could take an argument p and use the standard deviation (p/(1-p))**0.5 instead of self.alpha.
(I think e.g. Keras’ Gaussian dropout does that, too.)
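To double-check that algebra, here is a quick numeric sanity check (plain Python, no torch needed): the inverted-dropout multiplier is 0 with probability p and 1/(1-p) with probability 1-p, so its mean and std can be computed exactly:

```python
# Sanity check: the inverted-dropout multiplier has mean 1 and std (p/(1-p))**0.5.
for p in (0.1, 0.25, 0.5, 0.9):
    keep = 1.0 - p
    scale = 1.0 / keep                # inverted-dropout scaling factor
    mean = keep * scale               # E[m] = (1-p)/(1-p) = 1
    var = keep * scale**2 - mean**2   # E[m^2] - E[m]^2 = p/(1-p)
    assert abs(mean - 1.0) < 1e-12
    assert abs(var**0.5 - (p / (1.0 - p))**0.5) < 1e-9
print("mean 1 and std (p/(1-p))**0.5 confirmed")
```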
Almost!
I’d do the check (raising a RuntimeError rather than using assert — assert should be reserved for “internal” dev errors) in the init, and I’d probably just use torch.randn_like(…) * stddev. Other than that it looks good to me at first sight.
Thanks! Will leave the final implementation here in case it’s useful for someone else.
class GaussianDropout(nn.Module):
    def __init__(self, p=0.5):
        super().__init__()
        if p <= 0 or p >= 1:
            raise ValueError("p must satisfy 0 < p < 1")
        self.p = p

    def forward(self, x):
        if self.training:
            stddev = (self.p / (1.0 - self.p)) ** 0.5
            # multiplicative noise with mean 1 (not 0), matching the
            # mean/std of scaled Bernoulli dropout discussed above
            epsilon = 1.0 + torch.randn_like(x) * stddev
            return x * epsilon
        else:
            return x
P.P.S.: For v2 of the code, I’d probably allow p=0. It can be handy to use a hyperparameter to disable dropout for experimentation (even if not having the module at all would be more efficient, changing the model structure can lead to hiccups, e.g. with nn.Sequential when you want to compare parameters etc.).
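A sketch of what that v2 could look like (treating p=0 as an identity in both train and eval mode is my reading of the intent here):

```python
import torch
import torch.nn as nn

class GaussianDropout(nn.Module):
    """Multiplicative Gaussian noise with std (p/(1-p))**0.5; p=0 disables it."""
    def __init__(self, p=0.5):
        super().__init__()
        if p < 0 or p >= 1:
            raise ValueError("p must satisfy 0 <= p < 1")
        self.p = p

    def forward(self, x):
        if self.training and self.p > 0:
            stddev = (self.p / (1.0 - self.p)) ** 0.5
            # mean-1 multiplicative noise, as in the implementation above
            return x * (1.0 + torch.randn_like(x) * stddev)
        return x  # identity in eval mode or when p == 0
```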