Implementing dropout

Hi, if I want to implement dropout myself, is something like the following sufficient? (Taken from machine learning - Implementing dropout from scratch - Stack Overflow.)

import torch
import torch.nn as nn


class MyDropout(nn.Module):
    def __init__(self, p: float = 0.5):
        super(MyDropout, self).__init__()
        if p < 0 or p > 1:
            raise ValueError("dropout probability has to be between 0 and 1, " "but got {}".format(p))
        self.p = p

    def forward(self, X):
        if self.training:
            # Sample a 0/1 keep mask of the same shape as X and rescale the
            # survivors by 1/(1 - p) ("inverted dropout"), so that eval mode
            # can simply return X unchanged.
            binomial = torch.distributions.binomial.Binomial(probs=1 - self.p)
            return X * binomial.sample(X.size()) * (1.0 / (1 - self.p))
        return X
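
For reference, a quick way to sanity-check the basic behaviour against the built-in layer (a minimal sketch, assuming the class above is defined as written):

import torch

layer = MyDropout(p=0.5)
x = torch.ones(1000)

layer.train()
y = layer(x)
print((y == 0).float().mean())   # roughly 0.5 of the entries are dropped
print(y[y != 0].unique())        # the survivors are rescaled to 2.0

layer.eval()
print(torch.equal(layer(x), x))  # True: identity at evaluation time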

My concern is that even if the unwanted weights are masked out (either this way or by using a mask tensor), gradients can still flow through the zeroed weights (Custom connections in neural network layers - #9 by Kaixhin). Is my concern valid?

This looks like a fine implementation.

In the example you link,
y = w * x, so dy/dw = x,
which is nonzero even if w itself has been set to 0, so the weight can still be updated.

However, in your context you'd have something like
y = dropout(w * x + b) + ...
If dropout returns 0 for that unit, the contribution of w is 0 and dy/dw is 0 as well, so no gradient reaches w through the dropped unit.
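
A quick autograd check makes this concrete (a minimal sketch, not from the thread; the mask here is fixed by hand to stand in for a sampled dropout mask):

import torch
import torch.nn as nn

torch.manual_seed(0)
fc = nn.Linear(4, 3)
x = torch.randn(1, 4)

# Fixed 0/1 mask standing in for a dropout sample: drop the second output unit.
mask = torch.tensor([[1.0, 0.0, 1.0]])

out = fc(x) * mask
out.sum().backward()

# The row of fc.weight.grad for the masked-out unit is all zeros, so no
# gradient reaches the weights feeding a dropped output.
print(fc.weight.grad)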

Thanks for your reply. So can I just do something like this:
relu(self.fc(x) * self.mask)

I'm not sure whether the masking should happen before or after the activation.
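
For what it's worth, with ReLU the two orderings turn out to give the same result, since relu(z * m / (1 - p)) == relu(z) * m / (1 - p) for a non-negative mask m. A minimal check (a sketch using the built-in nn.Dropout, resetting the seed so both calls sample the same mask):

import torch
import torch.nn as nn
import torch.nn.functional as F

fc = nn.Linear(8, 16)
drop = nn.Dropout(p=0.5)     # could equally be the MyDropout from above
x = torch.randn(4, 8)

torch.manual_seed(0)
a = F.relu(drop(fc(x)))      # mask the pre-activation, then apply ReLU
torch.manual_seed(0)
b = drop(F.relu(fc(x)))      # apply ReLU first, then mask the activation

print(torch.allclose(a, b))  # True for ReLU; differs for e.g. sigmoid, which does not map 0 to 0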