# How to apply dropout with a non-zero fill value?

I want to use feature dropout like `dropout2d`, but fill the dropped features with the mean value (or Gaussian noise, for example) instead of zeros.
How can I do that?
The easiest approach would be to run `dropout2d` and then replace the zeros, but my data already contains zeros.

You could sample a binary mask using the drop probability `p` and fill the tensor with your desired value:

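``````
import torch

x = torch.ones(10, 20)
p = 0.5
# 1 = keep the value, 0 = drop it
mask = torch.distributions.Bernoulli(probs=(1 - p)).sample(x.size())
x[mask == 0] = x.mean()  # or any other fill value
``````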

Note that dropout also scales the activations, so that their expected range is the same during training and validation:

``````
# manual dropout
out = x * mask * 1/(1-p)
``````

So you would have to take care of this scaling for your approach.
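Putting the pieces together for the `dropout2d`-like case from the question, here is a minimal sketch. `feature_dropout_with_fill` is a hypothetical helper, not a PyTorch API, and it assumes an `(N, C, H, W)` input. It keeps the mask separate from the data, so zeros that are already present in the input are never mistaken for dropped entries. The `1/(1-p)` scaling is deliberately left out here; whether you need it depends on the fill value you choose.

```python
import torch


def feature_dropout_with_fill(x, p=0.5, fill=None, training=True):
    """Drop whole channels of an (N, C, H, W) tensor and fill them with `fill`.

    Hypothetical helper (not part of PyTorch). If `fill` is None, each dropped
    channel is filled with its own spatial mean.
    """
    if not training or p == 0.0:
        return x
    n, c = x.shape[:2]
    # One draw per (sample, channel): True = keep, False = drop.
    keep = torch.rand(n, c, 1, 1, device=x.device) >= p
    if fill is None:
        fill = x.detach().mean(dim=(2, 3), keepdim=True)  # shape (N, C, 1, 1)
    else:
        fill = torch.as_tensor(fill, dtype=x.dtype, device=x.device)
    # `keep` and `fill` are broadcast against x.
    return torch.where(keep, x, fill)


x = torch.randn(4, 8, 5, 5)
out = feature_dropout_with_fill(x, p=0.5, fill=-1.0)
print(out.shape)
```

To fill with Gaussian noise instead, you could pass something like `fill=torch.randn_like(x)`, since any tensor that broadcasts against `x` should work here.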

Can I sample values from the tensor for the mask values?
Also, do I really need to scale if I'm dropping out with non-zero values?

I’m not sure I understand the first question.
In my code I sample the mask manually. Would that work, or what would you like to sample additionally?

If you enable dropout during training and disable it during evaluation without this scaling, the expected activation values will have a different range in the two modes, and your model will most likely perform poorly.
The scaling is described in the original Dropout paper.
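To see what the scaling does in numbers, here is a small sketch that compares the mean activation for three variants: zero fill without scaling, zero fill with the `1/(1-p)` scaling, and filling with the mean without any scaling. The input distribution (standard normal shifted to mean 2.0) is just an assumption for illustration.

```python
import torch

torch.manual_seed(0)
p = 0.5
x = torch.randn(1000, 1000) + 2.0          # activations with mean ~2.0
keep = torch.rand_like(x) >= p             # True = keep, False = drop

zero_unscaled = x * keep                   # zero fill, no scaling
zero_scaled = x * keep / (1 - p)           # standard dropout scaling
mean_fill = torch.where(keep, x, x.mean()) # fill with the mean, no scaling

print(x.mean())              # roughly 2.0
print(zero_unscaled.mean())  # roughly (1 - p) * 2.0
print(zero_scaled.mean())    # roughly 2.0 again
print(mean_fill.mean())      # roughly 2.0 without any scaling
print(x.var(), mean_fill.var())  # the variance is still reduced by the fill
```

So with a fill value close to the expected activation, the mean is already roughly preserved without scaling, but the variance of the activations still differs between training and evaluation. Whether that matters for your model is something you would have to check empirically.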

I want to calculate the mean frame, taking the zero padding into account, and replace some random frames with it.
This code raises an error.
What is the right way to do it?

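``````
def mean_frame_dropout(x, lens, p=0.2):
    x = x.clone()
    batch_size, length, features = x.size()
    mask = torch.distributions.Bernoulli(probs=(1 - p)).sample((batch_size, length)).to(x.device)
    # True = frame to replace, expanded to the full shape of x
    mask = (mask == 0).unsqueeze(2).expand(-1, -1, features)
    mean = x.detach().sum(1) / lens
    x[mask] = mean  # this assignment raises the error
    return x
``````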

What kind of error are you getting? Could you post the stack trace as well as the shapes of the input tensors?

Mean and x sizes
torch.Size([1, 128])
torch.Size([1, 167, 128])

RuntimeError: shape mismatch: value tensor of shape  cannot be broadcast to indexing result of shape 

The `mean` tensor has the shape `[1, 128]`, as it’s calculated by `x.sum(1)`.
However, `x[mask]` has a variable number of elements, so how should the assignment work?
E.g. let's assume `mask` contains 5120 `True` values, so `x[mask]` would have the shape `[5120]`.
Which of those 5120 elements should the 128 values of `mean` be assigned to?

What is the right way to mask it?

It depends on what you want to achieve.
The original `mask` before the `unsqueeze` and `expand` operations can be used to index `x` directly, which would yield:

``````
x = torch.ones([1, 167, 128])
batch_size, length, features = x.size()
p = 0.5
mask = torch.distributions.Bernoulli(probs=(1 - p)).sample((batch_size, length)).to(x.device)
mask = mask.bool()
print(mask.shape)
> torch.Size([1, 167])
print(x[mask].shape)
> torch.Size([81, 128])
print(mask.sum())
> tensor(81)
``````

So would you like to replace all these `81` values in `dim1` with the sum of them?
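If the goal is to replace each selected frame with the per-sample mean frame, one option is to avoid boolean indexing altogether and let `torch.where` broadcast a `[batch, length, 1]` mask against a `[batch, 1, features]` mean. This is only a sketch under a few assumptions: `lens` is an integer tensor of shape `[batch]` with the number of valid frames per sample, padded frames are all zeros (so they do not contribute to the sum), and padded frames should stay untouched.

```python
import torch


def mean_frame_dropout(x, lens, p=0.2):
    """Replace random frames of x (batch, length, features) with the mean frame.

    Assumes `lens` has shape (batch,) and that padded frames are all zeros.
    """
    batch_size, length, features = x.size()
    lens = lens.to(x.device)
    # Per-sample mean over the valid frames only: (batch, 1, features).
    mean = x.detach().sum(dim=1, keepdim=True) / lens.view(-1, 1, 1).to(x.dtype)
    # One decision per frame: True = replace this frame.
    drop = torch.rand(batch_size, length, device=x.device) < p
    # Never replace padded frames.
    valid = torch.arange(length, device=x.device).unsqueeze(0) < lens.view(-1, 1)
    drop = (drop & valid).unsqueeze(2)  # (batch, length, 1)
    # All three arguments broadcast to (batch, length, features).
    return torch.where(drop, mean, x)


x = torch.randn(1, 167, 128)
lens = torch.tensor([167])
out = mean_frame_dropout(x, lens, p=0.2)
print(out.shape)
```

Since `torch.where` returns a new tensor, the `x.clone()` from the original function is not needed in this version.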

Could you advise how to replace a block of channels with mean values, independently for each sample in the batch?
I'm not sure how to select indices over the batch and length axes at the same time.

For `x = torch.rand(32, 1024, 128)`:

``````
def mean_freq_mask(x, p=0.2):
    if torch.rand(1) < p:
        x = x.clone()
        batch_size, length, features = x.size()
        F = features // 3
        mean = x.detach().mean(2)
    return x
``````

This gives the error `only integer tensors of a single element can be converted to an index`.
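That error typically shows up when a slice such as `x[:, :, start:end]` is given tensors with more than one element as bounds, since a slice can only use one start and one end for the whole batch. One way around it is to build a boolean mask by comparing a channel index grid with per-sample bounds. The following is only a sketch: `max_width` is a hypothetical parameter (defaulting to `features // 3` as in the question), and the per-sample coin flip with probability `p` is my reading of "independently for each sample".

```python
import torch


def mean_freq_mask(x, p=0.2, max_width=None):
    """Replace a random block of feature channels with the per-frame mean.

    x has shape (batch, length, features). Each sample is masked with
    probability p and gets its own block start and width.
    """
    batch_size, length, features = x.size()
    if max_width is None:
        max_width = features // 3
    device = x.device
    # Per-sample block width in [0, max_width].
    width = torch.randint(0, max_width + 1, (batch_size, 1), device=device)
    # Per-sample start, chosen so that the block fits into the feature axis.
    start = (torch.rand(batch_size, 1, device=device) * (features - width + 1)).long()
    # Per-sample coin flip: samples that are not selected get width 0.
    selected = torch.rand(batch_size, 1, device=device) < p
    width = width * selected
    # Compare a channel index grid with [start, start + width) per sample.
    idx = torch.arange(features, device=device).unsqueeze(0)  # (1, features)
    mask = (idx >= start) & (idx < start + width)             # (batch, features)
    mean = x.detach().mean(dim=2, keepdim=True)               # (batch, length, 1)
    return torch.where(mask.unsqueeze(1), mean, x)


x = torch.rand(32, 1024, 128)
out = mean_freq_mask(x, p=0.2)
print(out.shape)
```

The same index-grid comparison should also work along the length axis if you want to mask blocks of frames instead of channels.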

What is the correct way to use this block inside a model class?
Let’s say I have the following CNN:

``````
cfg = {
    # dropouts depend on a single parameter (useful for hyperparameter optimization)
    'VGG16': [64, 'Dp', 64, 'M', 128, 'Dp', 128, 'M', 256, 'Dp', 256, 'Dp', 256, 'M', 512, 'Dp', 512, 'Dp', 512, 'M', 512, 'Dp', 512, 'Dp', 512, 'A', 'Dp'],
}

class VGG(nn.Module, NetVariables, OrthoInit):
    def __init__(self, params):
        self.params = params.copy()

        nn.Module.__init__(self)
        NetVariables.__init__(self, self.params)
        OrthoInit.__init__(self)

        self.features = self._make_layers(cfg['VGG16'])
        self.classifier = nn.Linear(512, self.num_classes)

        self.weights_init()  # call the orthogonal initial condition

    def forward(self, x):
        outs = {}
        L2 = self.features(x)
        outs['l2'] = L2
        Out = L2.view(L2.size(0), -1)
        Out = self.classifier(Out)
        outs['out'] = Out
        return outs

    def _make_layers(self, cfg):
        layers = []
        in_channels = 3
        for x in cfg:
            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            elif x == 'A':
                layers += [nn.AvgPool2d(kernel_size=2, stride=2)]
            elif x == 'D3':
                layers += [nn.Dropout(0.3)]
            elif x == 'D4':
                layers += [nn.Dropout(0.4)]
            elif x == 'D5':
                layers += [nn.Dropout(0.5)]
            elif x == 'Dp':
                layers += [nn.Dropout(self.params['dropout_p'])]
            else:
                layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1),
                           nn.Tanh(),
                           nn.GroupNorm(int(x/self.params['group_factor']), x)]
                in_channels = x
        layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
        return nn.Sequential(*layers)
``````

I’d like to replace the dropout layers with the block you proposed and pass the mask as an input on every forward call. How should I do it?

If you want to pass the `mask` as an additional argument to the `forward` method of your custom `Dropout` layer, you should write this logic into the `forward` method of `VGG` explicitly and remove the usage of the `nn.Sequential` block as it expects a single input/output in the default implementation.

Is there another way that keeps the code flexible and clean? In the above solution with `nn.Sequential` I can easily modify the architecture ‘chain’ of modules directly from the `cfg` dict.
Can I do something similar without `nn.Sequential`?

You could try to use an `nn.ModuleList` or `nn.ModuleDict`, add the layers using your `cfg`, and iterate it in the `forward` method.
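As a rough sketch of the `nn.ModuleList` approach: the layers are still built from a `cfg` list, but `forward` iterates over them and hands a mask only to the custom layers. `MaskedFill` and `SmallNet` are hypothetical names for illustration (a cut-down stand-in for the `VGG` class above), the masks are assumed to be boolean tensors with `True` = keep, and no `1/(1-p)` scaling is applied.

```python
import torch
import torch.nn as nn


class MaskedFill(nn.Module):
    """Hypothetical layer: entries where `mask` is False are set to `fill`."""

    def __init__(self, fill=0.0):
        super().__init__()
        self.fill = fill

    def forward(self, x, mask):
        if not self.training:
            return x
        return torch.where(mask, x, torch.full_like(x, self.fill))


class SmallNet(nn.Module):
    def __init__(self, cfg, in_channels=3):
        super().__init__()
        layers = []
        for item in cfg:
            if item == 'M':
                layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
            elif item == 'Dp':
                layers.append(MaskedFill(fill=0.0))
            else:
                layers.append(nn.Conv2d(in_channels, item, kernel_size=3, padding=1))
                layers.append(nn.Tanh())
                in_channels = item
        self.features = nn.ModuleList(layers)

    def forward(self, x, masks):
        # `masks` holds one mask per MaskedFill layer, in order; each mask
        # must broadcast against the input of its layer.
        masks = iter(masks)
        for layer in self.features:
            if isinstance(layer, MaskedFill):
                x = layer(x, next(masks))
            else:
                x = layer(x)
        return x


model = SmallNet([8, 'Dp', 'M', 16, 'Dp'])
x = torch.randn(2, 3, 8, 8)
# Channel-wise masks, one per 'Dp' entry in the cfg.
masks = [torch.rand(2, 8, 1, 1) >= 0.5, torch.rand(2, 16, 1, 1) >= 0.5]
out = model(x, masks)
print(out.shape)
```

This keeps the architecture editable from `cfg`, at the cost of a slightly longer `forward`. An `nn.ModuleDict` would work similarly if you prefer to look up masks by layer name instead of by order.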
