How to apply dropout with a non-zero value?

I want to use feature dropout (like dropout2d), but fill the dropped features with the mean value (or Gaussian noise, for example) instead of zeros.
How can I do that?
The easiest approach would be to run dropout2d and then fill in the zeros, but my data already contains zeros.

You could sample a binary mask using the drop probability p and fill the masked positions of the tensor with your desired value:

import torch

x = torch.ones(10, 20)

p = 0.5
# sample a keep mask: entries are 1 with probability (1 - p)
mask = torch.distributions.Bernoulli(probs=(1 - p)).sample(x.size())
# fill the dropped positions with the mean instead of zero
x[~mask.bool()] = x.mean()
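
If you would rather fill the dropped positions with Gaussian noise, as mentioned in the question, a similar sketch could look like this:

import torch

x = torch.ones(10, 20)
p = 0.5
mask = torch.distributions.Bernoulli(probs=(1 - p)).sample(x.size())
drop_idx = ~mask.bool()
# fill the dropped positions with samples from a standard normal distribution
x[drop_idx] = torch.randn_like(x[drop_idx])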

Note that dropout also scales the activations to make sure the expected activation ranges are equal during training and validation:

# manual dropout with scaling
out = x * mask * 1 / (1 - p)

So you would have to take care of this scaling in your approach.

Can I sample values from a tensor for the mask values?
Also, do I really need to scale if I'm dropping out with non-zero values?

I’m not sure I understand the first question.
In my code I sample the mask manually. Would that work, or what would you like to sample additionally?

If you enable dropout during training and disable it during evaluation without this scaling, the expected activation values will have different ranges, and your model will most likely perform poorly.
The scaling is described in the original Dropout paper.
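
A quick numerical illustration of why the scaling matters:

import torch

x = torch.ones(1000, 1000)
p = 0.5
mask = torch.distributions.Bernoulli(probs=(1 - p)).sample(x.size())

out_unscaled = x * mask            # expected mean ~ (1 - p) * x.mean() = 0.5
out_scaled = x * mask / (1 - p)    # expected mean ~ x.mean() = 1.0

print(out_unscaled.mean())  # ~0.5, differs from the eval-time activation of 1.0
print(out_scaled.mean())    # ~1.0, matches the eval-time activation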

I want to calculate the mean frame, taking the zero padding into account, and replace some random frames with it.
This code raises an error.
What is the right way?

def mean_frame_dropout(x, lens, p=0.2):
    x = x.clone()
    batch_size, length, features = x.size()
    mask = torch.distributions.Bernoulli(
        probs=(1 - p)).sample((batch_size, length)).to(x.device)
    mask = ~mask.bool()
    mask = mask.unsqueeze(-1).expand(-1, -1, features)
    mean = x.detach().sum(1) / lens
    x[mask] = mean
    return x

What kind of error are you getting? Could you post the stack trace as well as the shapes of the input tensors?

The sizes of mean and x are:
torch.Size([1, 128])
torch.Size([1, 167, 128])

RuntimeError: shape mismatch: value tensor of shape [128] cannot be broadcast to indexing result of shape [5120]

The mean tensor has the shape [1, 128], as it's calculated via x.sum(1).
However, x[mask] has a variable number of elements, so how should the assignment work?
E.g., let's assume mask samples 5120 True values; x[mask] would then have the shape [5120].
Which of the 128 values of mean should be assigned to which of these elements?

What is the right way to mask it?

It depends on what you want to achieve.
The original mask, before the unsqueeze and expand operations, can be used to index x directly, which would yield:

import torch

x = torch.ones([1, 167, 128])
batch_size, length, features = x.size()
p = 0.5
# sample a keep mask over the batch and length dimensions only
mask = torch.distributions.Bernoulli(
    probs=(1 - p)).sample((batch_size, length)).to(x.device)
mask = ~mask.bool()
print(mask.shape)
> torch.Size([1, 167])
print(x[mask].shape)
> torch.Size([81, 128])
print(mask.sum())
> tensor(81)

So would you like to replace all these 81 values in dim1 with the sum of them?
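
If the goal is instead to replace each selected frame with the per-sample mean frame, one possible sketch (assuming lens holds the number of valid, non-padded frames per sample) could be:

import torch

def mean_frame_dropout(x, lens, p=0.2):
    # x: [batch, length, features]; lens: [batch] number of valid (non-padded) frames
    batch_size, length, features = x.size()
    # True where a frame should be replaced, sampled independently per frame
    drop = torch.distributions.Bernoulli(probs=p).sample(
        (batch_size, length)).bool().to(x.device)
    # per-sample mean frame, ignoring zero padding: [batch, features]
    mean = x.detach().sum(1) / lens.unsqueeze(1).to(x.dtype)
    # broadcast the mean frame over the dropped positions
    return torch.where(drop.unsqueeze(-1), mean.unsqueeze(1), x)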

Could you advise how to replace a block of channels with mean values?
And how to do this independently for each sample in the batch?
I'm not sure how to select indices over the batch and length axes at the same time.

For x = torch.rand(32, 1024, 128)

def mean_freq_mask(x, p=0.2):
	if torch.rand(1) < p:
		x = x.clone()
		batch_size, length, features = x.size()
		F = features // 3
		mean = x.detach().mean(2)
		mask_len = (torch.rand(batch_size)*F).long()
		mask_start = (torch.rand(batch_size)*(features-F)).long()
		x[mask_start:mask_start+mask_len] = mean.unsqueeze(-1).expand(-1, -1, features)[mask_start:mask_start+mask_len]
	return x

This gives the error: only integer tensors of a single element can be converted to an index.
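
One way around that error could be to build a boolean mask over the feature axis via broadcasting instead of slicing with batched start/length tensors. A sketch of the idea, assuming the intent is to fill a random feature band of each sample with the per-frame mean:

import torch

def mean_freq_mask(x, p=0.2):
    # x: [batch, length, features]
    if torch.rand(1).item() >= p:
        return x
    batch_size, length, features = x.size()
    F = features // 3  # maximum width of the masked feature band
    # per-sample band width and start position along the feature axis
    mask_len = (torch.rand(batch_size, device=x.device) * F).long()
    mask_start = (torch.rand(batch_size, device=x.device) * (features - F)).long()
    # boolean band mask of shape [batch, features] built via broadcasting
    feat_idx = torch.arange(features, device=x.device).unsqueeze(0)
    band = (feat_idx >= mask_start.unsqueeze(1)) & (feat_idx < (mask_start + mask_len).unsqueeze(1))
    # per-frame mean over the feature axis: [batch, length, 1]
    mean = x.detach().mean(2, keepdim=True)
    # fill the masked feature band of every frame with that frame's mean
    return torch.where(band.unsqueeze(1), mean, x)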

What is the correct way to use this block inside a model class?
Let’s say I have the following CNN:

cfg = {
    'VGG16': [64, 'Dp', 64, 'M', 128, 'Dp', 128, 'M', 256, 'Dp', 256, 'Dp', 256, 'M', 512, 'Dp', 512, 'Dp', 512, 'M', 512, 'Dp', 512, 'Dp', 512, 'A', 'Dp'],  # dropouts depend on a single parameter (useful for hyper-parameter optimization)
}


class VGG(nn.Module, NetVariables, OrthoInit):
    def __init__(self, params):
        self.params = params.copy()

        nn.Module.__init__(self)
        NetVariables.__init__(self, self.params)
        OrthoInit.__init__(self)

        self.features = self._make_layers(cfg['VGG16'])
        self.classifier = nn.Linear(512, self.num_classes)

        self.weights_init()  # call the orthogonal initial condition

    def forward(self, x):
        outs = {}
        L2 = self.features(x)
        outs['l2'] = L2
        Out = L2.view(L2.size(0), -1)
        Out = self.classifier(Out)
        outs['out'] = Out
        return outs

    def _make_layers(self, cfg):
        layers = []
        in_channels = 3
        for x in cfg:
            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            elif x == 'A':
                layers += [nn.AvgPool2d(kernel_size=2, stride=2)]
            elif x == 'D3':
                layers += [nn.Dropout(0.3)]
            elif x == 'D4':
                layers += [nn.Dropout(0.4)]
            elif x == 'D5':
                layers += [nn.Dropout(0.5)]
            elif x == 'Dp':
                layers += [nn.Dropout(self.params['dropout_p'])]
            else:
                layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1),
                           nn.Tanh(),
                           nn.GroupNorm(int(x / self.params['group_factor']), x)]
                in_channels = x
        layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
        return nn.Sequential(*layers)

I’d like to substitute the dropout layers with the block you proposed, passing the mask as input each time I call forward; how should I do it?

If you want to pass the mask as an additional argument to the forward method of your custom dropout layer, you should write this logic explicitly into the forward method of VGG and remove the usage of the nn.Sequential block, as it expects a single input/output in its default implementation.
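
One possibility would be a small custom layer whose forward accepts the mask. A sketch (MeanFillDropout is a made-up name, not an existing PyTorch module):

import torch
import torch.nn as nn

class MeanFillDropout(nn.Module):
    # Sketch of a dropout-like layer that fills dropped channels with the input mean.
    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, x, mask=None):
        # x: [batch, channels, H, W]
        if not self.training:
            return x
        if mask is None:
            # sample a per-channel drop mask if none was passed in
            mask = torch.distributions.Bernoulli(
                probs=self.p).sample(x.shape[:2]).bool().to(x.device)
        x = x.clone()
        x[mask] = x.detach().mean()
        return x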

Is there some other way to keep the code more flexible and clean? In the above solution with nn.Sequential I can easily modify the architecture ‘chain’ of modules directly from the cfg dict.
Can I do something similar without nn.Sequential?

You could try to use an nn.ModuleList or nn.ModuleDict, add the layers using your cfg, and iterate it in the forward method.
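
A sketch of that idea, reusing the hypothetical MeanFillDropout layer from above and the same cfg-driven construction:

import torch.nn as nn

def _make_layers(self, cfg):
    layers = []
    in_channels = 3
    for x in cfg:
        if x == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        elif x == 'A':
            layers.append(nn.AvgPool2d(kernel_size=2, stride=2))
        elif x == 'Dp':
            layers.append(MeanFillDropout(self.params['dropout_p']))
        else:
            layers.append(nn.Conv2d(in_channels, x, kernel_size=3, padding=1))
            layers.append(nn.Tanh())
            layers.append(nn.GroupNorm(int(x / self.params['group_factor']), x))
            in_channels = x
    layers.append(nn.AvgPool2d(kernel_size=1, stride=1))
    # nn.ModuleList registers the submodules but leaves the call order to forward
    return nn.ModuleList(layers)

def forward(self, x, mask=None):
    outs = {}
    for layer in self.features:
        if isinstance(layer, MeanFillDropout):
            x = layer(x, mask)  # only these layers receive the extra argument
        else:
            x = layer(x)
    outs['l2'] = x
    out = x.view(x.size(0), -1)
    outs['out'] = self.classifier(out)
    return outs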
