How to apply dropout with a non-zero value?

I want to use feature dropout (like dropout2d), but fill the dropped features with the mean value (or Gaussian noise, for example) instead of zeros.
How can I do that?
The easiest approach would be to run dropout2d and then fill in the zeros, but my data already contains zeros.

You could sample a binary mask using the drop probability p and fill the masked positions of the tensor with your desired value:

import torch

x = torch.ones(10, 20)

p = 0.5
# sample a keep mask: entries are 1 with probability (1 - p)
mask = torch.distributions.Bernoulli(probs=(1 - p)).sample(x.size())
# fill the dropped positions with the mean instead of zero
x[~mask.bool()] = x.mean()
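
If you would rather fill the dropped positions with Gaussian noise, as mentioned in the question, a similar sketch could look like this:

import torch

x = torch.ones(10, 20)
p = 0.5
mask = torch.distributions.Bernoulli(probs=(1 - p)).sample(x.size())
drop_idx = ~mask.bool()
# fill the dropped positions with samples from a standard normal distribution
x[drop_idx] = torch.randn_like(x[drop_idx])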

Note that dropout also scales the activations to make sure the expected activation ranges are equal during training and validation:

# manual dropout with scaling
out = x * mask * 1 / (1 - p)

So you would have to take care of this scaling in your approach.

Can I sample values from a tensor for the mask values?
Also, do I really need to scale if I'm dropping out with non-zero values?

I’m not sure I understand the first question.
In my code I sample the mask manually. Would that work, or what would you like to sample additionally?

If you enable dropout during training and disable it during evaluation without this scaling, the expected activation values will have different ranges, and your model will most likely perform poorly.
The scaling is described in the original Dropout paper.
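
A quick numerical illustration of why the scaling matters:

import torch

x = torch.ones(1000, 1000)
p = 0.5
mask = torch.distributions.Bernoulli(probs=(1 - p)).sample(x.size())

out_unscaled = x * mask            # expected mean ~ (1 - p) * x.mean() = 0.5
out_scaled = x * mask / (1 - p)    # expected mean ~ x.mean() = 1.0

print(out_unscaled.mean())  # ~0.5, differs from the eval-time activation of 1.0
print(out_scaled.mean())    # ~1.0, matches the eval-time activation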

I want to calculate the mean frame, taking the zero padding into account, and replace some random frames with it.
This code raises an error.
What is the right way?

def mean_frame_dropout(x, lens, p=0.2):
    x = x.clone()
    batch_size, length, features = x.size()
    mask = torch.distributions.Bernoulli(
        probs=(1 - p)).sample((batch_size, length)).to(x.device)
    mask = ~mask.bool()
    mask = mask.unsqueeze(-1).expand(-1, -1, features)
    mean = x.detach().sum(1) / lens
    x[mask] = mean
    return x

What kind of error are you getting? Could you post the stack trace as well as the shapes of the input tensors?

The sizes of mean and x are:
torch.Size([1, 128])
torch.Size([1, 167, 128])

RuntimeError: shape mismatch: value tensor of shape [128] cannot be broadcast to indexing result of shape [5120]

The mean tensor has the shape [1, 128], as it's calculated via x.sum(1).
However, x[mask] has a variable number of elements, so how should the assignment work?
E.g., let's assume mask samples 5120 True values; x[mask] would then have the shape [5120].
Which of the 128 values of mean should be assigned to which of these elements?

What is the right way to mask it?

It depends on what you want to achieve.
The original mask, before the unsqueeze and expand operations, can be used to index x directly, which would yield:

import torch

x = torch.ones([1, 167, 128])
batch_size, length, features = x.size()
p = 0.5
# sample a keep mask over the batch and length dimensions only
mask = torch.distributions.Bernoulli(
    probs=(1 - p)).sample((batch_size, length)).to(x.device)
mask = ~mask.bool()
print(mask.shape)
> torch.Size([1, 167])
print(x[mask].shape)
> torch.Size([81, 128])
print(mask.sum())
> tensor(81)

So would you like to replace all these 81 values in dim1 with the sum of them?
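
If the goal is instead to replace each selected frame with the per-sample mean frame, one possible sketch (assuming lens holds the number of valid, non-padded frames per sample) could be:

import torch

def mean_frame_dropout(x, lens, p=0.2):
    # x: [batch, length, features]; lens: [batch] number of valid (non-padded) frames
    batch_size, length, features = x.size()
    # True where a frame should be replaced, sampled independently per frame
    drop = torch.distributions.Bernoulli(probs=p).sample(
        (batch_size, length)).bool().to(x.device)
    # per-sample mean frame, ignoring zero padding: [batch, features]
    mean = x.detach().sum(1) / lens.unsqueeze(1).to(x.dtype)
    # broadcast the mean frame over the dropped positions
    return torch.where(drop.unsqueeze(-1), mean.unsqueeze(1), x)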

Could you advise how to replace a block of channels with mean values?
And how to do this independently for each sample in the batch?
I'm not sure how to select indices over the batch and length axes at the same time.

For x = torch.rand(32, 1024, 128)

def mean_freq_mask(x, p=0.2):
	if torch.rand(1) < p:
		x = x.clone()
		batch_size, length, features = x.size()
		F = features // 3
		mean = x.detach().mean(2)
		mask_len = (torch.rand(batch_size)*F).long()
		mask_start = (torch.rand(batch_size)*(features-F)).long()
		x[mask_start:mask_start+mask_len] = mean.unsqueeze(-1).expand(-1, -1, features)[mask_start:mask_start+mask_len]
	return x

This gives the error: only integer tensors of a single element can be converted to an index.
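
One way around that error could be to build a boolean mask over the feature axis via broadcasting instead of slicing with batched start/length tensors. A sketch of the idea, assuming the intent is to fill a random feature band of each sample with the per-frame mean:

import torch

def mean_freq_mask(x, p=0.2):
    # x: [batch, length, features]
    if torch.rand(1).item() >= p:
        return x
    batch_size, length, features = x.size()
    F = features // 3  # maximum width of the masked feature band
    # per-sample band width and start position along the feature axis
    mask_len = (torch.rand(batch_size, device=x.device) * F).long()
    mask_start = (torch.rand(batch_size, device=x.device) * (features - F)).long()
    # boolean band mask of shape [batch, features] built via broadcasting
    feat_idx = torch.arange(features, device=x.device).unsqueeze(0)
    band = (feat_idx >= mask_start.unsqueeze(1)) & (feat_idx < (mask_start + mask_len).unsqueeze(1))
    # per-frame mean over the feature axis: [batch, length, 1]
    mean = x.detach().mean(2, keepdim=True)
    # fill the masked feature band of every frame with that frame's mean
    return torch.where(band.unsqueeze(1), mean, x)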

What is the correct way to use this block inside a model class?
Let’s say I have the following CNN:

cfg = {
    'VGG16': [64, 'Dp', 64, 'M', 128, 'Dp', 128, 'M', 256, 'Dp', 256, 'Dp', 256, 'M', 512, 'Dp', 512, 'Dp', 512, 'M', 512, 'Dp', 512, 'Dp', 512, 'A', 'Dp'],  # dropouts depend on a single parameter (useful for hyper-parameter optimization)
}


class VGG(nn.Module, NetVariables, OrthoInit):
    def __init__(self, params):
        self.params = params.copy()

        nn.Module.__init__(self)
        NetVariables.__init__(self, self.params)
        OrthoInit.__init__(self)

        self.features = self._make_layers(cfg['VGG16'])
        self.classifier = nn.Linear(512, self.num_classes)

        self.weights_init()  # call the orthogonal initial condition

    def forward(self, x):
        outs = {}
        L2 = self.features(x)
        outs['l2'] = L2
        Out = L2.view(L2.size(0), -1)
        Out = self.classifier(Out)
        outs['out'] = Out
        return outs

    def _make_layers(self, cfg):
        layers = []
        in_channels = 3
        for x in cfg:
            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            elif x == 'A':
                layers += [nn.AvgPool2d(kernel_size=2, stride=2)]
            elif x == 'D3':
                layers += [nn.Dropout(0.3)]
            elif x == 'D4':
                layers += [nn.Dropout(0.4)]
            elif x == 'D5':
                layers += [nn.Dropout(0.5)]
            elif x == 'Dp':
                layers += [nn.Dropout(self.params['dropout_p'])]
            else:
                layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1),
                           nn.Tanh(),
                           nn.GroupNorm(int(x / self.params['group_factor']), x)]
                in_channels = x
        layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
        return nn.Sequential(*layers)

I’d like to substitute the dropout layers with the block you proposed, passing the mask as input each time I call forward; how should I do it?

If you want to pass the mask as an additional argument to the forward method of your custom dropout layer, you should write this logic explicitly into the forward method of VGG and remove the usage of the nn.Sequential block, as it expects a single input/output in its default implementation.
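
One possibility would be a small custom layer whose forward accepts the mask. A sketch (MeanFillDropout is a made-up name, not an existing PyTorch module):

import torch
import torch.nn as nn

class MeanFillDropout(nn.Module):
    # Sketch of a dropout-like layer that fills dropped channels with the input mean.
    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, x, mask=None):
        # x: [batch, channels, H, W]
        if not self.training:
            return x
        if mask is None:
            # sample a per-channel drop mask if none was passed in
            mask = torch.distributions.Bernoulli(
                probs=self.p).sample(x.shape[:2]).bool().to(x.device)
        x = x.clone()
        x[mask] = x.detach().mean()
        return x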

Is there some other way to keep the code more flexible and clean? In the above solution with nn.Sequential I can easily modify the architecture ‘chain’ of modules directly from the cfg dict.
Can I do something similar without nn.Sequential?

You could try to use an nn.ModuleList or nn.ModuleDict, add the layers using your cfg, and iterate it in the forward method.
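
A sketch of that idea, reusing the hypothetical MeanFillDropout layer from above and the same cfg-driven construction:

import torch.nn as nn

def _make_layers(self, cfg):
    layers = []
    in_channels = 3
    for x in cfg:
        if x == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        elif x == 'A':
            layers.append(nn.AvgPool2d(kernel_size=2, stride=2))
        elif x == 'Dp':
            layers.append(MeanFillDropout(self.params['dropout_p']))
        else:
            layers.append(nn.Conv2d(in_channels, x, kernel_size=3, padding=1))
            layers.append(nn.Tanh())
            layers.append(nn.GroupNorm(int(x / self.params['group_factor']), x))
            in_channels = x
    layers.append(nn.AvgPool2d(kernel_size=1, stride=1))
    # nn.ModuleList registers the submodules but leaves the call order to forward
    return nn.ModuleList(layers)

def forward(self, x, mask=None):
    outs = {}
    for layer in self.features:
        if isinstance(layer, MeanFillDropout):
            x = layer(x, mask)  # only these layers receive the extra argument
        else:
            x = layer(x)
    outs['l2'] = x
    out = x.view(x.size(0), -1)
    outs['out'] = self.classifier(out)
    return outs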
