I am working on Semantic Role Labeling. I have tensors which are the result of concatenating the BERT embedding of each word with a bit that indicates where the predicate is in the sentence, so the tensors have shape (batch_size, sequence_len, bert_embedding_dim + 1). I pass these to a dropout layer, but zeroing out the indicator bit is not desirable. Is there a way to tell dropout not to apply to that index?
Thanks in advance.
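For context, the input described above could be built roughly like this (a sketch with made-up sizes; the predicate position and dimensions are illustrative assumptions):

```python
import torch

# Hypothetical shapes for illustration: batch of 2 sentences, 5 tokens,
# 768-dim BERT embeddings plus one predicate-indicator bit.
batch_size, seq_len, bert_dim = 2, 5, 768
bert_emb = torch.randn(batch_size, seq_len, bert_dim)
pred_indicator = torch.zeros(batch_size, seq_len, 1)
pred_indicator[:, 2, :] = 1.0  # assume the predicate is the third token
x = torch.cat((bert_emb, pred_indicator), dim=-1)
print(x.shape)  # torch.Size([2, 5, 769])
```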
There is no such option for the
nn.Dropout layer or the functional
F.dropout API.
However, you could simply create a mask using
torch.bernoulli and write a custom dropout layer.
Here is a simple example:
import torch

x = torch.randn(10)
p = 0.5  # drop probability, as in nn.Dropout
# per-element keep probability: 1.0 for the first 2 values, 1 - p for the rest
probs = torch.cat((torch.ones(2), torch.full((8,), 1 - p)))  # don't zero out first 2 values
mask = torch.bernoulli(probs)  # 1 = keep, 0 = drop
out = x * mask
Don’t forget to add the scaling during training, as is done for vanilla dropout. Otherwise the expected values will differ when you disable your dropout at evaluation time, and the model might perform badly.
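Putting the mask and the scaling together, a custom module could look like the sketch below. The names PartialDropout and n_keep are made up for this example; it assumes the protected features sit at the end of the last dimension:

```python
import torch
import torch.nn as nn


class PartialDropout(nn.Module):
    """Inverted dropout that never zeroes the last `n_keep` features.

    All other features are dropped with probability `p` and the survivors
    are scaled by 1/(1-p) during training, like nn.Dropout; the protected
    features pass through unchanged. (Sketch; names are made up.)
    """

    def __init__(self, p=0.5, n_keep=1):
        super().__init__()
        self.p = p
        self.n_keep = n_keep

    def forward(self, x):
        if not self.training or self.p == 0.0:
            return x
        # keep probability: 1.0 for the protected features, 1 - p elsewhere
        keep_prob = x.new_full(x.shape[-1:], 1.0 - self.p)
        keep_prob[-self.n_keep:] = 1.0
        mask = torch.bernoulli(keep_prob.expand_as(x))
        # scale only the droppable features; protected ones stay unscaled
        scale = x.new_full(x.shape[-1:], 1.0 / (1.0 - self.p))
        scale[-self.n_keep:] = 1.0
        return x * mask * scale


drop = PartialDropout(p=0.5, n_keep=1)
x = torch.randn(2, 5, 769)
out = drop(x)  # the last feature of every token is preserved exactly
```

In eval mode (`drop.eval()`) the module becomes the identity, matching how nn.Dropout behaves.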
regarding the scaling, is that something that is done in the nn.Dropout layer?
Yes, this is done during training with a scale factor of
1/(1-p) as seen here:
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.8)
x = torch.ones(10)
out = drop(x)
> tensor([0., 0., 0., 0., 5., 5., 0., 0., 0., 0.])
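As a quick sanity check, the scaling keeps the expected value of the output roughly equal to that of the input when averaged over a large tensor:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.8)
x = torch.ones(1_000_000)
out = drop(x)
# survivors are scaled by 1/(1-0.8) = 5, so the mean stays close to 1
print(out.mean())
```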
The dropout paper explains the scaling in section 10.