x = torch.randn(64, 10, 2, 30)
# (batch, some length, some observation, some feature dimension)
And I want to apply dropout along the observation dimension, i.e. either all 30 features of observation 0 are dropped, or all 30 features of observation 1.
I tried looking for it but I couldn’t find anything that could work or maybe I misunderstood some implementation.
x = torch.randn(64, 10, 2, 30)
N, C, _, L = x.size()
# randomly zero out one of the two observations for the whole batch
if torch.randint(0, 2, (1,)) == 0:
    mask = torch.tensor([0., 1.]).view(1, 1, -1, 1).expand(N, C, -1, L)
else:
    mask = torch.tensor([1., 0.]).view(1, 1, -1, 1).expand(N, C, -1, L)
and multiply it with the activation.
Note that you would have to scale the activations during evaluation in case you want to disable this masking (or with the inverse during training).
In that case, you could wrap this operation in a custom nn.Module and use the internal self.training flag to switch between the training and evaluation behavior.
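A minimal sketch of such a module, assuming you want each observation slice dropped independently with probability `p` (the name `ObservationDropout` is made up for illustration; inverted scaling with `1/(1-p)` during training is used so evaluation needs no rescaling):

```python
import torch
import torch.nn as nn

class ObservationDropout(nn.Module):
    """Hypothetical module: zeroes each slice along dim 2 independently
    with probability p; scales kept slices by 1/(1-p) during training."""
    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, x):
        # self.training is toggled by .train() / .eval()
        if not self.training or self.p == 0.0:
            return x
        N, C, O, _ = x.size()
        # one Bernoulli draw per (batch, length, observation) entry,
        # broadcast across the feature dimension
        keep = (torch.rand(N, C, O, 1, device=x.device) > self.p).to(x.dtype)
        return x * keep / (1.0 - self.p)

x = torch.randn(64, 10, 2, 30)
drop = ObservationDropout(p=0.5)
drop.train()
out = drop(x)
```

Since each of the two observations gets its own draw, all four cases { (0, 0), (0, 1), (1, 0), (1, 1) } can occur per sample.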
and then multiply the data with it. One benefit I see over the other approach is that I will not have to scale the activations myself; F.dropout will do it for me.
Another is that this will act like proper dropout, where I might get both of the observations kept, or neither of them.
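A sketch of this F.dropout-based masking, assuming the idea is to apply dropout to a ones tensor of shape (N, C, 2, 1) and let broadcasting spread each decision over all 30 features (F.dropout already rescales kept entries by 1/(1-p)):

```python
import torch
import torch.nn.functional as F

x = torch.randn(64, 10, 2, 30)
N, C, O, L = x.size()

# One independent dropout decision per observation; broadcasting applies
# it across the feature dimension. training=True forces dropout here.
mask = F.dropout(torch.ones(N, C, O, 1, device=x.device), p=0.5, training=True)
out = x * mask
```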
Oh, I might have misunderstood your use case and thought you would like to keep only one of the observations, i.e. either zero out x[:, :, 0, :] completely or x[:, :, 1, :].
Your current approach would randomly zero out values in the observations dimension, wouldn’t it?
Sorry! Maybe I was not clear in my explanation. I wanted dropout, but not in the last (feature) dimension (i.e. dropping some of the 30 values); rather, I want to drop or keep all 30 values at once. I didn't know what to call it, so I called it dropout in the observation dimension: dropout along the dimension of size 2, so the possible cases are { (0, 0), (1, 0), (0, 1), (1, 1) }, where 0 means dropping all 30 features of that observation and 1 means keeping them.
I also tried nn.Dropout2d, but I am unsure of its behaviour: the output had only the two possibilities { (0, 0), (1, 1) } and not the { (0, 1), (1, 0) } cases.
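A sketch of why this happens, assuming the input is passed as-is: nn.Dropout2d treats the input as (N, C, H, W) and zeroes entire channels, so with shape (64, 10, 2, 30) each dropped "channel" is a whole (2, 30) slice and both observations always drop (or survive) together:

```python
import torch
import torch.nn as nn

x = torch.ones(64, 10, 2, 30)
drop2d = nn.Dropout2d(p=0.5)
drop2d.train()
out = drop2d(x)
# Each dim-1 slice of shape (2, 30) is zeroed as a whole, so the two
# observations inside one channel can never be dropped independently.
```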