Spatial Dropout in PyTorch

Skinish · July 19, 2018, 7:06pm

I’ve always found it really useful to apply SpatialDropout1D (https://keras.io/layers/core/#spatialdropout1d) in Keras after an Embedding layer.

Is it possible to achieve the same effect in PyTorch?

SimonW · July 20, 2018, 2:42pm

We currently have Dropout2d and Dropout3d, which do a similar thing but for 2D and 3D inputs. We plan to support more general feature dropout soon. In fact, it was mostly done in this PR, which was eventually blocked by something that needs to be fixed: https://github.com/pytorch/pytorch/pull/9008
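For reference, here is a minimal sketch of what nn.Dropout2d does on a regular 4D input. The tensor sizes and p=0.5 are arbitrary values chosen for illustration; the point is only that each [height, width] feature map is either zeroed entirely or kept and rescaled.

```python
import torch
import torch.nn as nn

drop = nn.Dropout2d(p=0.5)
drop.train()  # dropout is only active in training mode

# Illustrative input: [batch, channels, height, width], all ones
x = torch.ones(4, 6, 3, 3)
y = drop(x)

# Flatten each feature map so it can be inspected as one row
per_map = y.flatten(2)  # [batch, channels, height * width]

# Every feature map is constant: fully dropped (0) or kept and scaled by 1 / (1 - p)
print(per_map[0, :, 0])
```

Which channels are dropped varies from run to run, so the printed values are not fixed.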

ivank · February 3, 2019, 7:16pm

I recently ran into the same problem and just want to mention one gotcha: the noise masks computed by the Keras and PyTorch dropout functions are different.

In both Keras and PyTorch, applying an embedding to a [batch, time] sequence gives you a [batch, time, channels] tensor.

Keras’ SpatialDropout1D applies a [*, 1, *] noise mask, i.e. it drops out a subset of channels for all timesteps simultaneously, whereas PyTorch’s Dropout*D treats dimension 1 as the channel dimension and so uses a [*, *, 1] mask, i.e. it drops out all channels for a subset of timesteps.

So if you just naively replace SpatialDropout1D() with nn.Dropout2d(), it’ll run, but with changed semantics: instead of dropping out whole embedding channels you’ll be dropping out whole words (in the NLP case), which works less well in my experience.
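To make the two mask shapes concrete, here is a small sketch that builds both masks by hand with torch.bernoulli instead of calling either library's dropout layer. The sizes, p=0.5, and the variable names are assumptions for illustration only.

```python
import torch

torch.manual_seed(0)
batch, time, channels = 2, 5, 4
p = 0.5  # drop probability (illustrative)
x = torch.ones(batch, time, channels)

# Keras-style SpatialDropout1D: one mask value per (sample, channel),
# broadcast over every timestep -> whole embedding channels are dropped
channel_mask = torch.bernoulli(torch.full((batch, 1, channels), 1 - p))
channel_dropped = x * channel_mask / (1 - p)

# Naive replacement on [batch, time, channels]: one mask value per
# (sample, timestep), broadcast over every channel -> whole words are dropped
timestep_mask = torch.bernoulli(torch.full((batch, time, 1), 1 - p))
timestep_dropped = x * timestep_mask / (1 - p)

print(channel_dropped[0])   # zeros appear as whole columns
print(timestep_dropped[0])  # zeros appear as whole rows
```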

Something like this code will get you an equivalent of Keras’ SpatialDropout1D in PyTorch:

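import torch.nn.functional as F

# inside a module's forward(); x is [batch, time, channels], p is the drop probability
x = x.permute(0, 2, 1)   # convert to [batch, channels, time]
x = F.dropout2d(x, p, training=self.training)   # drops whole channels
x = x.permute(0, 2, 1)   # back to [batch, time, channels]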
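The snippet above can be wrapped in a reusable module, roughly as sketched here. The class name SpatialDropout1D is a hypothetical helper, not part of PyTorch. As an extra assumption, the sketch adds a trailing dummy dimension so that F.dropout2d always receives a 4D [batch, channels, time, 1] input, since recent PyTorch releases may warn when dropout2d is given a 3D tensor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialDropout1D(nn.Module):
    """Hypothetical helper: drops whole channels of a [batch, time, channels] tensor."""

    def __init__(self, p: float = 0.5):
        super().__init__()
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # [batch, time, channels] -> [batch, channels, time, 1]
        x = x.permute(0, 2, 1).unsqueeze(-1)
        # dropout2d zeroes entire channels of a 4D input
        x = F.dropout2d(x, self.p, training=self.training)
        # back to [batch, time, channels]
        return x.squeeze(-1).permute(0, 2, 1)


layer = SpatialDropout1D(p=0.5)
x = torch.ones(3, 7, 8)  # e.g. the output of an embedding layer

layer.train()
out_train = layer(x)  # some channels zeroed for all timesteps

layer.eval()
out_eval = layer(x)   # dropout disabled, input passes through unchanged
```

In training mode every channel of a given sample is either zero at all timesteps or kept and scaled by 1 / (1 - p), which matches the [*, 1, *] mask described above.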

YU_Jason · March 24, 2019, 10:24am

Thanks man! It works!