How to deal with dropout in between LSTM layers when using PackedSequence?

Hi! I’m creating an LSTM Autoencoder for feature extraction for my master’s thesis. However, I’m having a lot of trouble combining dropout with LSTM layers.

Since it’s an autoencoder, the bottleneck is created by two separate LSTM layers, each with num_layers=1, and dropout in between. My time series have very different lengths, so packed sequences seemed like a good fit. However, from my experiments it seems I have to pack the data before the first LSTM, unpack it before the dropout, and then pack it again before the second LSTM. This seems wildly inefficient. Is there a better way? I’m providing some example code and an alternative way to implement it below.

Current, working, but possibly suboptimal solution:

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


class Encoder(nn.Module):

    def __init__(self, seq_len, n_features, embedding_dim, hidden_dim, dropout):
        super(Encoder, self).__init__()

        self.seq_len = seq_len
        self.n_features = n_features
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim

        self.lstm1 = nn.LSTM(
            input_size=n_features,
            hidden_size=self.hidden_dim,
            num_layers=1,
            batch_first=True,
        )

        self.lstm2 = nn.LSTM(
            input_size=self.hidden_dim,
            hidden_size=embedding_dim,
            num_layers=1,
            batch_first=True,
        )

        self.drop1 = nn.Dropout(p=dropout, inplace=False)

    def forward(self, x):
        # x comes in as a PackedSequence
        x, (_, _) = self.lstm1(x)
        # unpack so nn.Dropout can be applied to the padded tensor
        x, lens = pad_packed_sequence(x, batch_first=True, total_length=self.seq_len)
        x = self.drop1(x)
        # re-pack before feeding the second LSTM
        x = pack_padded_sequence(x, lens, batch_first=True, enforce_sorted=False)
        x, (hidden_n, _) = self.lstm2(x)

        return hidden_n.reshape((-1, self.n_features, self.embedding_dim)), lens
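
For context, this is roughly how I drive the encoder (the univariate series, lengths, and dimensions below are just made-up placeholders):

seq_len, n_features, hidden_dim, embedding_dim = 10, 1, 64, 16
encoder = Encoder(seq_len, n_features, embedding_dim, hidden_dim, dropout=0.2)

# a padded batch of three univariate series with different true lengths
batch = torch.zeros(3, seq_len, n_features)
lengths = torch.tensor([10, 7, 5])

packed = pack_padded_sequence(batch, lengths, batch_first=True, enforce_sorted=False)
embedding, lens = encoder(packed)  # embedding: (3, n_features, embedding_dim)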

Alternative, possibly better, but currently not working solution:

class Encoder2(nn.Module):

    def __init__(self, seq_len, n_features, embedding_dim, hidden_dim, dropout):
        super(Encoder2, self).__init__()

        self.seq_len = seq_len
        self.n_features = n_features
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim

        self.lstm1 = nn.LSTM(
            input_size=n_features,
            hidden_size=self.hidden_dim,
            num_layers=2,          # two stacked layers with built-in dropout in between
            batch_first=True,
            dropout=dropout,       # applied to the outputs of all layers except the last
            proj_size=self.embedding_dim,  # projects each layer's hidden state down to embedding_dim
        )

    def forward(self, x):
        _, (h_n, _) = self.lstm1(x)
        # NOTE: lens is not defined in this scope, which is one reason this
        # version does not run yet; the lengths would have to be recovered
        # from the packed input (e.g. via pad_packed_sequence).
        return h_n[-1].unsqueeze(1), lens

Any help and tips on working with time series, packed sequences, LSTM cells, and dropout would be immensely appreciated, as I’m not finding much documentation or guidance elsewhere on the internet. Thank you!

Best, Lars Ankile

PS. Is “uncategorized” the right category?

The first question I would ask here is whether the first implementation does what you want.
The reason I ask is that while this resembles what people used to do, I have the impression that this method fell out of favour and people moved to variational dropout (Gal and Ghahramani), where the dropout mask is sampled once per batch x features and multiplied onto all timesteps; this also has an interpretation as dropping values from the weights.

If you just want the style of the first example, you could apply dropout directly to packed_seq.data and construct a new PackedSequence from that, with all other fields taken from the original packed sequence (yes, the documentation says you should not construct it yourself, but well…).
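
Something along these lines (an untested sketch, the helper name is just made up):

import torch.nn.functional as F
from torch.nn.utils.rnn import PackedSequence

def dropout_packed(packed_seq, p, training=True):
    # apply ordinary element-wise dropout to the flat data tensor and
    # rebuild the PackedSequence around the result
    dropped = F.dropout(packed_seq.data, p=p, training=training)
    return PackedSequence(
        dropped,
        packed_seq.batch_sizes,
        packed_seq.sorted_indices,
        packed_seq.unsorted_indices,
    )

In your Encoder, that would replace the pad_packed_sequence / drop1 / pack_padded_sequence steps between the two LSTMs.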

If that is not what you want, sampling a batch x features mask and then having a for loop like

# dropout_mask: (batch x features), sampled once per forward pass
values_old = packed_seq.data  # flat (sum of lengths) x features tensor
values_new = torch.empty_like(values_old)
pos = 0
for bs in packed_seq.batch_sizes:
    # at each timestep, the first bs rows belong to the still-active sequences
    values_new[pos: pos + bs] = values_old[pos: pos + bs] * dropout_mask[:bs]
    pos += bs

or so would give you that. Because this comes up regularly: the Thomas rule of thumb says that if your hidden_dim has >= three digits, I would not worry about the for loop until you find it to be a bottleneck.
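
If it helps, a rough sketch of the variational variant as a module (untested, and the class name is mine; check the mask handling against your setup):

import torch
import torch.nn as nn
from torch.nn.utils.rnn import PackedSequence


class VariationalPackedDropout(nn.Module):
    """Samples one dropout mask per sequence and reuses it at every timestep."""

    def __init__(self, p):
        super().__init__()
        self.p = p

    def forward(self, packed_seq):
        if not self.training or self.p == 0.0:
            return packed_seq
        data = packed_seq.data                      # (sum of lengths, features)
        max_batch = int(packed_seq.batch_sizes[0])  # number of sequences
        # sample once, scaled so the expected activation stays the same
        keep = data.new_full((max_batch, data.size(-1)), 1.0 - self.p)
        mask = torch.bernoulli(keep) / (1.0 - self.p)
        new_data = torch.empty_like(data)
        pos = 0
        for bs in packed_seq.batch_sizes.tolist():
            # the first bs rows at each timestep belong to the still-active
            # sequences (in the packed, sorted order), so the mask rows line up
            new_data[pos: pos + bs] = data[pos: pos + bs] * mask[:bs]
            pos += bs
        return PackedSequence(
            new_data,
            packed_seq.batch_sizes,
            packed_seq.sorted_indices,
            packed_seq.unsorted_indices,
        )

Like the helper above, it consumes and returns a PackedSequence, so it can sit directly between the two LSTMs without any unpacking.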

Best regards

Thomas

Thanks, Thomas! Those are very good considerations to incorporate!