Pack_padded_sequence and pad_packed_sequence


I want to use the equivalent of the Keras ‘masking layer’ in PyTorch. I came across the ‘pack_padded_sequence’ and ‘pad_packed_sequence’ examples, and I have three questions.

My input data is of the shape (batch_size, seq_length, feat_dim) = (10, 63, 100). For each batch, I am executing the following code in my model’s ‘forward’ method.

seq_lens = np.count_nonzero(x.cpu().numpy(), axis=1)[:, 0]
sort_idx = np.argsort(-seq_lens)  # lengths must be in descending order, and x must be reordered to match
x = x[torch.from_numpy(sort_idx)]
pack = pack_padded_sequence(x.float(), seq_lens[sort_idx], batch_first=True)
gru_out, h = self.Gru(pack)
unpacked, unpacked_len = pad_packed_sequence(gru_out, batch_first=True)

‘pack.data’ has dimension [256, 100]. So my first question is: how does the GRU infer the time steps from this flattened input?
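For intuition, here is a minimal sketch (with toy shapes, not the ones above) of what a PackedSequence actually contains: all non-padded steps flattened time-major into `.data`, plus a `batch_sizes` vector telling the RNN how many sequences are still active at each time step.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Toy batch: 3 sequences of lengths 3, 2, 1 (feature dim 1), zero-padded to length 3,
# already sorted by descending length.
x = torch.tensor([[[1.], [2.], [3.]],
                  [[4.], [5.], [0.]],
                  [[6.], [0.], [0.]]])
lengths = torch.tensor([3, 2, 1])
pack = pack_padded_sequence(x, lengths, batch_first=True)

# pack.data holds the non-padded steps time-major:
# step 0 of all 3 sequences, then step 1 of the 2 longer ones, then step 2 of the longest.
print(pack.data.squeeze(-1))  # tensor([1., 4., 6., 2., 5., 3.])
# pack.batch_sizes is what gives the RNN its notion of time steps:
print(pack.batch_sizes)       # tensor([3, 2, 1])
```

So the GRU never sees the padding at all; it processes `batch_sizes[t]` sequences at step `t`.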

For every batch, the actual maximum sequence length (before zero padding) varies. Is that permitted, or will it cause any issues?

From ‘unpacked.data’, how do I recreate data of shape [10, 63, new_feat_dim]? This is needed because my labels are also zero-padded, with shape [10, 63].


That’s basically cuDNN’s internal format; you don’t need to know it under normal circumstances.

No major issues, besides some extra memory heap fragmentation.

“unpacked” is a tensor that is already padded. Don’t use the .data attribute there.
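To make that concrete, here is a minimal sketch using the shapes from the question (the hidden size of 32 is a made-up stand-in for new_feat_dim). The `total_length` argument of `pad_packed_sequence` pads the output back to a fixed length even when this batch’s longest sequence is shorter than 63, so the output lines up with the [10, 63] labels:

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Shapes from the question: batch 10, padded length 63, feat dim 100.
# Values are random here; only the shapes matter for this demo.
x = torch.randn(10, 63, 100)
seq_lens = torch.randint(1, 64, (10,)).sort(descending=True).values
gru = nn.GRU(100, 32, batch_first=True)

pack = pack_padded_sequence(x, seq_lens, batch_first=True)
gru_out, h = gru(pack)

# total_length=63 pins the padded length to 63 even if max(seq_lens) < 63,
# so the result always matches the zero-padded labels of shape [10, 63].
unpacked, unpacked_len = pad_packed_sequence(gru_out, batch_first=True, total_length=63)
print(unpacked.shape)  # torch.Size([10, 63, 32])
```

Positions beyond each sequence’s true length are zero-filled in `unpacked`, which is exactly what you want when the loss is masked against zero-padded labels.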