When training seq2seq or other RNN-based models, we usually have layers such as an embedding before the RNN cell and a linear (Dense) layer at the output to produce a final output of the desired size at each time step. These layers are time distributed, meaning their weights are shared across all time steps.

To handle batches of variable-length sequences, we usually pad the sequences and keep track of each sequence's length, so that we can mask the outputs of these layers accordingly.
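As a minimal sketch of the padding step (with made-up token sequences), `torch.nn.utils.rnn.pad_sequence` pads a list of variable-length tensors to a common length while we record the original lengths separately:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three sequences of token indices with different lengths.
seqs = [torch.tensor([4, 1, 7, 2]), torch.tensor([5, 3]), torch.tensor([9, 8, 6])]
lengths = torch.tensor([len(s) for s in seqs])

# Pad to the longest sequence; default batch_first=False gives (max_len, batch).
padded = pad_sequence(seqs)
print(padded.shape)   # torch.Size([4, 3])
print(lengths)        # tensor([4, 2, 3])
```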

However, I have seen that PyTorch has utilities to create `PackedSequence` objects, which can be fed directly into recurrent modules like LSTMs, and the output is returned as a `PackedSequence` as well.
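For reference, this is the standard round trip with `pack_padded_sequence` / `pad_packed_sequence` (the sizes here are hypothetical; lengths must be sorted in descending order for the default `enforce_sorted=True`):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

max_len, batch, feat, hidden = 5, 3, 8, 16
x = torch.randn(max_len, batch, feat)   # padded batch, shape (max_len, batch, feat)
lengths = [5, 3, 2]                     # true lengths, sorted descending

packed = pack_padded_sequence(x, lengths)   # a PackedSequence
lstm = nn.LSTM(input_size=feat, hidden_size=hidden)
packed_out, (h, c) = lstm(packed)           # output is also a PackedSequence

# Unpack back to a padded tensor plus the lengths.
out, out_lengths = pad_packed_sequence(packed_out)
print(out.shape)  # torch.Size([5, 3, 16])
```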

To my knowledge, we can’t provide a `PackedSequence` as input to a non-recurrent layer like an `nn.Embedding`. But since these layers are time distributed, wouldn’t applying them simply amount to applying the layer to the `PackedSequence.data` tensor, while retaining the same set of `batch_sizes`?

Currently we can implement this by doing something like:

```
import torch.nn as nn

# num_embeddings, embedding_size, hidden_size, output_size assumed defined
embedding = nn.Embedding(num_embeddings, embedding_size)
rnn = nn.RNN(input_size=embedding_size, hidden_size=hidden_size)
out_linear = nn.Linear(hidden_size, output_size)

def forward(input, hidden):
    # input is a PackedSequence of token indices
    batch_sizes = input.batch_sizes
    # apply the embedding to the flat data tensor
    embedded = embedding(input.data)
    # wrap the result in a new PackedSequence
    rnn_in = nn.utils.rnn.PackedSequence(embedded, batch_sizes)
    # feed to the RNN; its output is also a PackedSequence
    rnn_out, hidden = rnn(rnn_in, hidden)
    # apply the output linear layer to the flat data tensor
    outputs = out_linear(rnn_out.data)
    # again, wrap in a new PackedSequence to return
    final_output = nn.utils.rnn.PackedSequence(outputs, batch_sizes)
    return final_output, hidden
```
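Filling in hypothetical sizes, the idea above can be exercised end to end; note that the flat `data` tensor has one row per (time step, active sequence) pair, i.e. `sum(batch_sizes)` rows:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, PackedSequence

num_embeddings, embedding_size, hidden_size, output_size = 100, 8, 16, 5

embedding = nn.Embedding(num_embeddings, embedding_size)
rnn = nn.RNN(input_size=embedding_size, hidden_size=hidden_size)
out_linear = nn.Linear(hidden_size, output_size)

# padded batch of token indices, shape (max_len, batch), lengths sorted descending
padded = torch.randint(0, num_embeddings, (4, 3))
packed = pack_padded_sequence(padded, [4, 3, 2])

batch_sizes = packed.batch_sizes            # tensor([3, 3, 2, 1])
embedded = embedding(packed.data)           # (sum(batch_sizes), embedding_size)
rnn_in = PackedSequence(embedded, batch_sizes)
rnn_out, hidden = rnn(rnn_in)
outputs = out_linear(rnn_out.data)
print(outputs.shape)  # torch.Size([9, 5]), since sum(batch_sizes) == 9
```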

However, this requires manually constructing a `PackedSequence` instance, which is not recommended. Is there another way to go about this?