When training seq2seq or other RNN-based models, we usually have layers such as an embedding before the RNN cell and a linear (Dense) layer at the output to produce a final output of the desired size at each time step. These layers are time distributed, meaning their weights are shared across all time steps.

To handle batches of variable-length sequences, we usually pad the sequences and keep track of each sequence's length, so that we can mask the outputs of these layers accordingly.
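As a minimal sketch of the padding step (with made-up token sequences), `torch.nn.utils.rnn.pad_sequence` pads a list of variable-length tensors to a common length while we record the original lengths separately:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three sequences of token indices with different lengths.
seqs = [torch.tensor([4, 1, 7, 2]), torch.tensor([5, 3]), torch.tensor([9, 8, 6])]
lengths = torch.tensor([len(s) for s in seqs])

# Pad to the longest sequence; default batch_first=False gives (max_len, batch).
padded = pad_sequence(seqs)
print(padded.shape)   # torch.Size([4, 3])
print(lengths)        # tensor([4, 2, 3])
```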

However, I have seen that PyTorch has utilities to create `PackedSequence` objects, which can be fed directly into recurrent modules like LSTMs, and the output is returned as a `PackedSequence` as well.
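For reference, this is the standard round trip with `pack_padded_sequence` / `pad_packed_sequence` (the sizes here are hypothetical; lengths must be sorted in descending order for the default `enforce_sorted=True`):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

max_len, batch, feat, hidden = 5, 3, 8, 16
x = torch.randn(max_len, batch, feat)   # padded batch, shape (max_len, batch, feat)
lengths = [5, 3, 2]                     # true lengths, sorted descending

packed = pack_padded_sequence(x, lengths)   # a PackedSequence
lstm = nn.LSTM(input_size=feat, hidden_size=hidden)
packed_out, (h, c) = lstm(packed)           # output is also a PackedSequence

# Unpack back to a padded tensor plus the lengths.
out, out_lengths = pad_packed_sequence(packed_out)
print(out.shape)  # torch.Size([5, 3, 16])
```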

To my knowledge, we can’t provide a `PackedSequence` as input to a non-recurrent layer like an `nn.Embedding`. But since these layers are time distributed, wouldn’t applying them simply amount to applying the layer to the `PackedSequence.data` tensor, while retaining the same set of `batch_sizes`?

Currently we can implement this by doing something like:

```
import torch.nn as nn

# num_embeddings, embedding_size, hidden_size, output_size assumed defined
embedding = nn.Embedding(num_embeddings, embedding_size)
rnn = nn.RNN(input_size=embedding_size, hidden_size=hidden_size)
out_linear = nn.Linear(hidden_size, output_size)

def forward(input, hidden):
    # input is a PackedSequence of token indices
    batch_sizes = input.batch_sizes
    # apply the embedding to the flat data tensor
    embedded = embedding(input.data)
    # wrap the result in a new PackedSequence
    rnn_in = nn.utils.rnn.PackedSequence(embedded, batch_sizes)
    # feed to the RNN; its output is also a PackedSequence
    rnn_out, hidden = rnn(rnn_in, hidden)
    # apply the output linear layer to the flat data tensor
    outputs = out_linear(rnn_out.data)
    # again, wrap in a new PackedSequence to return
    final_output = nn.utils.rnn.PackedSequence(outputs, batch_sizes)
    return final_output, hidden
```
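Filling in hypothetical sizes, the idea above can be exercised end to end; note that the flat `data` tensor has one row per (time step, active sequence) pair, i.e. `sum(batch_sizes)` rows:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, PackedSequence

num_embeddings, embedding_size, hidden_size, output_size = 100, 8, 16, 5

embedding = nn.Embedding(num_embeddings, embedding_size)
rnn = nn.RNN(input_size=embedding_size, hidden_size=hidden_size)
out_linear = nn.Linear(hidden_size, output_size)

# padded batch of token indices, shape (max_len, batch), lengths sorted descending
padded = torch.randint(0, num_embeddings, (4, 3))
packed = pack_padded_sequence(padded, [4, 3, 2])

batch_sizes = packed.batch_sizes            # tensor([3, 3, 2, 1])
embedded = embedding(packed.data)           # (sum(batch_sizes), embedding_size)
rnn_in = PackedSequence(embedded, batch_sizes)
rnn_out, hidden = rnn(rnn_in)
outputs = out_linear(rnn_out.data)
print(outputs.shape)  # torch.Size([9, 5]), since sum(batch_sizes) == 9
```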

However, this requires manually constructing a `PackedSequence` instance, which is not recommended. Is there another way to go about this?