I noticed that it is not possible to feed PackedSequence objects to things like activation functions or linear layers, which forces me to design models with a forward method like:
def forward(self, input, lengths, hidden=None):
    # pack the padded batch so the LSTM skips padded time steps
    packed = nn.utils.rnn.pack_padded_sequence(
        input, lengths, batch_first=True, enforce_sorted=False)
    out, hidden = self.lstm(packed, hidden)
    # unpack immediately, because the layers below cannot handle a PackedSequence
    out = nn.utils.rnn.pad_packed_sequence(
        out, batch_first=True, padding_value=-100)[0]
    out = self.drop_layer(self.sigmoid(out))
    out = self.softmax(self.linear_layer(out))
    return out, hidden
Here I have to unpack the PackedSequence directly after passing it through the LSTM, and later mask out the padded positions before computing the loss. If I don't unpack it, I get an error message:
TypeError: sigmoid(): argument 'input' (position 1) must be Tensor, not PackedSequence
This seems highly inefficient, since much of the work done by the activation functions and linear layers on the padded positions is thrown away afterwards.
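One workaround I've seen (a sketch, not an official API pattern): since PackedSequence is a namedtuple whose .data field holds only the real, non-padded time steps as a flat tensor, pointwise layers can be applied to .data directly and the result rewrapped with the namedtuple's _replace method, so no computation is spent on padding:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two variable-length sequences with feature dimension 4
seqs = [torch.randn(3, 4), torch.randn(2, 4)]
padded = nn.utils.rnn.pad_sequence(seqs, batch_first=True)
packed = nn.utils.rnn.pack_padded_sequence(
    padded, lengths=[3, 2], batch_first=True, enforce_sorted=False)

linear = nn.Linear(4, 5)

# packed.data has shape (3 + 2, 4): only the real time steps, no padding.
# Apply the linear layer and activation to that flat tensor directly.
out = torch.sigmoid(linear(packed.data))

# Rewrap as a PackedSequence (PackedSequence is a namedtuple, so _replace
# keeps batch_sizes / sorted_indices intact while swapping in the new data).
packed_out = packed._replace(data=out)

# Unpack only at the very end, e.g. for the loss.
unpacked, lens = nn.utils.rnn.pad_packed_sequence(packed_out, batch_first=True)
print(unpacked.shape)  # torch.Size([2, 3, 5])
```

This only works for operations that act independently on each time step (linear layers, activations, dropout), not for anything that needs to know where sequences begin or end.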
Why is that? What is the reason preventing us from passing a PackedSequence to an activation function or a linear layer?