Optimize batch processing of variable-length inputs

I noticed that it is not possible to feed a PackedSequence to things like activation functions or linear layers, which forces me to design models with a forward method like this:

    def forward(self, input, lengths, hidden=None):
        # Pack the padded batch so the LSTM skips the padded timesteps
        input = nn.utils.rnn.pack_padded_sequence(input, lengths, batch_first=True, enforce_sorted=False)
        out, hidden = self.lstm(input, hidden)
        # Unpack right away: the layers below only accept plain tensors
        out = nn.utils.rnn.pad_packed_sequence(out, batch_first=True, padding_value=-100)[0]
        out = self.drop_layer(self.sigmoid(out))
        out = self.softmax(self.linear_layer(out))
        return out, hidden
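
For reference, pack_padded_sequence returns a PackedSequence, which is a container around a flat data tensor plus per-timestep bookkeeping rather than a Tensor itself. A small sketch (the shapes here are made up for illustration):

    import torch
    import torch.nn as nn

    x = torch.randn(3, 5, 2)                  # 3 padded sequences, max length 5, 2 features
    lengths = torch.tensor([5, 3, 2])
    packed = nn.utils.rnn.pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)

    print(type(packed).__name__)   # PackedSequence
    print(packed.data.shape)       # torch.Size([10, 2]): only the 5 + 3 + 2 real timesteps
    print(packed.batch_sizes)      # tensor([3, 3, 2, 1, 1]): active sequences per timestep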

Here I have to unpack the PackedSequence immediately after passing it through the LSTM, and later filter out the padding before computing the loss. If I don't unpack it first, I get this error message:

    TypeError: sigmoid(): argument 'input' (position 1) must be Tensor, not PackedSequence
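
The failure is easy to reproduce in isolation: torch.sigmoid dispatches on Tensor arguments, and while the flat .data tensor inside the container goes through fine, the container itself does not. A minimal sketch:

    import torch
    import torch.nn as nn

    packed = nn.utils.rnn.pack_padded_sequence(
        torch.randn(2, 4, 3), torch.tensor([4, 2]),
        batch_first=True, enforce_sorted=False)

    torch.sigmoid(packed.data)     # fine: .data is a plain Tensor
    torch.sigmoid(packed)          # TypeError: ... must be Tensor, not PackedSequence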

This seems highly inefficient, since many of the computations performed by the activation functions and linear layers fall on padding positions and are simply thrown away afterwards.
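
For what it's worth, the workaround I've been experimenting with is to apply pointwise layers to the flat .data tensor and rewrap, so nothing is computed on padding. This is only a sketch; it leans on the fact that PackedSequence behaves like a namedtuple under the hood, so ._replace is available (I'm not sure this is an intended use):

    def apply_to_packed(layer, packed):
        # Run an elementwise layer on the non-padded timesteps only,
        # keeping batch_sizes and the sort indices untouched
        return packed._replace(data=layer(packed.data))

    # e.g. in forward, instead of unpacking first:
    # out = apply_to_packed(self.sigmoid, out)
    # out = apply_to_packed(self.drop_layer, out)
    # out = apply_to_packed(self.linear_layer, out)   # nn.Linear accepts any (*, in_features) shape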

Why is that? What prevents us from passing a PackedSequence to activation functions or linear layers?