Thank you for your reply.
I got PackedSequences to work:
def forward(self, input, hidden, lengths):
    embeddings = self.encoder(input)
    packed = pack_padded_sequence(embeddings, lengths, batch_first=True)
    output, hidden = self.rnn(packed, hidden)
    output, _ = pad_packed_sequence(output, batch_first=True)
Now I am confused about how to apply a linear decoder to only the non-padded elements and then feed them to the loss function. Is there a proper "PyTorch" way of doing this, or is masking/padding mandatory?
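To make the question concrete, here is a rough sketch of the two variants I have in mind. It is only an illustration under my own assumptions: the sizes, the GRU, the `decoder` layer, the cross-entropy loss, and the random `targets` are all hypothetical stand-ins, not my actual model. Option A pads back and selects the real time steps with a boolean mask; option B stays packed and uses the `.data` tensor of the `PackedSequence`, which as far as I understand holds only the non-padded steps.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Hypothetical sizes and layers, chosen only for illustration.
vocab_size, emb_dim, hidden_dim = 20, 8, 16
encoder = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
decoder = nn.Linear(hidden_dim, vocab_size)  # assumed linear decoder
criterion = nn.CrossEntropyLoss()

# Batch of 3 sequences, sorted by decreasing length, padded with index 0.
lengths = torch.tensor([5, 3, 2])
inputs = torch.zeros(3, 5, dtype=torch.long)
targets = torch.zeros(3, 5, dtype=torch.long)
for i, n in enumerate(lengths.tolist()):
    inputs[i, :n] = torch.randint(1, vocab_size, (n,))
    targets[i, :n] = torch.randint(1, vocab_size, (n,))

packed = pack_padded_sequence(encoder(inputs), lengths, batch_first=True)
packed_out, hidden = rnn(packed)

# Option A: pad back, then keep only the real time steps via a boolean mask.
padded_out, _ = pad_packed_sequence(packed_out, batch_first=True)
mask = torch.arange(padded_out.size(1))[None, :] < lengths[:, None]  # (batch, time)
loss_masked = criterion(decoder(padded_out[mask]), targets[mask])

# Option B: stay packed. Pack the targets the same way so that they line up
# element by element with packed_out.data.
packed_targets = pack_padded_sequence(targets, lengths, batch_first=True)
loss_packed = criterion(decoder(packed_out.data), packed_targets.data)
```

Both variants run the decoder over the same 10 real time steps, just in a different order (batch-major for A, time-major for B), so the mean loss should agree up to floating-point error.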