You could consider generating batches in which all sequences have the same length, so that no padding is needed within a batch. I use this approach all the time for sequence classification, and also for seq2seq models. You may want to have a look here, here and here.
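To make the idea concrete, here is a minimal sketch in plain Python. It groups sequence indices by length and then cuts each group into batches. The function name `equal_length_batches` and its parameters are made up for illustration and do not come from any particular library.

```python
import random
from collections import defaultdict


def equal_length_batches(sequences, batch_size, shuffle=True, seed=None):
    """Return a list of batches, each a list of indices into `sequences`.

    All sequences referenced by one batch have the same length.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Bucket the indices by sequence length.
    buckets = defaultdict(list)
    for idx, seq in enumerate(sequences):
        buckets[len(seq)].append(idx)

    batches = []
    for indices in buckets.values():
        if shuffle:
            # Shuffle within a bucket so batches differ between epochs.
            rng.shuffle(indices)
        # Cut the bucket into chunks of at most `batch_size` indices.
        for start in range(0, len(indices), batch_size):
            batches.append(indices[start:start + batch_size])

    if shuffle:
        # Shuffle the batch order so lengths are mixed during training.
        rng.shuffle(batches)
    return batches


if __name__ == "__main__":
    data = [[1, 2], [3], [4, 5], [6], [7, 8], [9, 10, 11]]
    for batch in equal_length_batches(data, batch_size=2, seed=0):
        print([data[i] for i in batch])
```

Note that the last batch of each length bucket can be smaller than `batch_size`. If you use PyTorch, a list of index batches like this can be passed to `DataLoader` via its `batch_sampler` argument. For seq2seq, one option (an assumption on my part, not the only way) is to bucket on the pair of source and target lengths instead of a single length.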