How to handle variable-length inputs (sentences)

Let the LSTM process the full padded sequence, but backpropagate only through each item's actual length. One way to do this is to call your loss function in a loop over the batch and slice each prediction to the sequence length of that item, so the padded time steps never contribute to the loss.
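As a rough illustration of the slicing idea, here is a minimal, framework-free sketch in plain Python. The names `masked_batch_loss` and `cross_entropy`, the nested-list layout of the predictions, and the choice of cross-entropy as the loss are all assumptions made for the example, not something prescribed above. In an autograd framework the same loop would work on tensors, and time steps that are sliced away receive no gradient from the loss.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class for a single time step."""
    return -math.log(probs[target])


def masked_batch_loss(predictions, targets, lengths):
    """Average per-step loss over the real (unpadded) time steps only.

    predictions: one entry per batch item, each a padded list of
                 class-probability lists (hypothetical layout)
    targets:     one entry per batch item, each a padded list of class indices
    lengths:     true sequence length of each batch item
    """
    total, count = 0.0, 0
    for pred, tgt, n in zip(predictions, targets, lengths):
        # Slice to the true length so padded steps never reach the loss.
        for step_probs, step_target in zip(pred[:n], tgt[:n]):
            total += cross_entropy(step_probs, step_target)
            count += 1
    return total / count


# Two sequences padded to length 3; the second one has only 1 real step.
predictions = [
    [[0.5, 0.5], [0.25, 0.75], [0.5, 0.5]],
    [[0.5, 0.5], [0.9, 0.1], [0.9, 0.1]],  # last two steps are padding
]
targets = [[0, 1, 0], [1, 0, 0]]
lengths = [3, 1]

loss = masked_batch_loss(predictions, targets, lengths)
```

Whatever the model outputs at the padded positions of the second sequence, `loss` stays the same, which is the property you want. Averaging over the number of real steps (rather than over the padded length) is one possible normalization; averaging per sequence first is another reasonable choice.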