Let the LSTM process the full padded sequence, but backpropagate only through each sequence's actual length. To do that, call your loss function in a loop over the batch and slice each prediction to the true sequence length of that item.
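Here is a minimal sketch of that idea. It assumes PyTorch, which the answer above doesn't actually name. The toy sizes, the `lengths` list, the linear `head`, and the MSE loss are all made up for illustration. The LSTM sees the whole padded batch, and the loss is computed per item on a slice that stops at the true length, so the padded time steps get no gradient.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy sizes, chosen only for illustration.
batch_size, max_len, n_features, hidden_size = 3, 5, 4, 8
lengths = [5, 3, 2]  # true (unpadded) length of each sequence

# Padded inputs and targets: positions past each true length are zeros.
inputs = torch.randn(batch_size, max_len, n_features)
targets = torch.randn(batch_size, max_len, 1)
for i, n in enumerate(lengths):
    inputs[i, n:] = 0.0
    targets[i, n:] = 0.0

lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
head = nn.Linear(hidden_size, 1)   # assumed per-step output layer
criterion = nn.MSELoss()           # assumed loss; swap in your own

# The LSTM runs over the full padded sequence.
outputs, _ = lstm(inputs)          # shape: (batch, max_len, hidden)
predictions = head(outputs)        # shape: (batch, max_len, 1)
predictions.retain_grad()          # keep this gradient so it can be inspected

# Loop over the batch and slice each prediction to its true length.
loss = torch.zeros(())
for i, n in enumerate(lengths):
    loss = loss + criterion(predictions[i, :n], targets[i, :n])
loss = loss / batch_size

loss.backward()

# Gradients at padded positions are exactly zero.
print(predictions.grad[1, :, 0])
```

For a unidirectional LSTM, the padded steps come after the real ones, so they can't change the outputs at the real steps. Slicing the loss is enough to keep them out of the gradient. With a bidirectional LSTM the backward pass would read the padding first, so this trick alone would not be sufficient.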