Is teacher forcing the default for nn.LSTM?

It depends on how the teacher forcing is implemented. Yes, if you check the PyTorch Seq2Seq tutorial, teacher forcing is implemented on a batch-by-batch basis (well, the batch size is just 1 there).
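For reference, here is a minimal sketch of what that batch-by-batch scheme looks like. The decoder interface (taking the previous token, the hidden state, and the encoder outputs) and the teacher_forcing_ratio name are assumptions modeled on the tutorial, not its exact code:

```python
import random
import torch

teacher_forcing_ratio = 0.5  # assumed hyperparameter, as in the tutorial

def decode_sequence(decoder, encoder_outputs, decoder_hidden, target_tensor, sos_token=0):
    # Batch-by-batch teacher forcing: the coin flip happens ONCE per sequence,
    # so either every time step is fed the ground-truth token, or every time
    # step is fed the decoder's own previous prediction.
    use_teacher_forcing = random.random() < teacher_forcing_ratio

    decoder_input = torch.tensor([[sos_token]])  # start-of-sequence token
    outputs = []
    for t in range(target_tensor.size(0)):
        decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden, encoder_outputs)
        outputs.append(decoder_output)
        if use_teacher_forcing:
            decoder_input = target_tensor[t].unsqueeze(0)            # feed ground truth
        else:
            decoder_input = decoder_output.argmax(dim=-1).detach()   # feed own prediction
    return outputs
```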

In principle, nobody is stopping you from implementing teacher forcing on a step-by-step basis. You just need to move the if use_teacher_forcing: condition into the inner loop over the time steps. I once tried it, and it works just fine. However, I have no idea of any theoretical or practical reasons why one approach might be better or worse than the other, sorry!
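Concretely, the step-by-step variant just means drawing the coin inside the time-step loop. Again, this is only a rough sketch under the same assumed decoder interface as above:

```python
import random
import torch

teacher_forcing_ratio = 0.5

def decode_sequence_stepwise(decoder, encoder_outputs, decoder_hidden, target_tensor, sos_token=0):
    decoder_input = torch.tensor([[sos_token]])
    outputs = []
    for t in range(target_tensor.size(0)):
        decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden, encoder_outputs)
        outputs.append(decoder_output)
        # The use_teacher_forcing check now lives inside the loop, so ground
        # truth and the model's own predictions get mixed within one sequence.
        if random.random() < teacher_forcing_ratio:
            decoder_input = target_tensor[t].unsqueeze(0)            # feed ground truth
        else:
            decoder_input = decoder_output.argmax(dim=-1).detach()   # feed own prediction
    return outputs
```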

When it comes to using RNNs and batches with batch sizes greater than 1, things become a bit trickier, particularly for Seq2Seq models where the target is also a sequence and the decoder loops over each time step. My usual approach is to create batches where each batch contains only samples with the same combination of input and target length. This means that in the decoder, the loop ends for all targets at the same time step and everything is just dandy :).
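If it helps, here is a minimal sketch of that bucketing idea as a batch sampler. The class name and the (input_len, target_len) bucket key are my own hypothetical choices, not the exact code from the post linked below:

```python
import random
from collections import defaultdict
from torch.utils.data import Sampler

class EqualLengthBatchSampler(Sampler):
    """Yields batches in which every sample shares the same
    (input length, target length) combination."""

    def __init__(self, lengths, batch_size, shuffle=True):
        # lengths: list of (input_len, target_len) tuples, one per dataset index
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.buckets = defaultdict(list)
        for idx, key in enumerate(lengths):
            self.buckets[key].append(idx)

    def __iter__(self):
        batches = []
        for indices in self.buckets.values():
            if self.shuffle:
                random.shuffle(indices)
            # Split each bucket into batches; every batch is length-homogeneous.
            for i in range(0, len(indices), self.batch_size):
                batches.append(indices[i:i + self.batch_size])
        if self.shuffle:
            random.shuffle(batches)
        return iter(batches)

    def __len__(self):
        return sum(
            (len(v) + self.batch_size - 1) // self.batch_size
            for v in self.buckets.values()
        )
```

You would pass an instance of something like this to DataLoader via the batch_sampler argument, so each yielded batch already has uniform input and target lengths.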

You can check out this older post to see if it helps. I actually just made an update to include my most recent implementation of a Sampler that creates batches with all samples having equal lengths.
