nn.LSTM with nn.NLLLoss: confusion between batch size and sequence length

vdw · May 12, 2024, 4:13am

Yes, the sequence length is 5. The 3 stems from the 3 probabilities that you get for each token, since you have 3 possible tags (in this simplified example). For example, if you replace

tag_scores = F.log_softmax(tag_space, dim=1)

with

tag_scores = F.softmax(tag_space, dim=1)

the 3 probabilities in each of the 5 rows should sum to 1. The values in targets are 0, 1, or 2, indicating which class/probability is the correct one for each token.
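To make the shapes concrete, here is a minimal sketch. The random tag_space and the targets values are made up for illustration and only mimic the tutorial's shapes (5 tokens, 3 tags); they are not the tutorial's actual data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for the tutorial's tag_space: 5 tokens x 3 tags.
tag_space = torch.randn(5, 3)
# Made-up correct tag index (0, 1, or 2) for each of the 5 tokens.
targets = torch.tensor([0, 1, 2, 0, 1])

probs = F.softmax(tag_space, dim=1)           # each of the 5 rows sums to 1
tag_scores = F.log_softmax(tag_space, dim=1)  # log of those same probabilities
row_sums = probs.sum(dim=1)

# NLLLoss expects log-probabilities of shape (N, C) and targets of shape (N,).
loss = nn.NLLLoss()(tag_scores, targets)

# Same value by hand: take the log-probability at the target index in each
# row, negate it, and average over the 5 tokens.
manual = -tag_scores[torch.arange(5), targets].mean()

print(row_sums)
print(loss.item(), manual.item())
```

So as far as the loss is concerned, the first dimension (5 here) is just "number of items to classify", and the second (3) is the number of classes.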

Side note: I’m not sure, but extending this tutorial to mini-batches might be tricky. The tutorial relies on a lot of view() calls, which might cause problems once a batch dimension is added.
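One possible way to handle mini-batches, as a rough sketch only: keep the batch dimension through the LSTM and flatten batch and sequence together just before the loss. This assumes all sequences in the batch have the same length (no padding), uses batch_first=True, and all sizes and variable names below are made up rather than taken from the tutorial.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up sizes for illustration.
batch_size, seq_len, emb_dim, hidden_dim, num_tags = 4, 5, 6, 8, 3

lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
hidden2tag = nn.Linear(hidden_dim, num_tags)

# Random stand-ins for embedded sentences and their tag indices.
embeds = torch.randn(batch_size, seq_len, emb_dim)
targets = torch.randint(0, num_tags, (batch_size, seq_len))

lstm_out, _ = lstm(embeds)                     # (batch, seq, hidden)
tag_space = hidden2tag(lstm_out)               # (batch, seq, tags)
tag_scores = F.log_softmax(tag_space, dim=-1)  # normalize over the tag dimension

# Flatten batch and sequence so NLLLoss sees (N, C) scores and (N,) targets.
loss = nn.NLLLoss()(tag_scores.reshape(-1, num_tags), targets.reshape(-1))
print(tag_scores.shape, loss.item())
```

With variable-length sentences you would additionally need padding and something like the ignore_index argument of NLLLoss so padded positions do not contribute to the loss.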