Understanding of output from the language modeling tutorial

I’m trying to understand the output of the model in the “Language Modeling with nn.Transformer and torchtext” tutorial (Language Modeling with nn.Transformer and torchtext — PyTorch Tutorials 2.1.0+cu121 documentation). Basically, I’m feeding the model the start of a text and trying to decode its response back into plain text.

# model, vocab, tokenizer, and ntokens are taken from the tutorial
line = "This game was a lot of "
line_t = torch.tensor(vocab(tokenizer(line)), dtype=torch.long)
output = model(line_t.to(device))
output_flat = output.view(-1, ntokens)

In this case the output shape is [6, 6, 28782] and the flattened shape is [36, 28782]; 28782 is the vocabulary size. Since the input shape is [6] (one index per token), I’m not sure where the extra dimension of size 6 comes from. How do I decode/translate this output into text? As I understand it, the model is supposed to predict the next word, but I’m not sure how to read that prediction out of the output above.
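For reference, this is the decoding logic I was planning to use once I understand the shape. It's only a sketch with random logits standing in for the model output: I'm assuming the output layout is [seq_len, batch, ntokens] as in the tutorial's training loop, so the logits at position i are the prediction for the token after position i, and vocab.lookup_token maps an index back to a string (I'm not using it below since the vocab isn't loaded here).

```python
import torch

# Stand-in for the real model output, assuming the tutorial's
# [seq_len, batch, ntokens] layout with a batch of 1.
ntokens = 10  # placeholder; the real vocab has 28782 entries
output = torch.randn(6, 1, ntokens)

# The logits at position i predict the token at position i + 1,
# so the prediction for the word after the whole prompt is the
# argmax over the vocabulary at the last position.
next_token_id = output[-1, 0].argmax().item()

# With the tutorial's vocab this would then be:
# next_word = vocab.lookup_token(next_token_id)
print(next_token_id)
```

Is something along these lines the right way to do it, or am I misreading which dimension is the sequence position?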

I hope that makes sense. Any help is appreciated!