Sampling new text from the PyTorch language model example


I recently started learning deep learning and was experimenting with the language model example provided by PyTorch here -

In the generation script here -, I don’t understand how the output word is chosen from the word weights by sampling from a multinomial distribution.

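        output, hidden = model(input, hidden)
        # divide the output scores by the temperature, then exponentiate
        # to get non-negative (unnormalized) weights for every word
        word_weights = output.squeeze().data.div(args.temperature).exp().cpu()
        # draw one word index with probability proportional to its weight
        word_idx = torch.multinomial(word_weights, 1)[0]
        word = corpus.dictionary.idx2word[word_idx]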
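The arithmetic in this snippet can be checked with a small sketch. It uses plain Python (`math`) instead of PyTorch, and the 4-word vocabulary, the scores and the helper names (`temperature_weights`, `to_probs`, `softmax`) are all hypothetical. Dividing the scores by the temperature and exponentiating gives weights that, once divided by their sum, are exactly softmax(scores / temperature). torch.multinomial accepts such unnormalized non-negative weights, so it effectively samples from that temperature-scaled softmax. Subtracting the same constant from every score does not change the normalized result, so this holds whether `output` contains raw scores or log-probabilities.

```python
import math

def temperature_weights(scores, temperature):
    # Mirrors `.div(temperature).exp()`: unnormalized, non-negative weights
    return [math.exp(s / temperature) for s in scores]

def to_probs(weights):
    # torch.multinomial only needs non-negative weights with a non-zero sum;
    # dividing by the total gives the probabilities it effectively samples with
    total = sum(weights)
    return [w / total for w in weights]

def softmax(scores, temperature=1.0):
    # Numerically stable softmax of scores / temperature
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for a toy 4-word vocabulary (not from the example)
vocab = ["the", "cat", "sat", "quantum"]
scores = [2.0, 1.0, 0.5, -1.0]

for t in (0.5, 1.0, 2.0):
    probs = to_probs(temperature_weights(scores, t))
    print(f"temperature={t}:", {w: round(p, 3) for w, p in zip(vocab, probs)})
```

Lower temperatures concentrate the probability on the top-scoring word (approaching always picking it), while higher temperatures flatten the distribution and give rarer words more chances.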

If I treat word_weights as the probabilities of all words, shouldn’t we pick the word with the highest probability (I am thinking of softmax here)? I could not work out the logical benefit of, or reason for, sampling from a multinomial distribution instead.

I played around a bit, sampling 2 word indices and printing their corresponding word_weights, and noticed that we don’t necessarily take the word with the higher weight (because of the sampling). But I don’t understand the reason behind it.
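The experiment described above can be repeated many times with a plain-Python sketch. `random.choices` with `weights=` draws index i with probability weights[i] / sum(weights), which is the rule torch.multinomial(word_weights, 1) follows for a single draw. The vocabulary, the weights and the helpers `greedy` and `sample` are hypothetical.

```python
import random
from collections import Counter

# Hypothetical word weights (already exponentiated), like `word_weights`
vocab = ["the", "cat", "sat", "quantum"]
word_weights = [7.4, 2.7, 1.6, 0.4]

def greedy(weights):
    # Always returns the index of the largest weight
    return max(range(len(weights)), key=lambda i: weights[i])

def sample(weights, rng):
    # Draws one index with probability weights[i] / sum(weights)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

rng = random.Random(0)
n = 10_000
greedy_counts = Counter(vocab[greedy(word_weights)] for _ in range(n))
sample_counts = Counter(vocab[sample(word_weights, rng)] for _ in range(n))

total = sum(word_weights)
for i, w in enumerate(vocab):
    print(f"{w:8s} prob={word_weights[i] / total:.3f} "
          f"greedy={greedy_counts[w] / n:.3f} sampled={sample_counts[w] / n:.3f}")
```

Greedy selection returns the same word on every call, while the sampled frequencies track the normalized weights: lower-weight words still appear, just less often.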

I understand that this is not strictly a PyTorch question, but I would appreciate it if someone could explain the reasoning behind the sampling.


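A guess would be that it makes the output richer. Always picking the most probable word would restrict the model to the most commonly used words, whereas sampling from the softmax distribution should make it use words roughly as often as the model predicts them in context, which for a well-trained model approximates how often they appear in natural language (so it will sometimes insert less common words too). Always picking the top word is also deterministic, so from the same starting state it would produce the same text every time.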
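To illustrate this point, here is a sketch in plain Python where a hypothetical hand-written next-word score table (`SCORES`) stands in for the RNN's output; the table and the helpers `next_word` and `generate` are made up for illustration. Greedy decoding produces the same sentence for every seed and never uses the lower-scoring words, while sampling with exp(score / temperature) weights, as in the snippet from the question, produces varied sentences that occasionally include the rarer words.

```python
import math
import random

# Hypothetical toy "language model": scores for the next word given the
# previous one (stands in for the RNN output; not from the PyTorch example)
SCORES = {
    "<s>": {"the": 2.0, "a": 1.0},
    "the": {"cat": 2.0, "dog": 1.5, "ocelot": 0.0},
    "a": {"cat": 2.0, "dog": 1.5, "ocelot": 0.0},
    "cat": {"sat": 2.0, "slept": 1.0, "meditated": -0.5},
    "dog": {"sat": 2.0, "slept": 1.0, "meditated": -0.5},
    "ocelot": {"sat": 2.0, "slept": 1.0, "meditated": -0.5},
    "sat": {"</s>": 1.0},
    "slept": {"</s>": 1.0},
    "meditated": {"</s>": 1.0},
}

def next_word(prev, rng, temperature=1.0, greedy=False):
    words = list(SCORES[prev])
    if greedy:
        # Greedy decoding: always take the highest-scoring word
        return max(words, key=lambda w: SCORES[prev][w])
    # Sampling: weights = exp(score / temperature), as in the question's snippet
    weights = [math.exp(SCORES[prev][w] / temperature) for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

def generate(seed, temperature=1.0, greedy=False):
    rng = random.Random(seed)
    word, out = "<s>", []
    while True:
        word = next_word(word, rng, temperature, greedy)
        if word == "</s>":
            return " ".join(out)
        out.append(word)

print("greedy: ", {generate(seed, greedy=True) for seed in range(20)})
print("sampled:", {generate(seed, temperature=1.0) for seed in range(20)})
```

Raising the temperature in this sketch makes "ocelot" and "meditated" appear more often; lowering it makes the sampled sentences converge towards the greedy one.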