I’ll try to help.
I’ll start with multinomial, as it is the more straightforward part.
When you have a trained language model, you need a strategy for sampling new sentences. The easiest way is to pick the token with the maximum likelihood. I assume you already did that with max(). But this approach has a significant drawback: the decoding is greedy. At every decoding/sampling step, the only possible choice is the top token, so the behavior is deterministic and nothing else can ever be sampled. To add some variance to the results, we can pick the next word randomly.
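To make the "greedy" part concrete, here is a minimal sketch of max-based decoding. `next_token_probs` is a hypothetical stand-in for a trained model that always returns the same made-up distribution, so the numbers are purely illustrative. The point is only that taking the max at every step gives the same output on every run.

```python
# Minimal sketch of greedy decoding on a toy "model".

def next_token_probs(context):
    # Hypothetical fixed distribution (illustrative numbers only);
    # a real language model would condition on the context.
    return {"a": 0.5, "the": 0.3, "one": 0.15, "this": 0.05}

def greedy_decode(context, steps):
    tokens = list(context)
    for _ in range(steps):
        probs = next_token_probs(tokens)
        # Greedy choice: always take the most likely token (the max()/argmax approach).
        tokens.append(max(probs, key=probs.get))
    return tokens

print(greedy_decode(["I", "saw"], 3))
```

Because the choice is always the top token, calling `greedy_decode` twice with the same context gives exactly the same sequence.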
Imagine at timestep t we have the following predictions:
The max approach will always select “a”. If we want to give slightly less likely tokens a chance to appear, we can select randomly, but not in a way where every token has the same probability. We want to sample tokens according to their relative likelihoods, and that is what the multinomial function does. It might be easier to understand if you are familiar with numpy’s choice function used with the p parameter: https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.choice.html#numpy.random.Generator.choice
This lets us generate a different sentence every time while still respecting the relative probabilities.
(There are also more sophisticated methods, such as beam search.)
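Here is a rough sketch of sampling "with relative likelihoods", using only the standard library so it runs anywhere. The distribution is hypothetical (the same made-up numbers as a toy example, not real model output). `random.Random.choices` with `weights` plays the role of multinomial sampling here; in PyTorch the analogous call would be `torch.multinomial(probs, 1)`, which returns an index rather than the token itself.

```python
import random
from collections import Counter

# Hypothetical next-token distribution at timestep t (illustrative numbers only).
probs = {"a": 0.5, "the": 0.3, "one": 0.15, "this": 0.05}

def sample_next(probs, rng):
    tokens = list(probs)
    weights = list(probs.values())
    # Draw one token with probability proportional to its weight.
    # Similar idea to numpy's rng.choice(tokens, p=weights) or torch.multinomial.
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # seeded only so the run is reproducible
n = 10_000
counts = Counter(sample_next(probs, rng) for _ in range(n))
for tok in probs:
    # Empirical frequencies should land close to the given probabilities.
    print(tok, counts[tok] / n)
```

“a” still wins most often, but “the”, “one” and even “this” show up roughly in proportion to their probabilities, which is exactly what max() can never do.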
The example code also uses the concept of temperature. Basically, it rescales the predicted distribution before sampling (usually by dividing the logits by the temperature before the softmax). If the temperature is low, the distribution gets sharper and the sampler becomes more conservative, sticking to the very likely tokens. If the temperature is high, the distribution gets flatter: less likely tokens get a relatively higher probability, so there is a bigger chance they will be selected. In practice, you can roughly expect low temperatures to produce “boring but correct” results and high temperatures “interesting but with errors” results.
For more on temperature, please check: https://stackoverflow.com/questions/58764619/why-should-we-use-temperature-in-softmax
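A small sketch of the usual formulation, softmax(logits / T), assuming your model gives you logits (the values below are hypothetical). I’m not sure this is exactly how the example code you are looking at implements it, but it shows the effect: the top token’s share shrinks and the bottom token’s share grows as T increases, while the ranking stays the same.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature before the softmax:
    # T < 1 sharpens the distribution, T > 1 flattens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the tokens "a", "the", "one", "this".
logits = [2.0, 1.0, 0.5, -1.0]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

If you only have probabilities rather than logits, raising them to the power 1/T and renormalizing gives the same result. The rescaled distribution is then what you pass to multinomial.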
Hope it helps ^^