Trying to understand the torch.multinomial

Can someone please explain what torch.multinomial does?

With numpy’s np.random.multinomial it is more or less clear: it takes the number of trials (experiments), the probabilities for each outcome (which sum to 1), and the size of the output (i.e. how many such count vectors to draw). For example, if the experiment is to throw a fair six-sided die and we do it n=20 times, then

>>> np.random.multinomial(20, [1/6.]*6, size=1)
array([[4, 1, 7, 5, 2, 1]])

meaning that the die landed 4 times on 1, once on 2, 7 times on 3, and so on.

How can we do the same thing with torch.multinomial?

I found the following example which unfortunately doesn’t make much sense to me:

>>> weights = torch.tensor([0, 10, 3, 0], dtype=torch.float)
>>> torch.multinomial(weights, 2)
tensor([1, 2])
>>> torch.multinomial(weights, 4) # ERROR!

How do the weights translate to probabilities? What is the meaning of tensor([1, 2]), and why does the second call raise an error?

By default torch.multinomial uses replacement=False.
Since only two weights are positive in your example, you cannot draw 4 distinct samples with this setup.
On the other hand, torch.multinomial(weights, 4, replacement=True) will work.

The returned tensor gives you the sampled indices. In your code example you sampled index 1 and index 2.
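As a quick check (a minimal sketch — the exact samples are random, but only indices with positive weight can ever be drawn):

```python
import torch

weights = torch.tensor([0, 10, 3, 0], dtype=torch.float)

# Sampling with replacement allows more draws than there are positive weights.
samples = torch.multinomial(weights, 4, replacement=True)

# Only indices 1 and 2 have non-zero weight, so only they can appear.
print(samples)
```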

What I don’t understand is how I can get the same thing as np.random.multinomial(20, [1/6.]*6, size=1) using torch.multinomial.

If I try

>>> weights = torch.tensor([1/6.]*6, dtype=torch.float)
>>> torch.multinomial(weights, 6)
tensor([4, 2, 1, 0, 5, 3])

This has a different interpretation than numpy’s result (see my dice example above): PyTorch returns what looks like shuffled indices, which doesn’t make much sense to me.

Also, I would like to understand how the weights w1, w2, … (e.g. weights [0, 10, 3, 0]) translate to probabilities p1, p2, … in the multinomial distribution.


torch.multinomial will return the drawn indices, while numpy returns the counts of the drawn samples.
To get the counts, you could use unique(return_counts=True):

weights = torch.tensor([1/6.]*6, dtype=torch.float)
out = torch.multinomial(weights, 20, replacement=True)
# Note: unique only reports indices that were actually drawn, so an outcome
# with a zero count would be missing from out_count.
out_count = out.unique(return_counts=True)[1]

The passed weights argument will be normalized to create a probability distribution.
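Putting this together, here is a sketch that reproduces the shape of np.random.multinomial(20, [1/6.]*6, size=1). It uses torch.bincount with minlength so that outcomes drawn zero times still show up as a 0 count:

```python
import torch

weights = torch.tensor([1/6.]*6, dtype=torch.float)

# Draw 20 samples with replacement; the result is a tensor of 20 indices in [0, 5].
draws = torch.multinomial(weights, 20, replacement=True)

# Convert indices to per-outcome counts; minlength=6 keeps zero counts.
counts = torch.bincount(draws, minlength=6)
print(counts)  # six counts summing to 20, like numpy's array
```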


Thanks ptrblck! I also found that torch.distributions.Multinomial does what I want more directly:

m = torch.distributions.Multinomial(20, torch.tensor([1/6]*6))
m.sample()
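A quick sanity check on this (the individual counts are random, but they always sum to total_count):

```python
import torch

m = torch.distributions.Multinomial(20, probs=torch.tensor([1/6.]*6))
counts = m.sample()
print(counts)        # e.g. tensor([2., 4., 3., 5., 3., 3.])
print(counts.sum())  # always tensor(20.)
```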

I am still not sure how to work with weights (not probabilities) though. Using the previous example weights = torch.tensor([0, 10, 3, 0], dtype=torch.float) we have w1 = 0, w2 = 10, w3 = 3, w4 = 0. Are the probabilities p1 = 0, p2 = 10/13, p3 = 3/13, p4 = 0 in this case?


Yes, I think your understanding is correct.
If I’m not mistaken, the weights are simply normalized so that their sum equals 1.
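That can be verified by normalizing the weights by hand (a small sketch; 10/13 ≈ 0.7692 and 3/13 ≈ 0.2308, matching the probabilities guessed above):

```python
import torch

weights = torch.tensor([0, 10, 3, 0], dtype=torch.float)

# Dividing by the sum turns the weights into a probability distribution.
probs = weights / weights.sum()
print(probs)  # tensor([0.0000, 0.7692, 0.2308, 0.0000])
```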