# Error in Categorical multi-sample

I am trying to sample from a Categorical variable so I can apply the REINFORCE algorithm to a toy problem.

What I found is that sampling more than one scalar from the distribution raises a runtime error when the log-probability is computed.

Here is an example:

``````
import torch
from torch.autograd import Variable
from torch.distributions import Categorical

x = Variable(torch.Tensor([[0.1, 0.2, 0.1, 0.25, 0.25, 0.1]]), requires_grad=True)
print(x.size())
m = Categorical(x)
action = m.sample_n(5)
print('action: ', action.size())
# next_state, reward = env.step(action)
loss = -m.log_prob(action.unsqueeze(0))  # * reward
print('loss: ', loss)
loss.backward()
``````

We get

``````
torch.Size([1, 6])
action:  torch.Size([5, 1])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-115-d59bfdfba3ea> in <module>()
6 print('action: ', action.size())
7 # next_state, reward = env.step(action)
----> 8 loss = -m.log_prob(action.unsqueeze(0)) #* reward
9 print('loss: ', loss)
10 loss.backward()

~/anaconda3/envs/py35/lib/python3.5/site-packages/torch/distributions.py in log_prob(self, value)
151             return p.gather(-1, value).log()
152
--> 153         return p.gather(-1, value.unsqueeze(-1)).squeeze(-1).log()
154
155

RuntimeError: invalid argument 4: Index tensor must have same dimensions as input tensor at /opt/conda/conda-bld/pytorch_1512383260527/work/torch/lib/TH/generic/THTensorMath.c:503
``````

I looked into the source code for `Categorical` and nothing seemed out of place.
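The traceback points at `gather`, which requires the index tensor to have the same number of dimensions as the input. Here, `p` has 2 dimensions (`[1, 6]`) while the action tensor, after the two `unsqueeze` calls, has 4. A minimal sketch reproducing just the shape mismatch (the shapes are taken from the printed sizes above):

``````python
import torch

p = torch.rand(1, 6)                             # probs, shape [1, 6] -> 2 dims
value = torch.zeros(1, 5, 1, dtype=torch.long)   # action.unsqueeze(0), shape [1, 5, 1]
try:
    # value.unsqueeze(-1) has shape [1, 5, 1, 1] -> 4 dims, but p has only 2
    p.gather(-1, value.unsqueeze(-1))
except RuntimeError as e:
    print("gather failed:", e)
``````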

Computing the REINFORCE loss by hand myself yields no errors:

``````
m = Categorical(x)
action = m.sample_n(5)
print('action: ', action.size())
p = x / x.sum(-1, keepdim=True)  # normalize to a valid distribution
reward = 1
loss = -p.log() * reward         # negative log-probability
print('loss ', loss.size())
loss.mean().backward()
``````

Output:

``````
action:  torch.Size([5, 1])
loss  torch.Size([1, 6])
``````