# Error in Categorical multi-sample

I am trying to sample from a Categorical variable so I can apply the REINFORCE algorithm to a toy problem.

What I found is that sampling more than one scalar from the distribution raises a runtime error when the log-probability is computed.

Here is an example:

``````
import torch
from torch.autograd import Variable
from torch.distributions import Categorical

x = Variable(torch.Tensor([[0.1, 0.2, 0.1, 0.25, 0.25, 0.1]]), requires_grad=True)
print(x.size())
m = Categorical(x)
action = m.sample_n(5)
print('action: ', action.size())
# next_state, reward = env.step(action)
loss = -m.log_prob(action.unsqueeze(0))  # * reward
print('loss: ', loss)
loss.backward()
``````

We get

``````
torch.Size([1, 6])
action:  torch.Size([5, 1])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-115-d59bfdfba3ea> in <module>()
6 print('action: ', action.size())
7 # next_state, reward = env.step(action)
----> 8 loss = -m.log_prob(action.unsqueeze(0)) #* reward
9 print('loss: ', loss)
10 loss.backward()

~/anaconda3/envs/py35/lib/python3.5/site-packages/torch/distributions.py in log_prob(self, value)
151             return p.gather(-1, value).log()
152
--> 153         return p.gather(-1, value.unsqueeze(-1)).squeeze(-1).log()
154
155

RuntimeError: invalid argument 4: Index tensor must have same dimensions as input tensor at /opt/conda/conda-bld/pytorch_1512383260527/work/torch/lib/TH/generic/THTensorMath.c:503
``````

I looked into the source code for `Categorical` and nothing seemed out of place.
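The traceback points at `gather`, which requires the index tensor to have the same number of dimensions as the input. Here, `p` has 2 dimensions (`[1, 6]`) while the action tensor, after the two `unsqueeze` calls, has 4. A minimal sketch reproducing just the shape mismatch (the shapes are taken from the printed sizes above):

``````python
import torch

p = torch.rand(1, 6)                             # probs, shape [1, 6] -> 2 dims
value = torch.zeros(1, 5, 1, dtype=torch.long)   # action.unsqueeze(0), shape [1, 5, 1]
try:
    # value.unsqueeze(-1) has shape [1, 5, 1, 1] -> 4 dims, but p has only 2
    p.gather(-1, value.unsqueeze(-1))
except RuntimeError as e:
    print("gather failed:", e)
``````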

Computing the REINFORCE loss by hand myself yields no errors:

``````
m = Categorical(x)
action = m.sample_n(5)
print('action: ', action.size())
p = x / x.sum(-1, keepdim=True)  # normalize to a valid distribution
reward = 1
loss = -p.log() * reward         # negative log-probability
print('loss ', loss.size())
loss.mean().backward()
``````

Output:

``````
action:  torch.Size([5, 1])
loss  torch.Size([1, 6])
``````