Input to torch.distributions.categorical.Categorical()

I was wondering what this sentence in the documentation of Categorical means:
"Creates a categorical distribution parameterized by either probs or logits (but not both)." Does this mean we can pass Categorical either logits or probs (the output of a softmax, for example) and get the same result in both cases?
I am seeing something strange in my implementation. I implemented a simple network in two different ways:

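import torch
import torch.nn as nn
from torch.distributions.categorical import Categorical

def mlp(sizes, activation=nn.Tanh, output_activation=nn.Identity):
    # Build a feedforward neural network. The outputs are the logits.
    layers = []
    for j in range(len(sizes)-1):
        # use the hidden activation everywhere except after the last layer
        act = activation if j < len(sizes)-2 else output_activation
        layers += [nn.Linear(sizes[j], sizes[j+1]), act()]
    return nn.Sequential(*layers)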


class mlp2(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate the nn.Linear modules and assign them as
        member variables.
        """
        super(mlp2, self).__init__()
        self.linear1 = nn.Linear(10,100)
        self.relu1 = nn.ReLU(inplace=True)
        self.linear2 = torch.nn.Linear(100,20)
        self.linear3 = torch.nn.Linear(2000,100)
        self.ident = nn.Identity()

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        a = self.linear1(x)
        a = self.relu1(a)
        a = self.linear2(a)
        #print('a before flatten', a.shape)
        a = torch.flatten(a)
        #print('a after flatten', a.shape)
        a = self.linear3(a)
        a = self.relu1(a)
        a = self.linear2(a)
        out = self.ident(a)
        return out

Then I have this code for the first network:

# make core of policy network
logits_net = mlp(sizes=[obs_dim]+hidden_sizes+[n_acts])  # sizes = [10, 32, 20]

# make function to compute action distribution
def get_policy(obs):
    #convert observation in the form [100,10] to [100,1,10]?
    logits = logits_net(obs)
    #get the mean of the output so we have a vector of size 20
    return Categorical(logits=logits.mean(0))

# make action selection function (outputs int actions, sampled from policy)
def get_action(obs):
    return get_policy(obs).sample().item()

# log probability of the sampled action
def Logp(obs, act):
    logp = get_policy(obs).log_prob(act)
    return logp

Here I take the mean of the logits over the first dimension, which gives a vector of size [20] that includes negative values, and pass it to Categorical. This code works.

For the second network, I don't need to take the mean of the logits because the output already has size 20, but I get an error when I pass it to Categorical (here, too, the logits contain negative values). This was the error:


> RuntimeError: invalid argument 2: invalid multinomial distribution (encountering probability entry < 0) at ..\aten\src\TH/generic/THTensorRandom.cpp:325

I wonder why I hit this error when my logits have negative values in both cases. I searched and noticed that people apply softmax and then pass its output to Categorical. But according to the documentation, the input to Categorical can be either logits or probs. Why does it work in one case and not in the other? My networks in both examples have the same architecture.



If you don't pass logits= as a keyword when creating the distribution, the input is interpreted as probs, because probs is the first positional argument, as shown in the docs.


x = torch.tensor([-1., -2., -1., 2.])
c = torch.distributions.categorical.Categorical(x)
c.sample() # error

c = torch.distributions.categorical.Categorical(logits=x)
c.sample() # works
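
As for whether logits and probs give the same results: the sketch below is my own illustration (the variable names are made up, not taken from the docs). It assumes that probs is obtained by applying softmax to the same scores that are passed as logits; under that assumption the two distributions should agree up to floating-point tolerance.

```python
import torch
from torch.distributions.categorical import Categorical

# Raw, unnormalized scores; negative entries are fine for logits.
x = torch.tensor([-1., -2., -1., 2.])

# Parameterize by logits: normalization is handled internally.
from_logits = Categorical(logits=x)

# Parameterize by probs: map the scores to probabilities with softmax first.
from_probs = Categorical(probs=torch.softmax(x, dim=-1))

# Compare the class probabilities and the log-probability of one action.
act = torch.tensor(3)  # illustrative action index
same_probs = torch.allclose(from_logits.probs, from_probs.probs, atol=1e-5)
same_logp = torch.allclose(
    from_logits.log_prob(act), from_probs.log_prob(act), atol=1e-5
)
print(same_probs, same_logp)
```

So for a network whose outputs are raw scores, either pass them with the logits= keyword or apply softmax yourself and pass the result as probs.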