I was wondering what this sentence in the documentation of Categorical means:
“Creates a categorical distribution parameterized by either probs or logits (but not both).”
Does this mean we can feed Categorical either logits or probs (the output of a softmax, for example) and get the same results in both cases?
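To make the two options concrete, here is a minimal sketch, assuming a made-up three-element score tensor (the values and variable names are illustrative and not taken from either network in this post). As far as I understand the docs, passing raw scores via `logits=` and passing their softmax via `probs=` should describe the same distribution:

```python
import torch
from torch.distributions import Categorical

# Illustrative raw scores (an assumption); negative entries are fine as logits.
logits = torch.tensor([-1.5, 0.3, 2.0])

# Option 1: pass unnormalized log-probabilities through the `logits` keyword.
dist_from_logits = Categorical(logits=logits)

# Option 2: normalize with softmax first, then pass through the `probs` keyword.
dist_from_probs = Categorical(probs=torch.softmax(logits, dim=-1))

# Both objects should expose the same normalized probabilities.
same_distribution = torch.allclose(
    dist_from_logits.probs, dist_from_probs.probs, atol=1e-6
)
print(same_distribution)
```

If that reading is right, the choice between the two keywords is about which form the network output is already in, not about getting different distributions.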
In my implementation, I am seeing something strange. I implemented a simple network in two different ways:
def mlp(sizes, activation=nn.Tanh, output_activation=nn.Identity):
    # Build a feedforward neural network. Outputs are the logits.
    layers = []
    for j in range(len(sizes)-1):
        act = activation if j < len(sizes)-2 else output_activation
        layers += [nn.Linear(sizes[j], sizes[j+1]), act()]
    return nn.Sequential(*layers)
In the constructor we instantiate two nn.Linear modules and assign them as
self.linear1 = nn.Linear(10,100)
self.relu1 = nn.ReLU(inplace=True)
self.linear2 = torch.nn.Linear(100,20)
self.linear3 = torch.nn.Linear(2000,100)
self.ident = nn.Identity()
def forward(self, x):
    """
    In the forward function we accept a Tensor of input data and we must return
    a Tensor of output data. We can use Modules defined in the constructor as
    well as arbitrary operators on Tensors.
    """
    a = self.linear1(x)
    a = self.relu1(a)
    a = self.linear2(a)
    #print('a before flatten', a.shape)
    a = torch.flatten(a)
    #print('a after flatten', a.shape)
    a = self.linear3(a)
    a = self.relu1(a)
    a = self.linear2(a)
    out = self.ident(a)
    return out
Then I have this code for the first network:
# make core of policy network
logits_net = mlp(sizes=[obs_dim]+hidden_sizes+[n_acts])  # sizes = [10, 32, 20]

# make function to compute action distribution
def get_policy(obs):
    # convert observation in the form [100,10] to [100,1,10]?
    logits = logits_net(obs)
    # get the mean of the output so we have a vector of size 20
    return Categorical(logits=logits.mean(0))

# make action selection function (outputs int actions, sampled from policy)
def get_action(obs):
    return get_policy(obs).sample().item()

# log probability of the sampled action
def Logp(obs, act):
    logp = get_policy(obs).log_prob(act)
    return logp
What I do is take the mean of the logits over the first dimension, which gives a vector of size 20 (negative values included), and then feed it to Categorical. This code works.
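The averaging step can be sketched in isolation as follows. This is only an illustration: the random `[100, 20]` tensor is a stand-in I am assuming for the output of `logits_net(obs)`, and the variable names are hypothetical.

```python
import torch
from torch.distributions import Categorical

torch.manual_seed(0)

# Stand-in (assumption) for logits_net(obs): 100 observations, 20 actions.
batch_logits = torch.randn(100, 20)

# Average over the first dimension to get one logit per action.
mean_logits = batch_logits.mean(0)

# Negative entries are acceptable here because they are passed as logits.
policy = Categorical(logits=mean_logits)
action = policy.sample().item()
log_prob = policy.log_prob(torch.tensor(action))

print(mean_logits.shape, action, log_prob.item())
```

The sampled action is an integer index into the 20 entries, and its log-probability is never positive.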
For the second network, I don’t need to take the mean of the logits, as the output already has size 20, but I get an error when feeding it to Categorical (here, too, the logits contain negative values). This was the error:
> RuntimeError: invalid argument 2: invalid multinomial distribution (encountering probability entry < 0) at ..\aten\src\TH/generic/THTensorRandom.cpp:325
I wonder why I am facing this issue when my logits have negative values in both cases. I searched and noticed that people apply softmax and then give the output of the softmax to Categorical. But according to the documentation, the input to Categorical can be logits or probs. Why does it work in one case and not in the other? My networks in both samples have the same architecture.
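One thing that may be worth checking, although the code above does not show how Categorical is called for the second network: the first positional parameter of `Categorical` is `probs`, so a tensor passed without the `logits=` keyword is interpreted as probabilities, and negative entries are then invalid. A minimal sketch with made-up values (whether this is the cause of the error above is an assumption):

```python
import torch
from torch.distributions import Categorical

# Made-up scores with one negative entry (illustrative only).
scores = torch.tensor([-1.0, 2.0])

# Keyword form: the scores are treated as logits and normalized internally.
sample_from_logits = Categorical(logits=scores).sample().item()

# Positional form: the tensor is taken as `probs`, so the negative entry is
# rejected. Depending on the PyTorch version this surfaces as a ValueError at
# construction time or as a RuntimeError when sampling.
try:
    Categorical(scores).sample()
    positional_failed = False
except (ValueError, RuntimeError) as err:
    positional_failed = True
    print(type(err).__name__)

print(sample_from_logits, positional_failed)
```

If the second network's output is passed this way, switching to `Categorical(logits=...)` or applying a softmax first would be the things to try.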