import torch.nn as nn
import torch.nn.functional as F

class FC(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(FC, self).__init__()
        self.d1 = nn.Dropout(0.5)
        self.h1 = nn.Linear(input_size, hidden_size)
        self.h2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = self.d1(x)
        x = self.h1(x)
        x = F.relu(x)
        x = self.h2(x)  # note: was "X = self.h2(x)", which discarded the last layer's output
        return x
and I pass an instance of this network to the train_model() function I mentioned. But when I print the output during the training phase (at the line outputs = model(inputs)), it gives me numbers greater than 1, which is strange since I’m expecting to build a softmax classifier that gives me a probability distribution. Can someone please help me with this?!
Are you also using nn.CrossEntropyLoss() as in the tutorial? It uses softmax (log softmax to be more precise) inside the loss, so there is no need for a softmax at the end of your model.
Therefore, the last layer of your model is an nn.Linear layer, whose raw output is unbounded and can be any negative or positive number.
When computing the accuracy later on, you can use argmax (torch.max() returns both values and indices) or torch.topk() to get the prediction(s).
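A minimal sketch of both points, using made-up logits for a batch of 4 samples and 3 classes: cross-entropy applied to the raw logits equals NLL loss applied to their log-softmax (so the model needs no softmax layer), and the indices from torch.max() serve as the predictions for an accuracy computation.

```python
import torch
import torch.nn.functional as F

# made-up raw model outputs (logits); note the values greater than 1
outputs = torch.tensor([[ 2.0, -1.0,  0.5],
                        [ 0.1,  3.2, -0.7],
                        [-1.5,  0.3,  0.2],
                        [ 4.0,  1.0,  2.0]])
labels = torch.tensor([0, 1, 2, 0])

# cross_entropy on raw logits == nll_loss on log_softmax of those logits,
# which is why the softmax lives inside the loss, not inside the model
loss_ce = F.cross_entropy(outputs, labels)
loss_manual = F.nll_loss(F.log_softmax(outputs, dim=1), labels)

# torch.max over dim=1 returns (values, indices); the indices are the predicted classes
_, preds = torch.max(outputs, 1)
accuracy = (preds == labels).float().mean().item()
```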
Yes! I’m using nn.CrossEntropyLoss as in the tutorial. I see, since it’s a log softmax, the output is not strictly a probability distribution, right? That’s why I’m getting values greater than 1?
I mean, in the tutorial, in this block of code
with torch.set_grad_enabled(phase == 'train'):
    outputs = model(inputs)
    print(outputs)
    _, preds = torch.max(outputs, 1)
    loss = criterion(outputs, labels)
The print returns an array with values greater than 1… That’s why I’m confused.
The output of the model will indeed not be a probability distribution; it’s simply the raw output of the final linear layer.
The softmax will transform this linear output into a probability distribution: more reading below.
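A quick sketch of that transformation, with made-up logits: applying F.softmax to the raw linear outputs yields values in (0, 1) that sum to 1.

```python
import torch
import torch.nn.functional as F

# made-up raw model outputs (logits); note the value greater than 1
logits = torch.tensor([[2.0, -1.0, 0.5]])

# softmax maps the unbounded logits to a probability distribution
probs = F.softmax(logits, dim=1)
```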