# Simple neural network classification

Hello guys!

I’m trying to make a simple neural network, but I’m struggling to understand where the softmax fits into the pipeline. I’m using the `train_model()` function from the transfer learning tutorial (https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html).

I created this simple NN:

```python
import torch.nn as nn
import torch.nn.functional as F


class FC(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(FC, self).__init__()
        self.d1 = nn.Dropout(0.5)
        self.h1 = nn.Linear(input_size, hidden_size)
        self.h2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = self.d1(x)  # dropout on the input features
        x = self.h1(x)  # hidden layer
        x = F.relu(x)
        x = self.h2(x)  # output layer: one raw score per class
        return x
```

I pass an instance of this network to the `train_model()` function I mentioned. But when I print the output during the training phase (the line `outputs = model(inputs)`), I get numbers greater than 1, which is strange since I expected a softmax classifier to give me a probability distribution. Can someone please help me with this?

Are you also using `nn.CrossEntropyLoss()` as in the tutorial? It uses softmax (log softmax to be more precise) inside the loss, so there is no need for a softmax at the end of your model.

Therefore, the output of your model is the raw output of an `nn.Linear` layer (the logits), which can be any negative or positive number.
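To make this concrete, here is a minimal sketch showing that `nn.CrossEntropyLoss` applied to raw outputs gives the same value as applying log softmax first and then the negative log likelihood loss. The tensors `logits` and `labels` are made-up values for illustration, not taken from the tutorial.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical batch: 4 samples, 3 classes. These stand in for the raw
# outputs (logits) of the final nn.Linear layer and can be any real number.
logits = torch.randn(4, 3) * 5
labels = torch.tensor([0, 2, 1, 2])

# nn.CrossEntropyLoss takes the raw logits directly.
loss_ce = nn.CrossEntropyLoss()(logits, labels)

# Equivalent two-step version: log softmax, then negative log likelihood.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(loss_ce.item(), loss_manual.item())
```

Both printed values should agree up to floating-point error, which is why the model itself does not need a softmax layer during training.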

When computing accuracy later on, you can use argmax (`torch.max()` returns indices as well as values) or `torch.topk()` to get the prediction(s).
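As a rough sketch of what that could look like, the example below computes top-1 accuracy with `torch.max()` and top-2 accuracy with `torch.topk()`. The `outputs` and `labels` tensors are hypothetical values chosen for illustration.

```python
import torch

# Hypothetical logits for 4 samples and 3 classes, plus their true labels.
outputs = torch.tensor([[2.0, -1.0, 0.5],
                        [0.1, 3.2, -0.7],
                        [-1.5, 0.3, 4.1],
                        [1.2, 0.9, -2.0]])
labels = torch.tensor([0, 1, 2, 1])

# torch.max along dim=1 returns (values, indices); the indices are the
# predicted classes, so no softmax is needed to pick the winner.
_, preds = torch.max(outputs, 1)
accuracy = (preds == labels).float().mean().item()

# torch.topk returns the k largest entries per row; a sample counts as
# correct if its true label is among the top-k indices.
_, top2 = torch.topk(outputs, k=2, dim=1)
top2_correct = (top2 == labels.unsqueeze(1)).any(dim=1)
top2_accuracy = top2_correct.float().mean().item()

print(preds, accuracy, top2_accuracy)
```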

Yes! I’m using `nn.CrossEntropyLoss` as in the tutorial. I see, since it’s a log softmax, the output is not strictly a probability distribution, right? That’s why I’m getting values greater than 1?

I mean, in the tutorial, in this block of code

```python
with torch.set_grad_enabled(phase == 'train'):
    outputs = model(inputs)
    print(outputs)
    _, preds = torch.max(outputs, 1)
    loss = criterion(outputs, labels)
```

The print returns an array with values greater than 1… That’s why I’m confused.

I can relate, I was also confused at first. The output of the model will indeed not be a probability distribution; it’s simply a linear function of the previous layer’s output (the ReLU-activated hidden layer in this case).

The softmax will transform this linear output into a probability distribution: more reading below.
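If you want to inspect actual probabilities (for example when printing during evaluation), one option is to apply softmax to the model output yourself. This is only a sketch with made-up `logits`; in the training loop above you would still pass the raw outputs to the criterion.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw model outputs for 2 samples and 3 classes.
# Note that several values are greater than 1 or negative.
logits = torch.tensor([[4.0, -2.0, 1.5],
                       [0.0, 7.3, -3.1]])

# Softmax over the class dimension maps each row to values in [0, 1]
# that sum to 1.
probs = F.softmax(logits, dim=1)

print(probs)
print(probs.sum(dim=1))

# Softmax preserves the ordering within each row, so the predicted class
# is the same whether you take argmax of the logits or of the probabilities.
same_prediction = torch.equal(logits.argmax(dim=1), probs.argmax(dim=1))
print(same_prediction)
```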
