Simple neural network classification

Hello guys!

I’m trying to make a simple neural network, but I’m struggling to understand where the softmax fits into the pipeline. I’m using the tutorial’s train_model() function, which can be accessed here --.

I created this simple NN:

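        import torch.nn as nn
        import torch.nn.functional as F

        class FC(nn.Module):
            def __init__(self, input_size, hidden_size, num_classes):
                super(FC, self).__init__()
                self.d1 = nn.Dropout(0.5)
                self.h1 = nn.Linear(input_size, hidden_size)
                self.h2 = nn.Linear(hidden_size, num_classes)

            def forward(self, x):
                x = self.d1(x)   # dropout on the inputs (active only in train mode)
                x = self.h1(x)   # hidden layer
                x = F.relu(x)
                x = self.h2(x)   # raw class scores (logits), one per class
                return x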

I pass an instance of this network to the train_model() function mentioned above. But when I print the outputs during the training phase (the line outputs = model(inputs)), I get numbers greater than 1. That seems strange, since I expect a softmax classifier to give me a probability distribution. Can someone please help me with this?!

Are you also using nn.CrossEntropyLoss() as in the tutorial? It applies softmax (log softmax, to be more precise) inside the loss, so there is no need for a softmax at the end of your model.
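A minimal sketch of what "softmax inside the loss" means: `nn.CrossEntropyLoss` on raw logits gives the same value as `F.log_softmax` followed by `nn.NLLLoss`. The batch of logits and labels below is made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical batch: 4 samples, 3 classes of raw, unbounded model outputs (logits)
logits = torch.randn(4, 3) * 5
labels = torch.tensor([0, 2, 1, 2])

# nn.CrossEntropyLoss takes the raw logits directly...
ce = nn.CrossEntropyLoss()(logits, labels)

# ...and matches log_softmax followed by the negative log-likelihood loss
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), labels)

print(ce.item(), nll.item())
```

Because the loss already applies log softmax, adding your own softmax at the end of the model would normalize the outputs twice.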

Therefore, the output of your model comes straight from an nn.Linear layer, which can produce arbitrarily large positive and negative numbers.
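To see why nothing keeps these values in [0, 1], here is a small sketch checking that `nn.Linear` just computes `h @ W.T + b`. The layer sizes and the random "hidden activations" are arbitrary assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A final layer like self.h2: 5 hidden features -> 3 classes (sizes chosen arbitrarily)
layer = nn.Linear(5, 3)
h = torch.randn(2, 5) * 10  # hypothetical hidden activations for 2 samples

out = layer(h)
# nn.Linear is an affine map h @ W^T + b; there is no squashing into [0, 1]
manual = h @ layer.weight.T + layer.bias
print(out)
```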

When computing accuracy later on, you can use argmax (torch.max() returns indices as well as values) or torch.topk() to get the prediction(s).
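A minimal sketch of both options, assuming `outputs` holds raw logits of shape (batch, num_classes) and `labels` holds class indices; the numbers are made up.

```python
import torch

# Hypothetical logits for a batch of 3 samples and 4 classes
outputs = torch.tensor([[ 2.0, -1.0,  0.5,  3.0],
                        [ 0.1,  4.0, -2.0,  1.0],
                        [-3.0,  0.0,  5.0,  2.5]])
labels = torch.tensor([3, 1, 3])

# torch.max along dim 1 returns (values, indices); the indices are the predictions
_, preds = torch.max(outputs, 1)
accuracy = (preds == labels).float().mean().item()

# torch.topk returns the k highest-scoring classes per sample (here top-2)
top2_vals, top2_idx = torch.topk(outputs, k=2, dim=1)
top2_accuracy = (top2_idx == labels.unsqueeze(1)).any(dim=1).float().mean().item()

print(preds, accuracy, top2_accuracy)
```

Here the third sample is wrong under top-1 (class 2 is predicted) but counts as correct under top-2, since class 3 has the second-highest score.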

Yes! I’m using nn.CrossEntropyLoss as in the tutorial. I see, since it’s a log softmax, the output is not strictly a probability distribution, right? That’s why I’m getting values greater than 1?

I mean, in this block of code from the tutorial:

        with torch.set_grad_enabled(phase == 'train'):
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, labels)

Printing outputs shows a tensor with values greater than 1… That’s why I’m confused.

I can relate, I was also confused at first 🙂

The output of the model will indeed not be a probability distribution; it is simply a linear (affine) function of the previous layer’s activations, which in your model are the ReLU outputs of h1.

Applying a softmax transforms these raw outputs (logits) into a probability distribution: more reading below.
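If you want probabilities for inspection or reporting, a minimal sketch is to apply `F.softmax` to the logits yourself, outside the loss. The logits below are hypothetical; keep passing the raw logits to `nn.CrossEntropyLoss` during training.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw outputs (logits) for 2 samples and 3 classes
logits = torch.tensor([[2.0, -1.0, 4.5],
                       [0.3,  1.2, -7.0]])

# softmax(z)_i = exp(z_i) / sum_j exp(z_j), taken across the class dimension
probs = F.softmax(logits, dim=1)
print(probs)
print(probs.sum(dim=1))  # each row sums to 1, up to floating-point rounding

# softmax is monotonic, so the predicted class is the same with or without it
pred_from_probs = probs.argmax(dim=1)
pred_from_logits = logits.argmax(dim=1)
```

This is also why taking `torch.max` over the raw outputs, as the tutorial does, gives the same predictions as taking it over the probabilities.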
