A question regarding CNNs in PyTorch

I’ve been going through the PyTorch CIFAR-10 convolutional neural network tutorial, and I find it strange that the softmax activation function isn’t used in the output layer of the forward(self, x) function:

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # 3 input channels (RGB), 6 output channels, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)         # 2x2 max pooling with stride 2
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # flattened 16x5x5 feature maps -> 120 units
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # 10 output logits, one per CIFAR-10 class

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)             # flatten for the fully connected layers
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)                        # raw logits; no softmax here
        return x


Can anyone explain why the softmax activation function wasn’t used here, unlike sigmoid, which is used in pretty much every NN dealing with binary classification?

Additionally, I would like to know why the underscore _ is used at the beginning of the following line:

_, predicted = torch.max(outputs, 1)

Thanks in advance.

In a classification use case you can pass the raw logits (i.e. the model output without a non-linearity) to nn.CrossEntropyLoss. Internally, F.log_softmax and nn.NLLLoss are called.
Alternatively, you could add nn.LogSoftmax to your output layer and use nn.NLLLoss as the criterion.
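
As a quick sanity check, here is a minimal sketch (with made-up logits and random targets) showing that the two formulations produce the same loss value:

import torch
import torch.nn as nn

logits = torch.randn(4, 10)            # hypothetical raw model output: batch of 4, 10 classes
targets = torch.randint(0, 10, (4,))   # random ground-truth class indices

# Option 1: raw logits straight into CrossEntropyLoss
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# Option 2: LogSoftmax in the model, NLLLoss as the criterion
log_probs = nn.LogSoftmax(dim=1)(logits)
loss_nll = nn.NLLLoss()(log_probs, targets)

print(torch.allclose(loss_ce, loss_nll))  # True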

The underscore is used to throw away the returned value. torch.max returns both the max value and the corresponding index; since we only need the index, the value is discarded. Alternatively, you could use torch.argmax.
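
As a quick illustration (with hypothetical values):

import torch

outputs = torch.tensor([[0.1, 2.5, -0.3],
                        [1.2, 0.0, 3.4]])  # hypothetical logits for 2 samples

values, indices = torch.max(outputs, 1)    # max value and its index along dim 1
print(values)   # tensor([2.5000, 3.4000])
print(indices)  # tensor([1, 2])

# If only the indices are needed, discard the values with _,
# or use torch.argmax directly:
_, predicted = torch.max(outputs, 1)
predicted_alt = torch.argmax(outputs, 1)
print(torch.equal(predicted, predicted_alt))  # True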


Thank you very much @ptrblck for the clear and quick response.
