Shouldn't F.log_softmax (x, dim = 0) be used?

Here I have three questions.

I came across the code here by chance.
What puzzles me is the Net(nn.Module) class and the loss function:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.fc1 = nn.Linear(13, 128)
            self.fc2 = nn.Linear(128, 128)
            self.fc3 = nn.Linear(128, 128)
            self.fc4 = nn.Linear(128, 128)
            self.fc5 = nn.Linear(128, 128)
            self.fc6 = nn.Linear(128, 2)

        def forward(self, x):
            x = F.relu(self.fc1(x))  # ReLU: max(x, 0)
            x = F.relu(self.fc2(x))
            x = F.relu(self.fc3(x))
            x = F.relu(self.fc4(x))
            x = F.relu(self.fc5(x))
            x = self.fc6(x)
            return F.log_softmax(x, dim=0)

    # (model, train_x, train_y and optimizer are defined elsewhere in the linked code)
    criterion = nn.CrossEntropyLoss()
    output = model(train_x)
    loss = criterion(output, train_y)
    loss.backward()
    optimizer.step()

My first question: am I right that F.log_softmax(x, dim=0) shouldn't be used here?

As discussed in Using nn.CrossEntropyLoss(), how can I get softmax output?:

nn.CrossEntropyLoss() automatically applies log_softmax to the FC layer output.

The model class should be:

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.fc1 = nn.Linear(13, 128)
            self.fc2 = nn.Linear(128, 128)
            self.fc3 = nn.Linear(128, 128)
            self.fc4 = nn.Linear(128, 128)
            self.fc5 = nn.Linear(128, 128)
            self.fc6 = nn.Linear(128, 2)

        def forward(self, x):
            x = F.relu(self.fc1(x))  # ReLU: max(x, 0)
            x = F.relu(self.fc2(x))
            x = F.relu(self.fc3(x))
            x = F.relu(self.fc4(x))
            x = F.relu(self.fc5(x))
            x = self.fc6(x)
            return x                 # raw logits; CrossEntropyLoss applies log_softmax internally

My second question: what is F.log_softmax() used for, then?

My third question: the model's accuracy is about 90% whether F.log_softmax() is used or not. Why is that?
Finally, should I use F.log_softmax() or not?

Hi Shirui!

Well, yes, as you have recognized, this code is wrong.

That is correct. When using CrossEntropyLoss you should not use
log_softmax() for the output of your model. (You would, if you were
using NLLLoss.) You would typically pass the output of the last linear
layer of your model into CrossEntropyLoss (as you indicated in the
code you posted).
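
For concreteness, here is a minimal sketch (the batch size, class count and values are made up purely for illustration) showing that CrossEntropyLoss applied to raw logits gives the same loss as NLLLoss applied to log_softmax (x, dim = 1). That is, CrossEntropyLoss already performs the log_softmax() step internally:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)
    logits = torch.randn(4, 2)            # (nBatch, nClass) raw output of the last Linear layer
    target = torch.tensor([0, 1, 1, 0])   # integer class labels

    loss_ce  = nn.CrossEntropyLoss()(logits, target)
    loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)

    print(torch.allclose(loss_ce, loss_nll))   # True -- CrossEntropyLoss = log_softmax + NLLLoss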

One more error in the code you linked to:

Even if it were appropriate to use log_softmax() (for example, with
NLLLoss), log_softmax (x, dim = 0) is wrong. x here has shape
(nBatch, nClass), so log_softmax (x, dim = 0) would perform
the softmax operation across the batch dimension. You would need
instead log_softmax (x, dim = 1) in order to perform softmax
across the class dimension.
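
As a small sketch of the difference (the tensor values are just illustrative): with x of shape (nBatch, nClass), dim = 1 normalizes each row, so every sample gets a probability distribution over the classes, while dim = 0 normalizes each column across the batch, which is not what you want here:

    import torch
    import torch.nn.functional as F

    x = torch.randn(3, 2)                       # (nBatch, nClass)

    probs_dim1 = F.log_softmax(x, dim=1).exp()  # softmax over the class dimension
    probs_dim0 = F.log_softmax(x, dim=0).exp()  # softmax over the batch dimension

    print(probs_dim1.sum(dim=1))   # tensor([1., 1., 1.]) -- one distribution per sample
    print(probs_dim0.sum(dim=0))   # tensor([1., 1.])     -- one distribution per class column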

(Unfortunately, there is a lot of misinformation on the internet.)

Good luck.

K. Frank


Got it, thanks for your help. :smiley: