# Shouldn't F.log_softmax (x, dim = 0) be used?

Here I have three questions.

By chance I saw the code here.
What puzzles me is the `class Net(nn.Module)` and the loss function:

``````
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(13, 128)
        self.fc2 = nn.Linear(128, 128)
        self.fc3 = nn.Linear(128, 128)
        self.fc4 = nn.Linear(128, 128)
        self.fc5 = nn.Linear(128, 128)
        self.fc6 = nn.Linear(128, 2)

    def forward(self, x):
        x = F.relu(self.fc1(x))  # ReLU: max(x, 0)
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = F.relu(self.fc5(x))
        x = self.fc6(x)
        return F.log_softmax(x, dim=0)  # the line being asked about
``````

``````
criterion = nn.CrossEntropyLoss()
output = model(train_x)
loss = criterion(output, train_y)
loss.backward()
optimizer.step()
``````

My first question: `F.log_softmax(x, dim=0)` shouldn't be used here. Is my understanding right?

As discussed in "Using nn.CrossEntropyLoss(), how can I get softmax output?",

`nn.CrossEntropyLoss()` automatically applies log-softmax to the output of the FC layer.

The model class should be:

``````
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(13, 128)
        self.fc2 = nn.Linear(128, 128)
        self.fc3 = nn.Linear(128, 128)
        self.fc4 = nn.Linear(128, 128)
        self.fc5 = nn.Linear(128, 128)
        self.fc6 = nn.Linear(128, 2)

    def forward(self, x):
        x = F.relu(self.fc1(x))  # ReLU: max(x, 0)
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = F.relu(self.fc5(x))
        x = self.fc6(x)
        return x  # raw output of the last linear layer, no log_softmax
``````

My second question: what is `F.log_softmax()` used for?

My third question: whether or not `F.log_softmax()` is used, the model performance is about 90%. Why is this happening?
Finally, should I use `F.log_softmax()` or not?

Hi Shirui!

Well, yes, as you have recognized, this code is wrong.

That is correct. When using `CrossEntropyLoss` you should not use
`log_softmax()` for the output of your model. (You would, if you were
using `NLLLoss`.) You would typically pass the output of the last linear
layer of your model into `CrossEntropyLoss` (as you indicated in the
code you posted).
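
The relationship between the two losses can be checked numerically. What follows is a minimal sketch, not taken from the linked code: the tensors `logits` and `target` are made-up data, with shapes chosen to match the 2-class output of `fc6`. It compares `F.cross_entropy` on the raw logits with `F.nll_loss` on explicitly computed log-probabilities.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical data: 4 samples, 2 classes (same width as the fc6 output).
logits = torch.randn(4, 2)            # raw output of the last linear layer
target = torch.tensor([0, 1, 1, 0])   # integer class labels

# cross_entropy takes raw logits and applies log_softmax internally.
ce = F.cross_entropy(logits, target)

# nll_loss takes log-probabilities, so log_softmax is applied explicitly,
# along the class dimension (dim=1).
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)

same_loss = torch.allclose(ce, nll)
print(ce.item(), nll.item(), same_loss)
```

The two values should agree up to floating-point error, which is why the model only needs to return the output of its last linear layer when the loss is `CrossEntropyLoss`.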

One more error in the code you linked to:

Even if it were appropriate to use `log_softmax()` (for example, with
`NLLLoss`), `log_softmax (x, dim = 0)` is wrong. `x` here has shape
`(nBatch, nClass)`, so `log_softmax (x, dim = 0)` would perform
the softmax operation across the batch dimension. You would need
instead `log_softmax (x, dim = 1)` in order to perform softmax
across the class dimension.
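
The effect of the `dim` argument can be seen on a small tensor. This is an illustrative sketch with a made-up batch `x` of shape `(3, 2)`, that is, 3 samples and 2 classes. It exponentiates the result of `log_softmax` to get probabilities and checks which axis they sum to 1 along.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical batch: 3 samples (rows), 2 classes (columns).
x = torch.randn(3, 2)

# dim=1 normalizes over classes: each row of probabilities sums to 1.
probs_per_sample = F.log_softmax(x, dim=1).exp()
row_sums = probs_per_sample.sum(dim=1)

# dim=0 normalizes over the batch: each column sums to 1 instead,
# so a single row is no longer a distribution over the classes.
probs_over_batch = F.log_softmax(x, dim=0).exp()
col_sums = probs_over_batch.sum(dim=0)
rows_under_dim0 = probs_over_batch.sum(dim=1)

print(row_sums)         # every entry approximately 1
print(col_sums)         # every entry approximately 1
print(rows_under_dim0)  # entries add up to 2 in total, so they cannot all be 1
```

With `dim=0`, the value for one sample depends on the other samples that happen to be in the batch, which is not what a per-sample class prediction should do.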

(Unfortunately, there is a lot of misinformation on the internet.)

Good luck.

K. Frank

Got it, thanks for your help. 