Why does CrossEntropyLoss include the softmax function?

Hi,

Wouldn’t it be better if nn.CrossEntropyLoss indicated in its name that it also applies a softmax to its input? (as they do in TF) I hadn’t read the description of this loss function and I was using it wrong, since I was applying a softmax to the output of my network right before CrossEntropyLoss.

Also, my network looks like a typical FFNN (see below). What is the recommended way in PyTorch to handle this difference in network behavior between training and inference? (inference should include the softmax, training shouldn’t).

Thanks!

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, num_inputs, num_u_hl1, num_u_hl2, num_outputs, dropout_rate):
        super(Net, self).__init__()
        self.cl0 = nn.Linear(num_inputs, num_u_hl1)
        self.cl1 = nn.Linear(num_u_hl1, num_u_hl2)
        self.cl2 = nn.Linear(num_u_hl2, num_outputs)
        self.d1 = nn.Dropout(dropout_rate)
        self.d2 = nn.Dropout(dropout_rate)

    def forward(self, x):
        x = self.cl0(x)
        x = self.d1(x)
        x = torch.sigmoid(self.cl1(x))
        x = self.d2(x)
        x = F.softmax(self.cl2(x), dim=1)  # the softmax I was (wrongly) applying before the loss
        return x
```

That is the intention behind CrossEntropyLoss: it applies the (log-)softmax internally, so during training you pass it raw logits, and you only apply a softmax yourself at inference time (and only if you actually need a probabilistic output).
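To make this concrete, here is a small sketch (tensor values are arbitrary) showing that nn.CrossEntropyLoss on raw logits matches manually applying log_softmax followed by nn.NLLLoss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # raw network outputs, no softmax applied
targets = torch.tensor([0, 2, 1, 2])  # class indices

# CrossEntropyLoss applies log-softmax internally...
ce = nn.CrossEntropyLoss()(logits, targets)

# ...so it equals log_softmax + NLLLoss done by hand
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(ce, nll))  # True
```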

And you can use the `self.training` attribute (toggled by `model.train()` and `model.eval()`) to branch between training and inference behavior:

```python
if self.training:
    # code for training
else:
    # code for inference
```
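Applied to a network like the one in the question, a minimal sketch (layer sizes here are arbitrary) could return raw logits for CrossEntropyLoss during training and probabilities at inference:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, num_inputs, num_hidden, num_outputs, dropout_rate):
        super(Net, self).__init__()
        self.cl0 = nn.Linear(num_inputs, num_hidden)
        self.cl1 = nn.Linear(num_hidden, num_outputs)
        self.drop = nn.Dropout(dropout_rate)

    def forward(self, x):
        x = torch.sigmoid(self.cl0(x))
        x = self.drop(x)
        x = self.cl1(x)  # raw logits
        if self.training:
            return x  # feed logits directly to nn.CrossEntropyLoss
        else:
            return F.softmax(x, dim=1)  # probabilities for inference

net = Net(10, 16, 3, 0.5)
net.train()
out_train = net(torch.randn(2, 10))  # logits
net.eval()
out_eval = net(torch.randn(2, 10))   # each row sums to 1
```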

Ok, thanks for the info!

I still think that the CrossEntropyLoss should have a name that specifies that the softmax is included in that function.

Yes, I think the TensorFlow name is clearer. The name “CrossEntropyLoss” was inherited from Lua Torch.
