Why does CrossEntropyLoss include the softmax function?



Wouldn’t it be better if nn.CrossEntropyLoss indicated in its name that it also applies a softmax to its input (as TensorFlow does)? I hadn’t read the description of this loss function and was using it incorrectly: I was applying a softmax to the output of my network right before CrossEntropyLoss.

Also, my network looks like a typical FFNN (see below). What is the recommended way in PyTorch to handle a network whose structure differs between training and inference? (Inference should include the softmax; training shouldn’t.)


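import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self, num_inputs, num_u_hl1, num_u_hl2, num_outputs, dropout_rate):
        super(Net, self).__init__()
        # Three fully connected layers with dropout after the first two
        self.cl0   = nn.Linear(num_inputs, num_u_hl1)
        self.cl1   = nn.Linear(num_u_hl1, num_u_hl2)
        self.cl2   = nn.Linear(num_u_hl2, num_outputs)
        self.d1 = nn.Dropout(dropout_rate)
        self.d2 = nn.Dropout(dropout_rate)

    def forward(self, x):
        x = self.cl0(x)
        x = self.d1(x)
        x = torch.sigmoid(self.cl1(x))
        x = self.d2(x)
        # Softmax over the class dimension; this is the step that
        # duplicates what CrossEntropyLoss already does internally
        x = F.softmax(self.cl2(x), dim=1)
        return x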

(Albert Zhuang) #2

That is the intention of CrossEntropyLoss: the softmax is applied inside the loss during training, so the network does not need its own, and you can skip it at inference as well (assuming you don’t need a probabilistic representation).
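To make this concrete, here is a minimal sketch (the tensor values and variable names are made up for illustration) suggesting that nn.CrossEntropyLoss on raw network outputs matches log_softmax followed by the negative log-likelihood loss, and that passing it outputs that were already softmaxed yields a different value:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical batch: 4 samples, 3 classes (values chosen for illustration)
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0],
                       [-1.0, 2.5, 0.3],
                       [0.0, 0.0, 4.0]])
targets = torch.tensor([0, 2, 1, 2])

criterion = nn.CrossEntropyLoss()

# Intended usage: raw scores go straight into the loss
loss_from_logits = criterion(logits, targets)

# Equivalent computation written out by hand
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# The mistake from the question: softmax is effectively applied twice
loss_double_softmax = criterion(F.softmax(logits, dim=1), targets)

print(loss_from_logits.item(), loss_manual.item(), loss_double_softmax.item())
```

With these inputs the first two printed values should agree, while the third is larger, because softmaxed inputs lie in [0, 1] and flatten the distribution the loss sees.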

And you can use the module’s training attribute (self.training) to handle the network differently for training and inference.

if self.training:
    # code for training
    pass
else:
    # code for inference
    pass
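As a rough sketch of how that could look for a small classifier (the layer names, sizes, and data here are assumptions for illustration, not the original poster's exact network), forward can return raw scores while training and probabilities otherwise:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    # Hypothetical two-layer classifier used only for illustration
    def __init__(self, num_inputs, num_hidden, num_outputs, dropout_rate):
        super().__init__()
        self.fc1 = nn.Linear(num_inputs, num_hidden)
        self.fc2 = nn.Linear(num_hidden, num_outputs)
        self.drop = nn.Dropout(dropout_rate)

    def forward(self, x):
        x = self.drop(torch.sigmoid(self.fc1(x)))
        logits = self.fc2(x)
        if self.training:
            # Training: return raw scores, CrossEntropyLoss handles the softmax
            return logits
        # Inference: return class probabilities
        return F.softmax(logits, dim=1)


net = Net(num_inputs=8, num_hidden=16, num_outputs=3, dropout_rate=0.5)
x = torch.randn(5, 8)                    # made-up input batch
targets = torch.tensor([0, 1, 2, 1, 0])  # made-up class indices

net.train()                              # sets self.training = True
loss = nn.CrossEntropyLoss()(net(x), targets)
loss.backward()

net.eval()                               # sets self.training = False
with torch.no_grad():
    probs = net(x)                       # each row sums to 1
```

Another option is to have forward always return the raw scores and apply the softmax outside the model only when probabilities are needed; which is preferable probably depends on how the model is consumed.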


Ok, thanks for the info!

I still think that the CrossEntropyLoss should have a name that specifies that the softmax is included in that function.

(colesbury) #4

Yes, I think the TensorFlow name is clearer. The name “CrossEntropyLoss” was inherited from Lua Torch.