Wouldn’t it be better if nn.CrossEntropyLoss indicated in its name that it also applies a softmax to its input (as TF does)? I hadn’t read the description of this loss function and I was using it wrong: I was applying a softmax to the output of my network right before passing it to CrossEntropyLoss.
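To illustrate what I mean, here is a minimal sketch (the shapes are arbitrary). nn.CrossEntropyLoss combines LogSoftmax and NLLLoss internally, so it expects raw logits:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()   # applies log-softmax internally
logits = torch.randn(8, 10)         # batch of 8, 10 classes (arbitrary)
targets = torch.randint(0, 10, (8,))

loss = criterion(logits, targets)   # correct: pass raw logits
# loss = criterion(logits.softmax(dim=1), targets)  # what I was doing: softmax applied twice
```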
Also, my network looks like a typical FFNN (see below). What is the recommended way in PyTorch to handle this difference between training and inference? (inference should include the softmax, training shouldn’t).
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self, num_inputs, num_u_hl1, num_u_hl2, num_outputs, dropout_rate):
        super(Net, self).__init__()
        self.cl0 = nn.Linear(num_inputs, num_u_hl1)
        self.cl1 = nn.Linear(num_u_hl1, num_u_hl2)
        self.cl2 = nn.Linear(num_u_hl2, num_outputs)
        self.d1 = nn.Dropout(dropout_rate)
        self.d2 = nn.Dropout(dropout_rate)

    def forward(self, x):
        x = self.cl0(x)
        x = self.d1(x)
        x = torch.sigmoid(self.cl1(x))
        x = self.d2(x)
        # this softmax is the problem when training with CrossEntropyLoss
        x = F.softmax(self.cl2(x), dim=1)
        return x
```
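Is the idiomatic pattern simply to drop the softmax from forward() and apply it manually at inference? Something like this sketch (features and targets are placeholders for my data):

```python
# assuming forward() returns raw logits, i.e. the final softmax is removed
model = Net(num_inputs=20, num_u_hl1=64, num_u_hl2=32, num_outputs=10, dropout_rate=0.5)
criterion = nn.CrossEntropyLoss()

# training: feed logits straight into the loss
logits = model(features)
loss = criterion(logits, targets)

# inference: apply softmax explicitly to get probabilities
model.eval()
with torch.no_grad():
    probs = F.softmax(model(features), dim=1)
```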