So I have a simple test network for MNIST data, as follows:
def forward(self, x):
    x = F.relu(F.max_pool2d(self.conv1(x), 2))
    x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
    x = x.view(-1, 80)          # flatten the conv features
    x = F.relu(self.fc1(x))
    x = F.dropout(x, training=self.training)
    x = self.fc2(x)
    return F.XXXX(x)            # XXXX = the activation under test (log_softmax, logsigmoid, ...)
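(For completeness, here is a layer setup consistent with the x.view(-1, 80) flatten; the exact channel counts and kernel sizes are assumptions on my part, since only forward is shown:)

import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # assumed: 28x28 input -> conv 5x5 -> pool/2 -> conv 5x5 -> pool/2,
        # giving 5 channels x 4 x 4 = 80 features, which matches x.view(-1, 80)
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 5, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(80, 50)
        self.fc2 = nn.Linear(50, 10)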
I am using the NLLLoss function to implement cross-entropy loss explicitly.
I wanted to understand how different activation functions affect accuracy, so I tried LogSoftmax and verified that the network trains, but for some reason when I used LogSigmoid, the network failed to train.
(Note that NLLLoss expects log probabilities.)
Since softmax and sigmoid both output values between 0 and 1, I thought there shouldn't be an issue.
Can anyone explain the detail I am missing here?
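For context, LogSoftmax + NLLLoss is exactly PyTorch's cross_entropy, which is why that combination trains fine. A quick check (shapes chosen for illustration):

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)
target = torch.randint(0, 10, (4,))

# NLLLoss on log-probabilities == cross_entropy on raw logits
a = F.nll_loss(F.log_softmax(logits, dim=1), target)
b = F.cross_entropy(logits, target)
print(torch.allclose(a, b))  # True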
To me, the LogSigmoid + NLLLoss combination hardly makes sense, because the objective only tries to promote the ground-truth class but applies no suppression to the negative ones. Maybe you want to try sigmoid + BCELoss.
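To see this concretely, here is a quick gradient check (a sketch; 10 classes assumed):

import torch
import torch.nn.functional as F

logits = torch.randn(1, 10, requires_grad=True)
target = torch.tensor([3])

# LogSigmoid + NLLLoss: only the target logit receives any gradient
F.nll_loss(F.logsigmoid(logits), target).backward()
print(logits.grad)  # zero everywhere except index 3

logits.grad = None

# LogSoftmax + NLLLoss: every logit receives gradient (negatives get pushed down)
F.nll_loss(F.log_softmax(logits, dim=1), target).backward()
print(logits.grad)  # nonzero for all classes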
Thank you so much for your advice.
The model trains with BCE even with sigmoid.
Softmax works with BCE as well, but only up to a certain point, and then the training collapses.
I am not sure why, but I guess it has something to do with the dependency among classes…
Do you know of a loss function that would work well for both activation functions?
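For reference, the sigmoid + BCE setup looks roughly like this (a sketch with one-hot targets; shapes are illustrative):

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)           # raw network outputs, no final activation
target = torch.randint(0, 10, (4,))
one_hot = F.one_hot(target, 10).float()

# sigmoid + BCELoss, as suggested above
loss = F.binary_cross_entropy(torch.sigmoid(logits), one_hot)

# numerically more stable equivalent that works on raw logits directly
loss = F.binary_cross_entropy_with_logits(logits, one_hot)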
A naive answer: if you really want to test a single loss function with both activation functions, what about L2 loss with one-hot vectors as targets?
I'm not sure it will give good performance, though.
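A sketch of that idea (one-hot targets; it works mechanically with either activation):

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)
target = torch.randint(0, 10, (4,))
one_hot = F.one_hot(target, 10).float()

# L2 (MSE) loss against one-hot targets, for either activation
loss_softmax = F.mse_loss(torch.softmax(logits, dim=1), one_hot)
loss_sigmoid = F.mse_loss(torch.sigmoid(logits), one_hot)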
Softmax is actually not an activation function…
LogSigmoid + NLLLoss doesn't make sense mathematically (if you derive the gradients, you'll see why).
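A quick sketch of that derivation, with $z$ the logits and $t$ the target class: with LogSoftmax, $L = -\log\frac{e^{z_t}}{\sum_k e^{z_k}}$, so $\frac{\partial L}{\partial z_j} = \mathrm{softmax}(z)_j - \mathbf{1}[j=t]$, and every non-target logit receives a positive gradient and gets pushed down. With LogSigmoid, $L = -\log\sigma(z_t)$, so $\frac{\partial L}{\partial z_j} = -(1-\sigma(z_t))\,\mathbf{1}[j=t]$: the non-target logits receive zero gradient, so nothing stops the network from assigning a high score to every class.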