I am trying to implement a model for a binary classification problem. Up to now I have been using a softmax at the output layer together with torch.nn.NLLLoss to calculate the loss. Now I want to use a sigmoid at the output layer instead of the softmax. If I do that, should I also change the loss function, or can I still use torch.nn.NLLLoss?
Sorry for the confusion. No: if you are using nn.BCELoss, you should just apply a sigmoid to your output.
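As a minimal sketch of that pairing (the tensor values and names here are made up for illustration), note that you can either apply a sigmoid yourself and use nn.BCELoss, or skip the sigmoid and pass the raw logits to nn.BCEWithLogitsLoss, which is the numerically safer option:

```python
import torch
import torch.nn as nn

logits = torch.tensor([0.8, -1.2, 0.3])   # raw model outputs, one per sample
targets = torch.tensor([1.0, 0.0, 1.0])   # binary labels as floats

# Option 1: sigmoid on the output, then nn.BCELoss on the probabilities
probs = torch.sigmoid(logits)
loss_bce = nn.BCELoss()(probs, targets)

# Option 2: raw logits straight into nn.BCEWithLogitsLoss
# (combines the sigmoid and the loss in one numerically stable step)
loss_logits = nn.BCEWithLogitsLoss()(logits, targets)

print(torch.allclose(loss_bce, loss_logits))  # both compute the same loss
```

nn.BCEWithLogitsLoss is generally preferred in practice because applying sigmoid and log separately can underflow for large negative logits.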
Also, I’m not sure @kenmikanmi’s approach will work, as the second term seems to have a small mistake.
The second term should look like (1 - t) * log(1 - sigmoid(x)), whereas the posted formula uses (1 - t) * (1 - logsigmoid(x)).
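To see why the difference matters, here is a small pure-Python check (the sample logit x = 0.5 and target t = 0 are arbitrary). The correct binary cross-entropy is always non-negative, while the mistaken term, which replaces log(1 - sigmoid(x)) with (1 - logsigmoid(x)), can produce a negative "loss":

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x, t = 0.5, 0.0  # an arbitrary logit and a negative-class target

# Correct binary cross-entropy: -[t*log(s) + (1-t)*log(1-s)] with s = sigmoid(x)
correct = -(t * math.log(sigmoid(x)) + (1 - t) * math.log(1 - sigmoid(x)))

# Mistaken variant: second term uses (1 - logsigmoid(x)) instead of log(1 - sigmoid(x))
mistaken = -(t * math.log(sigmoid(x)) + (1 - t) * (1 - math.log(sigmoid(x))))

print(correct)   # positive, as a proper loss should be
print(mistaken)  # negative, revealing the mistake
```

Since cross-entropy is a negative log of a probability in (0, 1), a negative value is an immediate sign the formula is wrong.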