Which combination of criterion and activation function gives a single probability prediction?


Currently I have a model that predicts a binary target using NLLLoss(), with the final layer going through log_softmax(), which gives me the (log-)probability of the target being 0 and of the target being 1. I would like to output one final probability so that I can use the Captum package to calculate feature importance. What is the correct combination of loss function and final activation function to give me one probability prediction instead of two?

Hi, I think nn.Sigmoid() as the last activation layer and nn.BCEWithLogitsLoss() would help here.

Thank you. The nn.Sigmoid() worked, but the nn.BCEWithLogitsLoss() gave two outputs. I’m testing sigmoid with just BCELoss now.

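Hi, actually I was wrong. You can either use nn.Sigmoid() together with nn.BCELoss(), or use nn.BCEWithLogitsLoss() on its own, which combines the sigmoid and the binary cross-entropy formula in one function (so no separate sigmoid layer should come before it).
nn.BCELoss() and nn.BCEWithLogitsLoss() should yield the same number of outputs: the loss does not change the output shape, which is set by the final layer of the model, so that layer needs a single output unit to produce one probability.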
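
To make the two options concrete, here is a minimal sketch of both. The layer sizes, the batch, and the names n_features and body are made up for illustration and are not taken from the original model; the only part that matters is that the last nn.Linear has one output unit.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

n_features = 8                              # hypothetical input width
x = torch.randn(4, n_features)              # toy batch of 4 examples
y = torch.tensor([0.0, 1.0, 1.0, 0.0])      # binary targets must be floats

# The number of outputs is decided here (last Linear has out_features=1),
# not by the choice of loss function.
body = nn.Sequential(
    nn.Linear(n_features, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)

logits = body(x).squeeze(1)                 # shape (4,): one raw score per example

# Option A: explicit sigmoid, then BCELoss on probabilities.
probs = torch.sigmoid(logits)               # one probability per example
loss_a = nn.BCELoss()(probs, y)

# Option B: BCEWithLogitsLoss on raw logits (the sigmoid is applied inside the loss).
loss_b = nn.BCEWithLogitsLoss()(logits, y)

print(probs.shape, loss_a.item(), loss_b.item())
```

Both losses should agree up to floating-point error. With option B the model returns logits, so to get the single probability at inference time (for example, before handing the model to Captum) you would still apply torch.sigmoid to the output yourself.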