I am building a binary classifier where the class I want to predict is present less than 2% of the time.
The last layer could be LogSoftmax or Softmax.
self.softmax = nn.Softmax(dim=1) or self.softmax = nn.LogSoftmax(dim=1)
My questions:
1. I should use Softmax, as it produces outputs that sum to 1, so I can evaluate performance at various probability thresholds. Is that understanding correct?
2. If I use Softmax, can I use cross_entropy loss? This seems to suggest that it is okay.
3. If I use LogSoftmax, can I use cross_entropy loss? This seems to suggest that I shouldn't.
4. If I use Softmax, is there any better option than cross_entropy loss?
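For reference, a minimal sketch of the two-output setup I am describing (the input size 10 and hidden size 32 are made-up placeholders):

```python
import torch.nn as nn

# Sketch only: input size (10) and hidden size (32) are placeholders.
class TwoOutputNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 32)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(32, 2)       # two classes
        self.softmax = nn.Softmax(dim=1)  # or nn.LogSoftmax(dim=1)

    def forward(self, x):
        return self.softmax(self.fc2(self.relu(self.fc1(x))))
```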
Build a model that outputs a single value (per sample in a batch), typically by using a Linear with out_features = 1 as the final layer.

This value will be a raw-score logit. Use BCEWithLogitsLoss as your loss criterion (and do not use a final “activation” such as sigmoid() or softmax() or log_softmax()).
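A minimal sketch of this recipe (the input size, hidden size, and batch are placeholders):

```python
import torch
import torch.nn as nn

# Sketch: input size (10) and hidden size (32) are placeholders.
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Linear(32, 1),  # out_features = 1 -> one raw-score logit per sample
)

criterion = nn.BCEWithLogitsLoss()  # applies sigmoid internally

x = torch.randn(8, 10)                        # dummy batch of 8 samples
target = torch.randint(0, 2, (8, 1)).float()  # 0/1 labels, float, same shape as logits
loss = criterion(model(x), target)
```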
Either sample your underrepresented class more heavily when training, e.g., about fifty times more heavily, or weight the underrepresented class in your loss computation by using BCEWithLogitsLoss’s pos_weight constructor argument with something like:
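(A sketch of what such a pos_weight might look like; the value 50.0 is an assumption matching the roughly 2% positive rate mentioned above.)

```python
import torch
import torch.nn as nn

# pos_weight ~ (# negatives / # positives); ~2% positives -> roughly 50
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([50.0]))
```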
Could you answer my 4 questions? Just yes or no would suffice…
I will also look into your reply and try it.
A few additional questions:
I understand your suggestion “and do not use a final ‘activation’ such as sigmoid() or softmax() or log_softmax()”. But what should my final activation be? I looked at Linear, and it doesn't do anything; it is just a pass-through. Could you point to the exact function?
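If it helps, a sketch of what I mean: is the final “activation” effectively just a pass-through, e.g. nn.Identity()?

```python
import torch.nn as nn

# Sketch: is the "final activation" just the raw Linear output,
# i.e. effectively a pass-through like nn.Identity()?
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(32, 1)   # final layer: one raw-score logit
        self.final = nn.Identity()   # pass-through; leaves the logit unchanged

    def forward(self, x):
        return self.final(self.fc(x))
```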