I am building a binary classifier where the class I want to predict is present in less than 2% of samples. The last layer could be either

`self.softmax = nn.Softmax(dim=1)` or
`self.softmax = nn.LogSoftmax(dim=1)`
1. I should use softmax as it will provide outputs that sum to 1, so I can check performance at various probability thresholds. Is that understanding correct?
2. If I use softmax, can I use cross_entropy loss? This seems to suggest that it is okay to do so.
3. If I use logsoftmax, can I use cross_entropy loss? This seems to suggest that I shouldn't.
4. If I use softmax, is there any better option than `cross_entropy = nn.CrossEntropyLoss(weight=class_wts)`? (See the sketch just below for how these pieces relate.)
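For context, a minimal sketch of how these pieces relate in PyTorch; the shapes and the `class_wts` values are illustrative assumptions. `nn.CrossEntropyLoss` expects raw logits and applies `LogSoftmax` and `NLLLoss` internally, so the explicit `LogSoftmax` + `NLLLoss` route computes the same value:

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 2)             # raw model outputs, no softmax applied
targets = torch.tensor([0, 1, 0, 0])   # class indices
class_wts = torch.tensor([1.0, 50.0])  # illustrative up-weighting of the rare class

# CrossEntropyLoss applies log_softmax internally, so it expects raw logits.
ce = nn.CrossEntropyLoss(weight=class_wts)

# Equivalent two-step form: LogSoftmax followed by NLLLoss.
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss(weight=class_wts)

print(torch.allclose(ce(logits, targets), nll(log_probs, targets)))  # True
```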
Build a model that outputs a single value (per sample in a batch), typically by using out_features = 1 in the final Linear layer. This value will be a raw-score logit. Use BCEWithLogitsLoss as your loss criterion (and do not use a final “activation” such as log_softmax()).
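A minimal sketch of this setup; the layer sizes, hidden layer, and names are illustrative assumptions:

```python
import torch
import torch.nn as nn

class BinaryClassifier(nn.Module):
    def __init__(self, in_features=20):           # in_features is illustrative
        super().__init__()
        self.hidden = nn.Linear(in_features, 16)
        self.out = nn.Linear(16, 1)                # out_features = 1: one raw-score logit

    def forward(self, x):
        # No final activation: BCEWithLogitsLoss applies the sigmoid internally.
        return self.out(torch.relu(self.hidden(x)))

model = BinaryClassifier()
criterion = nn.BCEWithLogitsLoss()

x = torch.randn(8, 20)
y = torch.randint(0, 2, (8, 1)).float()            # float targets, same shape as output
loss = criterion(model(x), y)
```

At evaluation time, probabilities for thresholding can be recovered with `torch.sigmoid(model(x))`.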
Either sample your underrepresented class more heavily when training, e.g., about fifty times more heavily, or weight the underrepresented class in your loss computation by using the pos_weight constructor argument with something like:

`criterion = torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor([50.0]))`
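For the resampling route, one possibility is `torch.utils.data.WeightedRandomSampler`; the dataset, the 2% positive rate, and the 50x factor below are illustrative, mirroring the pos_weight example above:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Illustrative data: ~2% positives, matching the imbalance described above.
x = torch.randn(1000, 20)
y = (torch.rand(1000) < 0.02).float()

# Draw positive samples roughly 50x more often than negatives.
sample_weights = torch.where(y == 1.0, torch.tensor(50.0), torch.tensor(1.0))
sampler = WeightedRandomSampler(sample_weights, num_samples=len(y), replacement=True)

loader = DataLoader(TensorDataset(x, y), batch_size=32, sampler=sampler)
```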
Could you answer my 4 questions? Just yes or no would suffice… I will also look into your reply and try it out.

A few additional questions:
I understand your suggestion "and do not use a final “activation” such as log_softmax())." But what should my final activation be? I looked at linear and it doesn't do anything; it is just a pass-through. Could you point me to the exact function?
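For reference, a minimal sketch under the single-logit setup above, with illustrative names and threshold: no activation module is needed during training, since BCEWithLogitsLoss applies the sigmoid itself; `nn.Identity()` is the module that acts as a pure pass-through, and `torch.sigmoid()` recovers probabilities at inference time.

```python
import torch
import torch.nn as nn

head = nn.Linear(16, 1)        # final layer emits one raw-score logit
passthrough = nn.Identity()    # a literal pass-through module, shown for comparison

features = torch.randn(8, 16)
logit = passthrough(head(features))   # identical to head(features)

# Only at inference: convert the logit to a probability and threshold it.
prob = torch.sigmoid(logit)
pred = (prob > 0.5).float()           # 0.5 is an illustrative threshold
```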