eugene
(eugene)
July 24, 2018, 11:01am
In a multi-class classification, I sometimes see the following two implementations:
nn.Linear + nn.CrossEntropyLoss
nn.LogSoftmax + nn.NLLLoss
Are they both the same in terms of the following?
Both are softmax classifiers
Mathematically
Model training efficiency
Any other differences?
What are the trade-offs to consider?
If the intention is to do binary classification, what's the most efficient way to output a probability?
Both approaches are the same.
In fact nn.CrossEntropyLoss just uses nn.LogSoftmax() + nn.NLLLoss() internally.
Here is the line of code.
For a binary classification you could use nn.CrossEntropyLoss() with a logit output of shape [batch_size, 2] or nn.BCELoss() with a nn.Sigmoid() in the last layer.
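The equivalence of the two approaches can be checked numerically. This is a minimal sketch (the tensor shapes and values are arbitrary) showing that `nn.CrossEntropyLoss` applied to raw logits matches `nn.LogSoftmax` followed by `nn.NLLLoss`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Raw logits for a batch of 4 samples and 3 classes, plus integer class targets.
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 2])

# Option 1: CrossEntropyLoss consumes the raw logits directly.
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# Option 2: LogSoftmax over the class dimension, then NLLLoss on the log-probabilities.
log_probs = nn.LogSoftmax(dim=1)(logits)
loss_nll = nn.NLLLoss()(log_probs, targets)

print(torch.allclose(loss_ce, loss_nll))  # True
```

Since the two losses agree, the choice comes down to convenience: keeping the model's output as raw logits and using `nn.CrossEntropyLoss` is the common idiom.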
BCEWithLogitsLoss = one sigmoid layer + BCELoss, fused into a single module, which avoids the numerical instability of applying them separately.
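The same kind of check works for the binary case. A small sketch (with made-up shapes) showing that `nn.BCEWithLogitsLoss` on raw logits matches an explicit `torch.sigmoid` followed by `nn.BCELoss`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One logit per sample and float targets in {0., 1.}.
logits = torch.randn(8, 1)
targets = torch.randint(0, 2, (8, 1)).float()

# Numerically stable: the sigmoid is fused into the loss via the log-sum-exp trick.
loss_stable = nn.BCEWithLogitsLoss()(logits, targets)

# Mathematically equivalent but less stable: explicit sigmoid, then BCELoss.
probs = torch.sigmoid(logits)
loss_plain = nn.BCELoss()(probs, targets)

print(torch.allclose(loss_stable, loss_plain))  # True
```

For well-behaved logits the two agree; with very large positive or negative logits, the explicit-sigmoid version can saturate and lose precision, which is why `BCEWithLogitsLoss` is preferred.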
eugene
(eugene)
July 24, 2018, 1:38pm
Thanks for the quick response! With this, I guess I’ll need to think through it a bit harder to convince myself of it.
NOP
(NOP)
June 26, 2019, 5:45pm
When you said binary classification, you mean just two categories (like spam or not spam), right?
Yes, by binary classification I meant a use case with two target classes (positive vs. negative).
Neta_Zmora
(Neta Zmora)
December 10, 2019, 3:24pm
Here's the permalink to the line of code @ptrblck was pointing to in his answer (which linked to a line in master, a moving target).
ptrblck
December 10, 2019, 3:34pm
Oops, thanks for the permalink!