eugene — July 24, 2018, 11:01am
In multi-class classification, I sometimes see the following two implementations:
- nn.Linear + nn.CrossEntropyLoss
- nn.LogSoftmax + nn.NLLLoss

Are they both the same in terms of the following?
- Both are softmax classifiers
- Mathematically
- Model training efficiency

Any other differences? What are the trade-offs to consider?
If the intention is to do binary classification, what's the most efficient way to output a probability?
Both approaches are the same. In fact, nn.CrossEntropyLoss just uses nn.LogSoftmax() + nn.NLLLoss() internally. Here is the line of code.
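As a quick sanity check, here is a minimal sketch (with made-up logits and targets) showing that the two formulations produce the same loss value:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical example: raw logits for a batch of 4 samples, 3 classes
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 2])

# Option 1: nn.CrossEntropyLoss applied directly to the raw logits
ce = nn.CrossEntropyLoss()(logits, targets)

# Option 2: nn.LogSoftmax followed by nn.NLLLoss
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, targets)

print(torch.allclose(ce, nll))  # True
```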
For binary classification you could use nn.CrossEntropyLoss() with a logit output of shape [batch_size, 2], or nn.BCELoss() with nn.Sigmoid() as the last layer.
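A rough sketch of the two binary setups (the linear layers, targets, and shapes here are invented for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 10)  # hypothetical batch: 4 samples, 10 features

# Setup A: two-logit output + nn.CrossEntropyLoss (targets are class indices)
head_a = nn.Linear(10, 2)
targets_long = torch.tensor([0, 1, 1, 0])
loss_a = nn.CrossEntropyLoss()(head_a(x), targets_long)

# Setup B: single-logit output + sigmoid + nn.BCELoss (targets are floats)
head_b = nn.Linear(10, 1)
targets_float = torch.tensor([0., 1., 1., 0.]).unsqueeze(1)
probs = torch.sigmoid(head_b(x))  # probability of the positive class
loss_b = nn.BCELoss()(probs, targets_float)

print(loss_a.item(), loss_b.item())
```

Setup B gives you the positive-class probability directly from the sigmoid output, whereas setup A requires a softmax over the two logits to get probabilities.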
BCEWithLogitsLoss = one nn.Sigmoid layer + nn.BCELoss, combined into a single operation that avoids the numerical instability of applying them separately.
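To illustrate that combination, a small sketch (random logits, made-up targets) comparing nn.BCEWithLogitsLoss against the manual sigmoid + nn.BCELoss pipeline:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 1)
targets = torch.tensor([0., 1., 1., 0.]).unsqueeze(1)

# Fused, numerically stable version: takes raw logits directly
stable = nn.BCEWithLogitsLoss()(logits, targets)

# Manual version: sigmoid first, then BCE on the probabilities
manual = nn.BCELoss()(torch.sigmoid(logits), targets)

print(torch.allclose(stable, manual))  # True
```

For moderate logits the values match; the fused version matters for large-magnitude logits, where sigmoid saturates and the separate log can under/overflow.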
eugene — July 24, 2018, 1:38pm
Thanks for the quick response! I'll need to think it through a bit more to convince myself of it.
NOP — June 26, 2019, 5:45pm
When you say binary classification, do you mean just two categories (like spam or not spam), right?
Yes, by binary classification I meant a use case with two target classes (positive vs. negative).
Neta_Zmora (Neta Zmora) — December 10, 2019, 3:24pm
Here’s the permalink to the line of code @ptrblck was pointing to in his answer (the original link pointed to a line on master, which is a moving target).
ptrblck — December 10, 2019, 3:34pm
Oops, thanks for the permalink!