What is the strongest loss function?

YuhskeHujisaki · June 17, 2022, 2:27am

Sometimes a dissertation that implements deep learning does not state which loss function was used. Which loss function is recommended in that case? I would appreciate it if you could explain the reasoning as well.

KFrank · June 17, 2022, 5:05pm

Hi Yuhske!

I always recommend starting out with the simplest standard approach
commonly recommended for the type of problem, and only suggest
moving to a more elaborate or less-standard approach if a good reason
to do so becomes clear.

For a binary classification problem, start with BCEWithLogitsLoss. This
will also apply to something like binary semantic segmentation, where you
are performing a binary classification (e.g., “background” vs. “foreground”)
for each pixel. If you have unbalanced data, try using the pos_weight
constructor argument before moving to a different loss function.
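
Here is a minimal sketch of the pos_weight suggestion. The logits and
targets are made-up toy values, and setting pos_weight to the ratio of
negatives to positives is one common heuristic, not the only reasonable
choice.

```python
import torch

# Hypothetical toy batch: 8 samples, one logit each, 2 positives and 6 negatives.
logits = torch.tensor([[0.8], [-1.2], [0.3], [-0.5], [-0.4], [1.1], [-2.0], [0.1]])
targets = torch.tensor([[1.0], [0.0], [0.0], [0.0], [1.0], [0.0], [0.0], [0.0]])

# Unweighted loss: every sample counts equally.
plain_loss = torch.nn.BCEWithLogitsLoss()(logits, targets)

# Assumed heuristic: weight positives by (number of negatives) / (number of positives).
n_pos = targets.sum()
n_neg = targets.numel() - n_pos
pos_weight = (n_neg / n_pos).reshape(1)  # one weight per output, here shape [1]

# The positive-target terms of the loss are multiplied by pos_weight.
weighted_loss = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)(logits, targets)

print(pos_weight, plain_loss, weighted_loss)
```

In a real project you would compute the ratio from the whole training set
rather than from a single batch.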

For a multi-class classification problem, start with CrossEntropyLoss.
Its weight constructor argument can be used to compensate for unbalanced
data.
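
As a rough illustration of the weight argument, the sketch below uses
made-up logits for a three-class problem and inverse-frequency class
weights. The inverse-frequency formula is my assumption here; other
weighting schemes are also used.

```python
import torch

# Hypothetical batch: 4 samples, 3 classes. Logits have shape [nBatch, nClass].
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.5, 0.3],
                       [1.0, 0.2, 0.4],
                       [0.3, 0.1, 2.2]])
# Targets are integer class labels with shape [nBatch].
targets = torch.tensor([0, 1, 0, 2])

# Assumed weighting scheme: inverse class frequency, so rarer classes get larger weights.
class_counts = torch.bincount(targets, minlength=3).float()
weight = class_counts.sum() / (3 * class_counts)

loss_fn = torch.nn.CrossEntropyLoss(weight=weight)
loss = loss_fn(logits, targets)

print(weight, loss)
```

With the default mean reduction, the weighted per-sample losses are divided
by the sum of the weights of the samples' target classes, not by the batch
size.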

For a multi-label, multi-class classification problem, go back to using
BCEWithLogitsLoss. This is because the multi-label problem should
be understood as a set of binary problems, as each label can be active
independently for a given sample. (E.g., does the first label apply to the
sample, yes or no? Does the second label apply, yes or no? And so
on.)
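
To make the "set of binary problems" view concrete, here is a small sketch
with made-up logits for two samples and three labels. The threshold of zero
on the logits (a probability of 0.5) is an assumed default, not something
the loss function requires.

```python
import torch

# Hypothetical batch: 2 samples, 3 independent labels, one logit per label.
logits = torch.tensor([[1.2, -0.7, 0.3],
                       [-2.0, 0.9, 1.5]])
# Multi-hot floating-point targets: several labels can be active at once.
targets = torch.tensor([[1.0, 0.0, 1.0],
                        [0.0, 1.0, 1.0]])

loss_fn = torch.nn.BCEWithLogitsLoss()
loss = loss_fn(logits, targets)

# The same loss, computed one label (column) at a time as separate binary problems.
per_label = torch.stack([loss_fn(logits[:, j], targets[:, j]) for j in range(3)])

# Assumed decision rule: a label is predicted active when its logit is positive.
predicted = (logits > 0).float()

print(loss, per_label.mean(), predicted)
```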

For a “regression” problem, start with MSELoss. By regression, I mean
that your model predicts a floating-point number that should match, as
closely as possible, a floating-point target – and that a close match is
quantitatively better than a less-close match. Let’s say that you are trying
to predict the return on a stock from today to tomorrow: If tomorrow’s
price turns out to be 1.1% higher than today’s, and you predict 1.0%, you’ve
done quite well (so don’t penalize your prediction very much). If you predict
0.9%, you’ve still done quite well. But a prediction of -1.0% or 3.5% would
be significantly worse, and should be penalized more heavily. That’s exactly
how the squared errors in MSELoss work.
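
The numbers in the stock example can be checked directly. This sketch
treats the percentages as plain floating-point values and uses
reduction = 'none' to show the penalty for each prediction separately.

```python
import torch

# Target return of 1.1 (percent), repeated once per prediction.
target = torch.full((4,), 1.1)
# The four predictions from the example, in percent.
predictions = torch.tensor([1.0, 0.9, -1.0, 3.5])

# Squared error for each prediction on its own.
per_prediction = torch.nn.MSELoss(reduction='none')(predictions, target)
# Roughly 0.01, 0.04, 4.41 and 5.76: the close predictions are barely penalized.

# Default behavior: the mean of the squared errors over the batch.
mean_loss = torch.nn.MSELoss()(predictions, target)

print(per_prediction, mean_loss)
```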

(As an aside, don’t use MSELoss for typical classification problems. Let’s
say you have “0 – cat,” “1 – trout,” and “2 – sparrow.” Misclassifying a
sparrow as a trout is just as bad as misclassifying it as a cat. But MSELoss
would say that trout would be less of an error than cat, so it’s not the right
fit for this kind of problem, and you won’t train as effectively.)
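
A quick sketch of that aside, using the same class numbering. The logits
passed to CrossEntropyLoss are made-up values chosen so that both wrong
answers are equally confident.

```python
import torch

# Class indices from the example: 0 = cat, 1 = trout, 2 = sparrow.
sparrow = torch.tensor([2.0])

# MSELoss on class indices treats the labels as points on a number line.
mse = torch.nn.MSELoss()
mse_trout = mse(torch.tensor([1.0]), sparrow)  # (1 - 2)^2 = 1
mse_cat = mse(torch.tensor([0.0]), sparrow)    # (0 - 2)^2 = 4

# CrossEntropyLoss only looks at the score given to the correct class.
ce = torch.nn.CrossEntropyLoss()
target = torch.tensor([2])
logits_cat = torch.tensor([[3.0, 0.0, 0.0]])    # confidently predicts "cat"
logits_trout = torch.tensor([[0.0, 3.0, 0.0]])  # equally confidently predicts "trout"
ce_cat = ce(logits_cat, target)
ce_trout = ce(logits_trout, target)

print(mse_trout, mse_cat, ce_cat, ce_trout)
```

MSELoss penalizes "cat" four times as heavily as "trout," while
CrossEntropyLoss gives the two wrong answers the same loss.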

Best.

K. Frank