Sometimes a paper doesn't state which loss function was used when describing a deep-learning implementation. What loss function would you recommend in that case? I would appreciate it if you could explain the reasoning as well.

Hi Yuhske!

I always recommend starting out with the simplest standard approach commonly recommended for the type of problem, and only suggest moving to a more elaborate or less-standard approach if a good reason to do so becomes clear.

For a *binary classification* problem, start with `BCEWithLogitsLoss`. This will also apply to something like binary semantic segmentation, where you are performing a binary classification (e.g., “background” vs. “foreground”) for each pixel. If you have unbalanced data, try using the `pos_weight` constructor argument before moving to a different loss function.
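Here is a minimal sketch, assuming PyTorch; the logits, targets, and the 3-to-1 imbalance figure are made up for illustration:

```python
import torch

# Raw (unnormalized) scores from the model for a batch of four
# samples, and their 0 / 1 targets.
logits = torch.tensor([1.2, -0.8, 0.3, -2.0])
targets = torch.tensor([1.0, 0.0, 1.0, 0.0])

# If, say, negative samples outnumber positive samples 3-to-1,
# weighting the positive samples by 3 helps balance their
# contribution to the loss.
loss_fn = torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor(3.0))
loss = loss_fn(logits, targets)
print(loss)  # a single scalar, averaged over the batch
```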

For a *multi-class classification* problem, start with `CrossEntropyLoss`. Its `weight` constructor argument can be used to compensate for unbalanced data.
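Again a minimal sketch, assuming PyTorch; the three classes, the logits, and the class weights are made up for illustration:

```python
import torch

# Raw logits for a batch of two samples over three classes, and
# integer class labels. Note that CrossEntropyLoss takes raw logits
# (no softmax) and integer targets.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.5, 0.3]])
targets = torch.tensor([0, 2])

# Weight a rarer class more heavily, e.g., if class 2 is
# underrepresented in the training data.
class_weights = torch.tensor([1.0, 1.0, 4.0])
loss_fn = torch.nn.CrossEntropyLoss(weight=class_weights)
loss = loss_fn(logits, targets)
print(loss)
```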

For a *multi-label, multi-class classification* problem, go back to using `BCEWithLogitsLoss`. This is because the multi-label problem should be understood as a set of binary problems, as each label can be active independently for a given sample. (E.g., does the first label apply to the sample – yes or no?; does the second label apply – yes or no?; and so on.)
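Sketched out (PyTorch assumed, with made-up numbers), the targets become multi-hot vectors rather than single class indices:

```python
import torch

# A batch of two samples, each with four labels. Each target entry
# answers "does this label apply?" independently, so a row can have
# any number of 1s (multi-hot, not one-hot).
logits = torch.tensor([[ 0.8, -1.2,  2.0, -0.3],
                       [-0.5,  1.7, -2.1,  0.9]])
targets = torch.tensor([[1.0, 0.0, 1.0, 0.0],
                        [0.0, 1.0, 0.0, 1.0]])

loss_fn = torch.nn.BCEWithLogitsLoss()
loss = loss_fn(logits, targets)  # averages over all sample-label pairs
print(loss)
```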

For a “regression” problem, start with `MSELoss`. By regression, I mean that your model predicts a floating-point number that should match, as closely as possible, a floating-point target – and that a close match is quantitatively better than a less-close match. Let’s say that you are trying to predict the return on a stock price from today to tomorrow: If tomorrow’s price turns out to be 1.1% higher than today’s, and you predict 1.0%, you’ve done quite well (so don’t penalize your prediction very much). If you predict 0.9%, you’ve still done quite well. But a prediction of -1.0% or 3.5% would be significantly worse, and should be penalized more heavily. That’s exactly how the squared errors in `MSELoss` work.
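You can see this quadratic penalty directly by plugging the stock-return numbers above into `MSELoss` (PyTorch assumed):

```python
import torch

target = torch.tensor([1.1])                 # actual return, in percent
preds = torch.tensor([1.0, 0.9, -1.0, 3.5])  # candidate predictions

loss_fn = torch.nn.MSELoss()
for p in preds:
    print(p.item(), loss_fn(p.unsqueeze(0), target).item())
# 1.0 -> ~0.01 and 0.9 -> ~0.04 (small penalties), but
# -1.0 -> ~4.41 and 3.5 -> ~5.76 (much heavier penalties)
```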

(As an aside, *don’t* use `MSELoss` for typical classification problems. Let’s say you have “0 – cat,” “1 – trout,” and “2 – sparrow.” Misclassifying a sparrow as a trout is just as bad as misclassifying it as a cat. But `MSELoss` would say that trout would be less of an error than cat, so it’s not the right fit for this kind of problem, and you won’t train as effectively.)
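A quick numerical check of that aside (PyTorch assumed): with the labels treated as plain numbers, `MSELoss` makes the trout misclassification look four times “better” than the cat one, even though both are equally wrong.

```python
import torch

target = torch.tensor([2.0])  # true class: sparrow
loss_fn = torch.nn.MSELoss()
print(loss_fn(torch.tensor([1.0]), target).item())  # trout: (2 - 1)**2 = 1.0
print(loss_fn(torch.tensor([0.0]), target).item())  # cat:   (2 - 0)**2 = 4.0
```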

Best.

K. Frank