I have a dataset of 6595 images. Each image can contain up to 5 different object classes. Suppose these are people, cars, billboards, trees, and bicycles. Object counts by class over the entire dataset:
people: 614,
cars: 947,
billboards: 1628,
trees: 2743,
bicycles: 663
Should I use the pos_weight argument and pass it the weight tensor [9.74, 5.96, 3.05, 1.4, 8.94], computed following this recommendation from the documentation?
“For example, if a dataset contains 100 positive and 300 negative examples of a single class, then pos_weight for the class should be equal to 300/100=3. The loss would act as if the dataset contains 3×100=300 positive examples.”
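Following that recommendation, each entry is (negative count) / (positive count) for its class. A quick sketch reproducing the proposed tensor from the counts posted above (note the last entry comes out closer to 8.95 than 8.94):

```python
import torch

num_images = 6595
# per-class positive counts: people, cars, billboards, trees, bicycles
pos_counts = torch.tensor([614.0, 947.0, 1628.0, 2743.0, 663.0])
neg_counts = num_images - pos_counts

# pos_weight = negatives / positives, per class
pos_weight = neg_counts / pos_counts
print(pos_weight)  # ≈ [9.74, 5.96, 3.05, 1.40, 8.95]
```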
Also, what happens when pos_weight is filled in for the multi-class case?
And what is the weight argument for — what happens when it is not None?
Note, the exact values used for weight don’t matter – they’re just rough
fudge factors to account for the class imbalance.
Also, in your case, your biggest class imbalance is about 4.5-to-1 (2743
trees vs. 614 people), which isn’t too bad, so you probably don’t need to
use weight, although doing so won’t hurt, and could help at the margins.
Thank you for the answer, and sorry for the inaccurate description of the problem. I have 5 classes of different objects, any of which can appear in an image.
Yes, then this is a multi-label, multi-class problem for which BCEWithLogitsLoss would typically be the best choice.
Your proposed values for pos_weight look appropriate, given the
class frequencies you posted.
When pos_weight is not None, it gets applied along the class dimension
of your input (predictions) and target (labels) as shown in the formula
in the BCEWithLogitsLoss documentation. That is, the element of pos_weight that corresponds to the class in question multiplies the
“predicted-positive” term in the per-sample, per-class term of the
cross-entropy expression.
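To make that concrete, here is a small sketch (with made-up logits and targets) checking that pos_weight multiplies only the target == 1 term of the per-element loss, matching the formula in the documentation:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 5)                       # 4 samples, 5 classes
targets = torch.randint(0, 2, (4, 5)).float()    # multi-label targets
pos_weight = torch.tensor([9.74, 5.96, 3.05, 1.40, 8.94])

loss = F.binary_cross_entropy_with_logits(
    logits, targets, pos_weight=pos_weight, reduction="none")

# manual formula: pos_weight scales only the positive (target == 1) term
p = torch.sigmoid(logits)
manual = -(pos_weight * targets * torch.log(p)
           + (1 - targets) * torch.log(1 - p))

assert torch.allclose(loss, manual, atol=1e-6)
```

Note that pos_weight has shape (5,), so it broadcasts along the class dimension and applies the same per-class scaling to every sample in the batch.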
The weight argument, if present, simply multiplies the loss contribution
of each sample in the batch.
(More precisely, input, target, and weight are all mutually broadcast
with respect to one another, and then combined together element-wise.
This means that, for example, in a semantic-segmentation use case, you
could use the weight argument to give different weights to each pixel in
your image, not just to different samples in your batch.)
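A sketch of the weight argument in the per-sample case (again with made-up data): a weight tensor of shape (batch, 1) broadcasts across the class dimension and simply scales each sample’s unreduced loss.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(1)
logits = torch.randn(4, 5)
targets = torch.randint(0, 2, (4, 5)).float()

# per-sample weights: shape (4, 1) broadcasts over the 5 classes
sample_weight = torch.tensor([[1.0], [2.0], [0.5], [1.0]])

weighted = F.binary_cross_entropy_with_logits(
    logits, targets, weight=sample_weight, reduction="none")
unweighted = F.binary_cross_entropy_with_logits(
    logits, targets, reduction="none")

# weight multiplies each element's loss contribution
assert torch.allclose(weighted, sample_weight * unweighted, atol=1e-6)
```

For the segmentation example, the same mechanism works with a weight tensor shaped like the pixel grid (e.g. (batch, H, W)), giving each pixel its own multiplier.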