I have a dataset of 6595 images. Each image can contain up to 5 different object classes. Suppose these are people, cars, billboards, trees, and bicycles. Object counts by class over the entire dataset:
people: 614,
cars: 947,
billboards: 1628,
trees: 2743,
bicycles: 663
Should I use the pos_weight argument and pass it the weight tensor [9.74, 5.96, 3.05, 1.4, 8.94], computed following this recommendation from the documentation?
“For example, if a dataset contains 100 positive and 300 negative examples of a single class, then pos_weight for the class should be equal to 300/100=3. The loss would act as if the dataset contains 3×100=300 positive examples.”
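Following that recommendation, each entry is (negative count) / (positive count) for its class. A quick sketch reproducing the proposed tensor from the counts posted above (note the last entry comes out closer to 8.95 than 8.94):

```python
import torch

num_images = 6595
# per-class positive counts: people, cars, billboards, trees, bicycles
pos_counts = torch.tensor([614.0, 947.0, 1628.0, 2743.0, 663.0])
neg_counts = num_images - pos_counts

# pos_weight = negatives / positives, per class
pos_weight = neg_counts / pos_counts
print(pos_weight)  # ≈ [9.74, 5.96, 3.05, 1.40, 8.95]
```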
Also, what happens when pos_weight is filled in for the multi-class case?
And what is the weight argument for — what happens when it is not None?
Note, the exact values used for weight don’t matter – they’re just rough
fudge factors to account for the class imbalance.
Also, in your case, your biggest class imbalance is about 4.5-to-1 (2743
trees vs. 614 people), which isn’t too bad, so you probably don’t need to
use weight, although doing so won’t hurt, and could help at the margins.
Thank you for the answer, and sorry for the inaccurate description of the problem. I have 5 classes of different objects, any of which can appear in an image.
Yes, then this is a multi-label, multi-class problem for which BCEWithLogitsLoss would typically be the best choice.
Your proposed values for pos_weight look appropriate, given the
class frequencies you posted.
When pos_weight is not None, it gets applied along the class dimension
of your input (predictions) and target (labels) as shown in the formula
in the BCEWithLogitsLoss documentation. That is, the element of pos_weight that corresponds to the class in question multiplies the
“predicted-positive” term in the per-sample, per-class term of the
cross-entropy expression.
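To make that concrete, here is a small sketch (with made-up logits and targets) checking that pos_weight multiplies only the target == 1 term of the per-element loss, matching the formula in the documentation:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 5)                       # 4 samples, 5 classes
targets = torch.randint(0, 2, (4, 5)).float()    # multi-label targets
pos_weight = torch.tensor([9.74, 5.96, 3.05, 1.40, 8.94])

loss = F.binary_cross_entropy_with_logits(
    logits, targets, pos_weight=pos_weight, reduction="none")

# manual formula: pos_weight scales only the positive (target == 1) term
p = torch.sigmoid(logits)
manual = -(pos_weight * targets * torch.log(p)
           + (1 - targets) * torch.log(1 - p))

assert torch.allclose(loss, manual, atol=1e-6)
```

Note that pos_weight has shape (5,), so it broadcasts along the class dimension and applies the same per-class scaling to every sample in the batch.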
The weight argument, if present, simply multiplies the loss contribution
of each sample in the batch.
(More precisely, input, target, and weight are all mutually broadcast
with respect to one another, and then combined together element-wise.
This means that, for example, in a semantic-segmentation use case, you
could use the weight argument to give different weights to each pixel in
your image, not just to different samples in your batch.)
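A sketch of the weight argument in the per-sample case (again with made-up data): a weight tensor of shape (batch, 1) broadcasts across the class dimension and simply scales each sample’s unreduced loss.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(1)
logits = torch.randn(4, 5)
targets = torch.randint(0, 2, (4, 5)).float()

# per-sample weights: shape (4, 1) broadcasts over the 5 classes
sample_weight = torch.tensor([[1.0], [2.0], [0.5], [1.0]])

weighted = F.binary_cross_entropy_with_logits(
    logits, targets, weight=sample_weight, reduction="none")
unweighted = F.binary_cross_entropy_with_logits(
    logits, targets, reduction="none")

# weight multiplies each element's loss contribution
assert torch.allclose(weighted, sample_weight * unweighted, atol=1e-6)
```

For the segmentation example, the same mechanism works with a weight tensor shaped like the pixel grid (e.g. (batch, H, W)), giving each pixel its own multiplier.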