Hi everyone,

I have gotten confused in understanding the “pos_weight” and “weight” parameters in BCEWithLogitsLoss. I’ve read all the relevant discussions in this regard, to my knowledge; however, still, I’ve not understood them completely.

Imagine that I have a multi-class, multi-label classification problem; my imbalanced one-hot coded dataset includes 1000 images with 4 labels with the following frequencies: class 0: 600, class 1: 550, class 2: 200, class 3: 100.

As I said, the targets are in a one-hot coded structure.

For instance, the target [0, 1, 1, 0] means that classes 1 and 2 are present in the corresponding image.

Well, in order to calculate the BCEWithLogitsLoss concerning the data imbalance, one way is that suggested in this post: Multi-Label, Multi-Class class imbalance - #2 by ptrblck. Here, we calculate the class weights by inverting the frequencies of each class, i.e., the class weight tensor in my example would be: torch.tensor ([1/600, 1/550, 1/200, 1/100]). After that, the class weight tensor will be multiplied by the unreduced loss and the final loss would be the mean of this tensor.

However, as far as I know, the pos_weight parameter of the BCEWithLogitsLoSS could also be used in this case. Here is my question:

to my knowledge, the two tensors (class weight tensor in the previous paragraph and the pos_weight tensor) are totally different. For each class, the number of positive and negative samples should be calculated and the num_negative/ num_positive would be the pos_weight for that particular class. In my example, the pos_weight of class 0 would be (1000-600)/600 = 0.67, the pos_weight of class 1 would be (1000-550)/550 = .82, the pos_weight of class 2 would be (1000-200)/200 = 4, and the pos_weight of class 3 would be (1000-100)/100 = 9. Thus, the pos_weight tensor would be torch.tensor ([.67, .82, 4, 9]). Is this way of calculating pos_weight tensor the right one? If the answer is yes, I think that the previous method (I mean calculating the class weights and multiplying it with the unreduced loss) would be more convenient in the case of a dataset with a large number of labels since we should only invert the frequencies, am I right?

Also, another question is about the weight parameter of the BCEWithLogitsLoss. As represented in the formula, the weight parameter is a tensor that is multiplied by the whole loss, not merely the positive targets (as opposed to pos_weight). My question is that how the weight parameter tensor is different from the class weights tensor, considering that the class weights tensor is similarly multiplied by the whole loss. However, it is said that the weight parameter tensor is of size nbatch, and I do not understand what its function is.

I deeply appreciate your consideration.