# Pos_weight in BCEWithLogitsLoss not behaving as expected when using 'mean' reduction

The documentation for nn.BCEWithLogitsLoss states that

> For example, if a dataset contains 100 positive and 300 negative examples of a single class, then pos_weight for the class should be equal to 300/100 = 3. The loss would act as if the dataset contains 3×100 = 300 positive examples.

This is not the case when the default ‘mean’ reduction is used. I found that the positive class weight is applied correctly, but the total loss is then divided by the unadjusted number of samples.

Example:

• Imbalanced Dataset with 10% positives and 90% negatives.
• Batch size 10
• Loss for a positive sample is 2.0, loss for a negative sample is 0.5

`With a batch size of 10, containing 1 positive and 9 negative examples, the mean loss will amount to (1×2.0 + 9×0.5) / (1 + 9) = 0.65`
`With a batch size of 10, containing 5 positive and 5 negative examples, the mean loss will amount to (5×2.0 + 5×0.5) / (5 + 5) = 1.25`
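The arithmetic above can be checked with a quick sketch (the per-sample losses 2.0 and 0.5 are the hypothetical values from the example, not outputs of the actual loss function):

```python
def bce_mean(losses):
    # plain 'mean' reduction: sum of per-sample losses / batch size
    return sum(losses) / len(losses)

# hypothetical per-sample losses from the example above
skewed   = [2.0] * 1 + [0.5] * 9   # 1 positive, 9 negatives
balanced = [2.0] * 5 + [0.5] * 5   # 5 positives, 5 negatives
print(bce_mean(skewed), bce_mean(balanced))  # 0.65 1.25
```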

From the description in the documentation, I would expect the loss to be 1.25 for any distribution of positives and negatives in a batch, once pos_weight has been set correctly to num_neg/num_pos = 9. However, the denominator is not adjusted when pos_weight is set, resulting in the following behaviour:
`With a batch size of 10, containing 1 positive and 9 negative samples, the mean loss will amount to (1×9×2.0 + 9×0.5) / (1 + 9) = 2.25`. This is even higher than the loss would be if the batch contained only positive samples (2.0).
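Here is a minimal pure-Python sketch of what I believe `reduction='mean'` does with pos_weight set (not the torch implementation itself; the `logit_for_loss` helper is a hypothetical convenience that produces logits whose unweighted per-sample losses match the example):

```python
import math

def bce_with_logits_mean(logits, targets, pos_weight=1.0):
    """Sketch of BCEWithLogitsLoss with reduction='mean': weighted
    per-element losses are summed, then divided by the plain element
    count len(logits), not a pos_weight-adjusted count."""
    total = 0.0
    for x, y in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-x))
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)

def logit_for_loss(target_loss, positive):
    """Hypothetical helper: a logit whose unweighted BCE loss equals
    target_loss for a hard 0/1 label."""
    p = math.exp(-target_loss)  # probability assigned to the true class
    return math.log(p / (1 - p)) if positive else -math.log(p / (1 - p))

# 1 positive (unweighted loss 2.0) and 9 negatives (loss 0.5), pos_weight = 9
logits = [logit_for_loss(2.0, True)] + [logit_for_loss(0.5, False)] * 9
targets = [1.0] + [0.0] * 9
mean_loss = bce_with_logits_mean(logits, targets, pos_weight=9.0)
print(mean_loss)  # (9×2.0 + 9×0.5) / 10 = 2.25
```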

To get the loss to behave as expected, I had to use ‘sum’ reduction instead of ‘mean’ and manually divide the loss sum by the adjusted sample number (pos_weight × num_pos_samples + num_neg_samples):
`With a batch size of 10, containing 1 positive and 9 negative samples, the mean loss will amount to (1×9×2.0 + 9×0.5) / (1×9 + 9) = 1.25`
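The workaround can be sketched as follows (again pure Python rather than torch; `bce_adjusted_mean` is a hypothetical helper, and its denominator generalizes to pos_weight·y + (1 − y) summed over the batch):

```python
import math

def bce_adjusted_mean(logits, targets, pos_weight=1.0):
    """'sum'-style total of the weighted per-element losses, divided by
    the pos_weight-adjusted sample count (pos_weight*num_pos + num_neg)."""
    total, denom = 0.0, 0.0
    for x, y in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-x))
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p))
        denom += pos_weight * y + (1 - y)
    return total / denom

# same hypothetical batch: 1 positive (loss 2.0), 9 negatives (loss 0.5)
p_pos, p_neg = math.exp(-2.0), math.exp(-0.5)
logits = [math.log(p_pos / (1 - p_pos))] + [-math.log(p_neg / (1 - p_neg))] * 9
targets = [1.0] + [0.0] * 9
adjusted = bce_adjusted_mean(logits, targets, pos_weight=9.0)
print(adjusted)  # (9×2.0 + 9×0.5) / (9×1 + 9) = 1.25
```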

Is this intended behavior? If it is, should the documentation be updated for clarification?

Hi Sofia (and Florian)!

Florian is correct that the documentation for `BCEWithLogitsLoss` that
he quotes is not right for the reason he gives.

Perhaps this should be considered a minor documentation bug.

Please note, however, that in general `BCEWithLogitsLoss` does not have
“positive” and “negative” samples. This is because `BCEWithLogitsLoss`
accepts probabilistic targets that can take any value between 0.0 and 1.0,
rather than exactly 0 (negative sample) or exactly 1 (positive sample)
class labels.

`pos_weight` therefore doesn’t weight the entire sample, but, rather,
weights the “positive” term in the binary-cross-entropy expression for
a given sample. So, in the general case of probabilistic targets, it is
not possible to reproduce the effect of `pos_weight` by including
multiple copies of certain samples in your input data.
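To illustrate with a small sketch (assuming the standard weighted binary-cross-entropy expression): for a hard label of 1, scaling the positive term is the same as scaling the whole sample, but for a soft target it is not, because the `(1 − y) log(1 − p)` term is left unscaled.

```python
import math

def bce_with_logits(x, y, pos_weight=1.0):
    # per-element loss: pos_weight scales only the positive (y*log p) term
    p = 1.0 / (1.0 + math.exp(-x))
    return -(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p))

x = 0.7  # arbitrary logit
w = 3.0  # arbitrary pos_weight

# hard label: weighting the positive term == weighting the whole sample
assert math.isclose(bce_with_logits(x, 1.0, w), w * bce_with_logits(x, 1.0))

# soft target: only the positive term is scaled, so the "w copies of
# this sample" interpretation no longer holds
soft = bce_with_logits(x, 0.3, w)
assert not math.isclose(soft, w * bce_with_logits(x, 0.3))
```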

Best.

K. Frank
