Pos_weight in BCEWithLogitsLoss not behaving as expected when using 'mean' reduction

The documentation for nn.BCEWithLogitsLoss states:

For example, if a dataset contains 100 positive and 300 negative examples of a single class, then pos_weight for the class should be equal to 300/100=3. The loss would act as if the dataset contains 3×100=300 positive examples.

This is not the case when the default ‘mean’ reduction is used. I found that the positive class weight is applied correctly, but the weighted loss sum is then divided by the unadjusted number of samples.

Example:

  • Imbalanced dataset with 10% positive and 90% negative samples.
  • Batch size of 10.
  • Per-sample loss (before pos_weight) is 2.0 for a positive example and 0.5 for a negative example.

With a batch of 10 containing 1 positive and 9 negative examples, the mean loss (without pos_weight) amounts to (1×2.0 + 9×0.5) / (1 + 9) = 0.65.
With a batch of 10 containing 5 positive and 5 negative examples, the mean loss (without pos_weight) amounts to (5×2.0 + 5×0.5) / (5 + 5) = 1.25.

From the description in the documentation, I would expect the loss to be 1.25 for any distribution of positives and negatives in a batch, as long as pos_weight is set correctly to num_neg/num_pos = 9. However, the denominator is not adjusted when pos_weight is set, resulting in the following behaviour:
With a batch of 10 containing 1 positive and 9 negative samples, the mean loss amounts to (1×9×2.0 + 9×0.5) / (1 + 9) = 2.25, which is even higher than the loss would have been if the batch contained only positive samples (2.0).
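Here is a minimal sketch that reproduces this number. The logit values are my own, chosen so that the unweighted per-sample losses come out to exactly 2.0 and 0.5 as in the example above:

```python
import torch
import torch.nn.functional as F

pos_weight = torch.tensor([9.0])  # num_neg / num_pos

# Logits chosen so that the unweighted per-sample loss is exactly
# 2.0 for a positive target and 0.5 for a negative target.
z_pos = -torch.log(torch.expm1(torch.tensor(2.0)))  # log(1 + exp(-z_pos)) == 2.0
z_neg = torch.log(torch.expm1(torch.tensor(0.5)))   # log(1 + exp(z_neg))  == 0.5

logits = torch.full((10,), z_neg.item())
logits[0] = z_pos
targets = torch.zeros(10)
targets[0] = 1.0  # 1 positive, 9 negative samples

loss = F.binary_cross_entropy_with_logits(
    logits, targets, pos_weight=pos_weight, reduction="mean")
print(loss.item())  # ~2.25, not the 1.25 one might expect
```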

To get the loss to behave as expected, I had to use ‘sum’ reduction instead of ‘mean’ and manually divide the loss sum by the adjusted sample count (pos_weight × num_pos_samples + num_neg_samples):
With a batch of 10 containing 1 positive and 9 negative samples, the mean loss then amounts to (1×9×2.0 + 9×0.5) / (1×9 + 9) = 1.25.
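Continuing the sketch above (reusing the same logits, targets, and pos_weight), the workaround looks like this:

```python
# 'sum' reduction, then divide by the pos_weight-adjusted sample count.
loss_sum = F.binary_cross_entropy_with_logits(
    logits, targets, pos_weight=pos_weight, reduction="sum")

num_pos = targets.sum()
num_neg = (1 - targets).sum()
adjusted_mean = loss_sum / (pos_weight * num_pos + num_neg)
print(adjusted_mean.item())  # ~1.25, independent of the batch composition
```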

Is this intended behavior? If it is, should the documentation be updated for clarification?

Any updates on this?

Hi Sofia (and Florian)!

Florian is correct that the passage of the BCEWithLogitsLoss documentation he quotes is not right, for the reason he gives.

Perhaps this should be considered a minor documentation bug.

Please note, however, that in general BCEWithLogitsLoss does not have
“positive” and “negative” samples. This is because BCEWithLogitsLoss
accepts probabilistic targets that can be any value between 0.0 and 1.0,
rather than exactly 0 (negative sample) or exactly 1 (positive sample)
class labels.

pos_weight therefore doesn’t weight the entire sample, but, rather,
weights the “positive” term in the binary-cross-entropy expression for
a given sample. So, in the general case of probabilistic targets, it is
not possible to reproduce the effect of pos_weight by including
multiple copies of certain samples in your input data.
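
As a small illustration of this (my own sketch, not from the docs), the built-in loss with a probabilistic target matches the per-sample expression in which pos_weight multiplies only the positive term:

```python
import torch
import torch.nn.functional as F

pos_weight = torch.tensor([9.0])
logit = torch.tensor([0.2])
target = torch.tensor([0.3])  # probabilistic target, neither 0 nor 1

builtin = F.binary_cross_entropy_with_logits(
    logit, target, pos_weight=pos_weight, reduction="none")

# pos_weight scales only the "positive" term of the per-sample expression.
p = torch.sigmoid(logit)
manual = -(pos_weight * target * torch.log(p) + (1 - target) * torch.log(1 - p))

print(torch.allclose(builtin, manual))  # True
```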

Best.

K. Frank
