The documentation for `nn.BCEWithLogitsLoss` states:

> For example, if a dataset contains 100 positive and 300 negative examples of a single class, then pos_weight for the class should be equal to 300/100=3.
>
> The loss would act as if the dataset contains 3×100=300 positive examples.

This is not the case when the default ‘mean’ reduction is used. I found that the positive class weight is applied correctly to each positive sample's loss, but the summed loss is then divided by the plain number of samples rather than by the weight-adjusted count.

Example:

- Imbalanced dataset with 10% positives and 90% negatives
- Batch size of 10
- Per-sample loss of 2.0 for each positive and 0.5 for each negative

`With a batch size of 10, containing 1 positive and 9 negative examples, the mean loss will amount to (1×2.0 + 9×0.5) / (1 + 9) = 0.65`

`With a batch size of 10, containing 5 positive and 5 negative examples, the mean loss will amount to (5×2.0 + 5×0.5) / (5 + 5) = 1.25`

From the description in the documentation, I would expect a batch with the dataset's 1:9 ratio to give the same loss as the balanced batch, 1.25, once pos_weight has been set correctly to num_neg/num_pos = 9. However, the denominator is not adjusted when pos_weight is set, resulting in the following behaviour:

`With a batch size of 10, containing 1 positive and 9 negative samples, the mean loss will amount to (1×9×2.0 + 9×0.5) / (1 + 9) = 2.25`

This is even higher than the loss would have been if the batch contained only positive samples (2.0).
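
The arithmetic above can be reproduced with a few lines of plain Python. This is only a sketch of the numbers in this post, not of PyTorch's implementation; `mean_loss` is a hypothetical helper, and the per-sample losses 2.0 and 0.5 are the made-up values from the example.

```python
def mean_loss(num_pos, num_neg, pos_loss=2.0, neg_loss=0.5, pos_weight=1.0):
    """Mean batch loss when each positive's loss is scaled by pos_weight
    but the sum is divided by the plain batch size."""
    total = num_pos * pos_weight * pos_loss + num_neg * neg_loss
    return total / (num_pos + num_neg)


unweighted_imbalanced = mean_loss(1, 9)                # 1 positive, 9 negatives
unweighted_balanced = mean_loss(5, 5)                  # 5 positives, 5 negatives
weighted_imbalanced = mean_loss(1, 9, pos_weight=9.0)  # pos_weight = 9

print(unweighted_imbalanced)  # 0.65
print(unweighted_balanced)    # 1.25
print(weighted_imbalanced)    # 2.25
```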

To get the loss to behave as expected, I had to use ‘sum’ reduction instead of ‘mean’ and manually divide the loss sum by the adjusted sample number (pos_weight × num_pos_samples + num_neg_samples):

`With a batch size of 10, containing 1 positive and 9 negative samples, the mean loss will amount to (1×9×2.0 + 9×0.5) / (1×9 + 9) = 1.25`
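
In PyTorch, the workaround might look roughly like the sketch below. `weighted_mean_bce_with_logits` is a hypothetical helper name, the logits are random placeholders, and the code assumes hard 0/1 targets and a single class, so `pos_weight` has shape `(1,)`.

```python
import torch
import torch.nn as nn


def weighted_mean_bce_with_logits(logits, targets, pos_weight):
    """Sum-reduced BCE-with-logits divided by the pos_weight-adjusted count."""
    criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight, reduction="sum")
    loss_sum = criterion(logits, targets)
    # Each positive counts pos_weight times, each negative counts once.
    effective_count = (pos_weight * targets + (1.0 - targets)).sum()
    return loss_sum / effective_count


torch.manual_seed(0)
logits = torch.randn(10, 1)                            # placeholder model outputs
targets = torch.tensor([[1.0]] + [[0.0]] * 9)          # 1 positive, 9 negatives
pos_weight = torch.tensor([9.0])                       # num_neg / num_pos

# Default 'mean' reduction: weighted sum divided by the batch size (10).
builtin_mean = nn.BCEWithLogitsLoss(pos_weight=pos_weight)(logits, targets)
# Workaround: the same weighted sum divided by 1*9 + 9 = 18.
adjusted_mean = weighted_mean_bce_with_logits(logits, targets, pos_weight)

print(builtin_mean.item(), adjusted_mean.item())
```

Because the adjusted value is a proper weighted average of the per-sample losses, it can never exceed the largest individual loss in the batch, which is the property the default ‘mean’ reduction loses once `pos_weight` is set.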

Is this intended behavior? If it is, should the documentation be updated for clarification?