BCEWithLogitsLoss and loss functions in general

Hi there,
I’m working on my first deep learning project. When you define BCEWithLogitsLoss, you have the option to pass pos_weight, which defines how much importance is placed on positive labels in a multilabel classification problem with many negative ones.

I’m training with mini-batches and I want to calculate pos_weight by following the suggested formula pos_weight = total_neg/total_pos.

Does that value need to be computed across the whole dataset or within each mini-batch? Does it matter if I keep redefining the loss function for each mini-batch iteration? Is any information lost by doing that?


I am pretty sure it is across the whole dataset, which is also what the documentation says:

For example, if a dataset contains 100 positive and 300 negative examples of a single class, then pos_weight for the class should be equal to 300/100=3. The loss would act as if the dataset contains 3*100=300 positive examples.

Calculating it for every mini-batch would also be problematic, since in theory you could get a mini-batch with 0 positive samples, in which case you would be dividing by 0.
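
To make the arithmetic concrete, here is a minimal sketch in plain Python. The helper name pos_weight_per_class and the toy labels are made up for illustration; it computes neg/pos per class over the whole dataset and then shows how the same computation can fail on a single mini-batch that happens to contain no positives.

```python
def pos_weight_per_class(label_rows):
    """Return neg_count / pos_count for each class (column) of 0/1 labels."""
    num_rows = len(label_rows)
    num_classes = len(label_rows[0])
    weights = []
    for c in range(num_classes):
        pos = sum(row[c] for row in label_rows)
        neg = num_rows - pos
        if pos == 0:
            # Same failure the thread describes: nothing to divide by.
            raise ZeroDivisionError(f"class {c} has no positive examples")
        weights.append(neg / pos)
    return weights


# Toy multilabel dataset: 8 samples, 2 classes (made-up values).
labels = [
    [1, 0], [0, 0], [0, 1], [0, 0],
    [1, 1], [0, 0], [0, 0], [0, 0],
]

# Whole dataset: each class has 2 positives and 6 negatives.
dataset_weights = pos_weight_per_class(labels)

# One mini-batch taken from the end of the dataset contains no positives.
batch = labels[5:8]
try:
    pos_weight_per_class(batch)
    batch_failed = False
except ZeroDivisionError:
    batch_failed = True
```

Even when a mini-batch does contain positives, its ratio will jump around from batch to batch, which is another reason to compute the value once over the full training set.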

As for whether it matters if you redefine the loss function in every iteration, I am not sure.
Technically it is possible to change the loss function during each iteration and there might be some very specific cases where this is done.
But for your application using BCEWithLogitsLoss, I think the intended usage is to define the loss function once, with pos_weight = total_neg/total_pos as you said, where total means the total across the whole dataset.
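
A rough sketch of that usage in PyTorch, assuming a toy dataset and a single linear layer as the model (both are placeholders, not anything from your project): pos_weight is computed once from the full label tensor, the criterion is created once, and the same object is reused for every mini-batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy data: 8 samples, 4 features, 2 labels per sample.
features = torch.randn(8, 4)
targets = torch.tensor(
    [[1, 0], [0, 0], [0, 1], [0, 0], [1, 1], [0, 0], [0, 0], [0, 0]],
    dtype=torch.float32,
)

# Dataset-wide counts per class, computed once before training.
pos_counts = targets.sum(dim=0)
neg_counts = targets.shape[0] - pos_counts
pos_weight = neg_counts / pos_counts  # one weight per class

model = nn.Linear(4, 2)  # placeholder model that outputs raw logits
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)  # defined once
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

batch_size = 4
losses = []
for start in range(0, features.shape[0], batch_size):
    x = features[start:start + batch_size]
    y = targets[start:start + batch_size]
    optimizer.zero_grad()
    loss = criterion(model(x), y)  # same criterion reused for every batch
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

As far as I know, the loss module only stores the weight tensors you pass in and keeps no running state between calls, so re-creating it every iteration should not lose any information. It is just unnecessary when pos_weight stays fixed.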

Just one more thing to keep in mind.
If you use pos_weight, you also need to be careful with any random sampler or random sub-sampler that randomly reduces the number of samples per epoch, since then the calculated pos_weight would no longer be accurate.
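
A small illustration of that caveat in plain Python, using the 100 positive / 300 negative numbers from the documentation example. The sub-sampler here is hypothetical: it keeps every positive but only 100 of the negatives per epoch, so the ratio the model actually sees differs from the one computed on the full dataset.

```python
def neg_pos_ratio(labels):
    """Return negatives / positives for a flat list of 0/1 labels."""
    pos = sum(labels)
    neg = len(labels) - pos
    return neg / pos


# Single-class toy dataset: 100 positives, 300 negatives.
full_labels = [1] * 100 + [0] * 300
full_ratio = neg_pos_ratio(full_labels)

# Hypothetical sub-sampler: all positives, but only 100 negatives per epoch.
positives = [y for y in full_labels if y == 1]
negatives = [y for y in full_labels if y == 0]
epoch_labels = positives + negatives[:100]
epoch_ratio = neg_pos_ratio(epoch_labels)
```

Here the dataset-wide ratio is 3 while the per-epoch ratio is 1, so a pos_weight of 3 would over-weight the positives. A uniform random sub-sample keeps the ratio about the same on average, so the concern is mostly with samplers that treat the classes differently.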

Thank you very much. That makes total sense with the example of a batch with 0 positive cases.