I’m segmenting foreground vs. background using a U-Net, and because of this there are many more 0s than 1s. It looks like I should be using BCEWithLogitsLoss as my loss function?

It looks like this loss takes an argument to account for the class imbalance: “pos_weight (Tensor, optional) – a weight of positive examples. Must be a vector with length equal to the number of classes.”

Does that mean I need to know how many pixels are 1 vs. 0 across all my images? If so, is there any best practice for doing this?

Some images in my dataset are pure “background” without any 1s, so as an alternative, could I set pos_weight from the number of positive vs. negative pixels at a global scale?

There is nothing magic about the exact choice of pos_weight. It
provides a rough reweighting of positive samples vs. negative.
Nothing goes wrong if you choose one reasonable value instead
of another.

I don’t think that there is any benefit in trying to choose pos_weight
on a per-batch basis – just get an approximate estimate of the ratio
of “1” pixels to “0” pixels in your entire training set. There would be
nothing wrong with counting the exact numbers of "1"s and "0"s in
your training set, but if you want to save time, you could just count
the pixels in a representative sample of your images. You could
even count a representative sample of pixels within an image (in
your representative sample of images). You just need to count
enough (representative) pixels that you get a reasonable statistical
estimate of the number of "1"s vs. "0"s.
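A minimal sketch of the “count a representative sample of pixels” idea, assuming your masks are already 0/1 tensors. The function name estimate_fraction_of_ones and the pixels_per_mask parameter are hypothetical, chosen only for illustration; it picks a random subset of pixel positions from each mask and counts the 1s.

```python
import torch

def estimate_fraction_of_ones(masks, pixels_per_mask=1000, generator=None):
    """Hypothetical helper: estimate the fraction of "1" pixels.

    `masks` is assumed to be an iterable of 0/1 tensors (one per image),
    e.g. a representative sample of your training masks.
    """
    ones = 0
    total = 0
    for mask in masks:
        flat = mask.reshape(-1)
        k = min(pixels_per_mask, flat.numel())
        # pick k random pixel positions in this mask, without replacement
        idx = torch.randperm(flat.numel(), generator=generator)[:k]
        ones += int(flat[idx].sum())
        total += k
    return ones / total

# Usage with synthetic masks: a 20x20 foreground square in a 64x64 image
masks = []
for _ in range(20):
    m = torch.zeros(64, 64)
    m[10:30, 10:30] = 1
    masks.append(m)

g = torch.Generator().manual_seed(0)
approx = estimate_fraction_of_ones(masks, pixels_per_mask=1000, generator=g)
exact = estimate_fraction_of_ones(masks, pixels_per_mask=10_000)  # k covers every pixel
```

When pixels_per_mask is at least the number of pixels in a mask, every pixel is counted and the result is exact; smaller values trade a little accuracy for speed.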

The reasonable and simple choice for pos_weight is then number_of_0s / number_of_1s. (The idea is that if your training
set consists of almost all "0"s, then your classifier will do a very
good job by always predicting “0” – clearly not what you want.
The above choice of pos_weight means that in aggregate your
“0”-pixels and “1”-pixels will now contribute to your loss function
– and hence your training – with approximately equal weight.)
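A small sketch of that choice, using made-up pixel counts (number_of_0s and number_of_1s below are hypothetical). It builds pos_weight as a one-element tensor, passes it to BCEWithLogitsLoss, and checks the “approximately equal weight” claim: each “0” pixel has weight 1, so the total weight of the 0s is number_of_0s, and the total weight of the 1s becomes the same.

```python
import torch
import torch.nn as nn

# Hypothetical counts, e.g. from counting pixels in (a sample of) the training masks
number_of_0s = 900_000
number_of_1s = 100_000

# One entry, because there is one "class" (foreground) with one logit per pixel
pos_weight = torch.tensor([number_of_0s / number_of_1s])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# Total weight carried by all "1" pixels vs. all "0" pixels (each "0" has weight 1)
total_weight_of_1s = pos_weight.item() * number_of_1s
total_weight_of_0s = 1.0 * number_of_0s

# Usage on a dummy batch: logits and float targets of shape (N, 1, H, W)
logits = torch.randn(2, 1, 8, 8)
target = (torch.rand(2, 1, 8, 8) < 0.1).float()
loss = criterion(logits, target)
```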

(A side note on a potentially confusing bit of terminology. The “number
of classes” for the length of the pos_weight vector in your use case
is 1. The one class is, let’s call it, “foreground”, and a pixel can be
either the binary choice “yes foreground” or “no foreground”. Your
network will provide one logit output per pixel. If we were
speaking in terms of a multi-class classifier (using, for example, CrossEntropyLoss) we would say you had two classes,
“yes foreground” and “no foreground.” The reason we don’t
stamp out this confusing terminology for BCEWithLogitsLoss
is that you can use BCEWithLogitsLoss for a multi-label,
multi-class classification problem where you can have n classes,
any number of which can be “yes” or “no” at the same time. This
is then a set of n binary classification problems, but we (smartly)
train a single classifier network that performs the n binary
classifications at the same time, (usefully) sharing features and
training, etc. In this case we say we have n classes, not 2^n
classes – that is, we don’t count the number of combinations of
"yes"s and "no"s as the number of classes.)
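To make the “one class vs. n classes” distinction concrete, here is a sketch of the tensor shapes in both cases (sizes and weight values are arbitrary). Note that pos_weight is broadcast against the target following PyTorch’s broadcasting rules, which align trailing dimensions, so for a multi-label (N, C, H, W) target it is shaped (C, 1, 1) here to apply one weight per class at every pixel; the current BCEWithLogitsLoss documentation describes this shaping. The manual formula at the end is only a cross-check of what the loss computes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N, H, W = 2, 4, 4

# Binary segmentation: one class ("foreground"), one logit per pixel
logits_bin = torch.randn(N, 1, H, W)
target_bin = (torch.rand(N, 1, H, W) > 0.8).float()
loss_bin = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([9.0]))(logits_bin, target_bin)

# Multi-label segmentation: C independent yes/no decisions per pixel
C = 3
logits_ml = torch.randn(N, C, H, W)
target_ml = (torch.rand(N, C, H, W) > 0.8).float()
# shape (C, 1, 1) so each class weight is applied across all H x W pixels
pos_weight_ml = torch.tensor([2.0, 5.0, 10.0]).view(C, 1, 1)
loss_ml = nn.BCEWithLogitsLoss(pos_weight=pos_weight_ml)(logits_ml, target_ml)

# Same quantity written out by hand, averaged over all elements:
# -[pos_weight * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
manual_ml = -(pos_weight_ml * target_ml * F.logsigmoid(logits_ml)
              + (1 - target_ml) * F.logsigmoid(-logits_ml)).mean()
```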

Thanks @KFrank, that is a very clear and concise reply and makes things very easy to understand.

The second part, about the labels, also sheds some light on another problem I’m having with the input shapes expected by different loss functions, so it gives me a solid place to start unravelling that too!

@KFrank OK, so the second point you made about the number of classes really helped me clear up something I was struggling with: I had wrongly put in [‘class1’, ‘class 2’] rather than just ‘class1’ because of the background issue you mentioned.

I tried using PyTorch’s BCEWithLogitsLoss out of the box, but I was getting an error: “expecting target data type as long but BCElosswithlogistic loss function expecting as float datatype”

To get around this, I tried to define my own loss function with a long data type, but I’m back at square one in terms of calculating pos_weight (though from a much more informed perspective, which is great!).

I understand you said there isn’t a magically right way to do this, but is there any best practice or automated way to get it? I’ve got a folder with X images, and each image has N pixels, some of which are 1. The only way I can see to do this is to turn each image into an array, then iterate over all the images in the folder and count how many 0s and how many 1s there are?

The error message you quote seems somewhat garbled. However,
I think your issue is this: BCEWithLogitsLoss expects a target of
type float, rather than a long. So convert your target tensor from
long to float.
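A minimal sketch of that conversion, with random data standing in for your model output and mask: the integer (long) mask is converted with .float() just before it is passed to the loss.

```python
import torch
import torch.nn as nn

logits = torch.randn(2, 1, 8, 8)                 # model output: one logit per pixel
target_long = torch.randint(0, 2, (2, 1, 8, 8))  # 0/1 mask stored as torch.int64 (long)

criterion = nn.BCEWithLogitsLoss()
# BCEWithLogitsLoss expects a floating-point target, so convert the 0/1 labels
loss = criterion(logits, target_long.float())
```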

(You can think of integer targets as being class labels: 0 means “no,”
and 1 means “yes.” But you can also understand them as probabilities:
0.0 means 0% probability of being “yes” (thus, “no”), and 1.0 means
100% probability of being “yes” (thus, “yes”). Unlike, say,
CrossEntropyLoss used with class-index targets, which must be integer
class labels, BCEWithLogitsLoss does accept non-integer probabilities
for its targets. So you can have 0.15 (probably “no”) or 0.90 (probably
“yes”). This is a good thing, but to support this use case,
BCEWithLogitsLoss requires floats for its targets, even if you are
only passing it 0.0s and 1.0s.)
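A short sketch of the “targets as probabilities” point, with arbitrary logits: BCEWithLogitsLoss accepts soft targets such as 0.15 and 0.90, the result matches the written-out binary cross-entropy formula, and for a soft target the loss is smallest when sigmoid(logit) equals that target.

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
logits = torch.tensor([-1.0, 2.0])
soft_target = torch.tensor([0.15, 0.90])  # "probably no", "probably yes"
loss = criterion(logits, soft_target)

# Manual check: -[y * log(p) + (1 - y) * log(1 - p)] with p = sigmoid(logit)
p = torch.sigmoid(logits)
manual = -(soft_target * torch.log(p) + (1 - soft_target) * torch.log(1 - p)).mean()

# For target 0.15, the loss is minimized at logit(0.15), i.e. sigmoid(x) == 0.15
t = torch.tensor([0.15])
best_logit = torch.logit(torch.tensor(0.15)).view(1)
loss_at_best = criterion(best_logit, t)
loss_above = criterion(best_logit + 0.5, t)
loss_below = criterion(best_logit - 0.5, t)
```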

Well, yes, at some point you have to count the 0s and 1s, at least for a
representative sample of your data.

If I were doing it, I would recognize that my PyTorch code is somewhere
already converting those images into target tensors that consist of 0s
and 1s. Then target.mean() will give you the fraction of 1s in that
target tensor.
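One way to apply target.mean() across a whole dataset is sketched below, assuming a DataLoader that yields (image, target) pairs with 0/1 targets; the helper name fraction_of_ones is hypothetical and a TensorDataset stands in for your real dataset. It sums the 1s and the pixel counts rather than averaging per-batch means, which keeps the result exact even when the last batch is smaller.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def fraction_of_ones(loader):
    """Hypothetical helper: fraction of "1" pixels over every target the loader yields."""
    ones = 0.0
    total = 0
    for _, target in loader:
        ones += target.sum().item()
        total += target.numel()
    return ones / total

# Stand-in data: 10 masks of 8x8 pixels whose first two rows are foreground
images = torch.zeros(10, 1, 8, 8)
targets = torch.zeros(10, 1, 8, 8)
targets[:, :, :2, :] = 1.0

loader = DataLoader(TensorDataset(images, targets), batch_size=4)  # batches of 4, 4, 2
f = fraction_of_ones(loader)
pos_weight = (1.0 - f) / f  # number_of_0s / number_of_1s
```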

I applied target.mean(), as you recommended, to determine the fraction of 1s in the target tensor, and it gave me very small values (0.2128, 0.099, 0.0421) for different tensors (images). I used 0.2128 as the pos_weight, but I could not see any change during training. Could you please provide some clarification?

You’re doing it backwards. You have fewer 1s than 0s, but by using 0.2128 for pos_weight you are further reducing the weight of the “1”
pixels in your loss function.

Let’s say that averaged over multiple images your fraction of 1s is
about 0.10. Then, approximately, you would want to use a value
of pos_weight of 1.0 / 0.10 = 10.0 to increase the weight of
the 1s.

More precisely (but we don’t need to be precise about the value of pos_weight), you would use pos_weight = (1.0 - 0.10) / 0.10.
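The arithmetic above, as a quick sketch using the same illustrative fraction of 0.10: 1 / f and (1 - f) / f are both large and close to each other, using f itself shrinks the weight of the 1s, and (1 - f) / f is exactly the number_of_0s / number_of_1s ratio (the counts below are made up).

```python
fraction_of_1s = 0.10  # e.g. target.mean() averaged over several masks

rough = 1.0 / fraction_of_1s                       # about 10: upweights the 1s
precise = (1.0 - fraction_of_1s) / fraction_of_1s  # about 9: number_of_0s / number_of_1s
backwards = fraction_of_1s                         # 0.10: would downweight the 1s further

# (1 - f) / f is the same as number_of_0s / number_of_1s
number_of_1s, number_of_0s = 100, 900
f = number_of_1s / (number_of_0s + number_of_1s)
ratio_from_fraction = (1.0 - f) / f
ratio_from_counts = number_of_0s / number_of_1s
```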

Thank you very much for your continued help. I ran some experiments, passing the following value to pos_weight:

pos_weight = (1.0 - 0.2128) / 0.2128 ≈ 3.70

However, the loss got higher instead. What could be the reason?
From my understanding, target.mean() returns the fraction of 1s in the label. Given that my class of interest (foreground) is the 1s, why can’t the fraction of 1s be passed directly to pos_weight?

Could anyone please provide some clarification?