I am slightly confused about using weighted BCEWithLogitsLoss. My input data shape is 1 x 52 x 52 x 52 (a 3D volume), the label for each volume is either 0 or 1, and I am using a batch size of 5. So, for each batch, the input is 5 x 1 x 52 x 52 x 52 and the label is 1 x 5. The way I am calculating weights is:
My question is: should I calculate weights for each batch or for the whole dataset? Also, assuming my label for a batch is [0, 0, 1, 0, 1], weight_0 is 0.4, and weight_1 is 0.6, will the weight tensor passed to nn.BCEWithLogitsLoss be [0.4, 0.4, 0.6, 0.4, 0.6]?
I prefer calculating the weight based on the entire dataset,
rather than on a per-batch basis, although it shouldn’t matter
if your batches are of reasonable size. The point is that the
weights aren’t magic numbers that have to be just right. They
are approximate numbers that are used to (partially) account
for having a significantly unbalanced dataset.
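For instance, the dataset-wide statistic can be computed once up front. A minimal sketch (all_labels is a hypothetical tensor holding every training label, not something from your post):

```python
import torch

# Hypothetical tensor of all training labels (0s and 1s)
all_labels = torch.tensor([0, 0, 1, 0, 1, 0, 0, 1, 0, 0], dtype=torch.float)

n_pos = all_labels.sum()            # number of positive samples
n_neg = all_labels.numel() - n_pos  # number of negative samples

# A pos_weight > 1 up-weights the (rarer) positive class
pos_weight = n_neg / n_pos  # ≈ 2.33 for 7 negatives / 3 positives
```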
This would be reasonable, but the disadvantage is that you would
have to construct a new instance of BCEWithLogitsLoss for
every batch, because your weight tensor depends on the batch.
I am assuming here that by “weight tensor” you mean the tensor
you pass into BCEWithLogitsLoss's constructor as its named weight argument, e.g.:
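A minimal sketch, using the [0.4, 0.4, 0.6, 0.4, 0.6] weights from your example:

```python
import torch

# weight is a per-batch-element rescaling tensor, so it has to be
# rebuilt whenever the mix of labels in the batch changes
weight = torch.tensor([0.4, 0.4, 0.6, 0.4, 0.6])
criterion = torch.nn.BCEWithLogitsLoss(weight=weight)

logits = torch.zeros(5)  # stand-in model output
labels = torch.tensor([0., 0., 1., 0., 1.])
loss = criterion(logits, labels)
```

By contrast, pos_weight is a single relative weight for positive samples (0.6 / 0.4 = 1.5 here), so it doesn't change from batch to batch.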
If you instead pass the class weight in as BCEWithLogitsLoss's pos_weight constructor argument (which doesn't depend on how any particular batch is composed), you can use the same criterion loss object, constructed once, over and over again for each batch.
I would also note that if your relative weights are, in fact, 0.4
and 0.6, your dataset isn’t really very unbalanced, and I probably
wouldn’t bother using weights in the loss function.
As an aside, your shapes look a little confused. I assume that
“5 x 1 x 52 x 52 x 52” is the shape of the input to your model
(not your loss function). The shape of your label should then
probably be [5] (although it could be [5, 1]), but a shape of
[1, 5] would be a mistake, because BCEWithLogitsLoss expects
the label to have the same shape as your model's output.
I see what you mean (it took me a while to figure out). So rather than passing the weight argument (the rescaling weight of each batch element) to the loss function, we pass the pos_weight argument (the weight of positive examples). In what scenarios would you want to use one or the other? Also, if I am calculating the pos_weight argument for each batch, don't I still have to instantiate the loss function for each batch to pass pos_weight to it? Will something like this work:
criterion = torch.nn.BCEWithLogitsLoss()
for img, label in trainloader:
    # Assuming I calculate 'pos_weight' for each batch
    weights = torch.FloatTensor([count_of_lbl0 / count_of_lbl1])
    criterion.pos_weight = weights
    output = model(img)
    loss = criterion(output, label)
If your sample weight only depends on the class of the sample,
you can use either. I tend to think that pos_weight is a little more
convenient for this case.
If your sample weight depends on something other than the sample’s
class – for example, if you’re “hard mining” and want to weight “hard”
samples more heavily – then you would have to use weight and
provide per-sample weights.
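In that case, one option (a sketch, not from the thread) is the functional form, torch.nn.functional.binary_cross_entropy_with_logits, which accepts weight as a per-call argument, so no per-batch reconstruction of a loss object is needed:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([0.2, -1.3, 0.8, 0.5, -0.4])
labels = torch.tensor([0., 0., 1., 0., 1.])

# Hypothetical per-sample "hardness" weights, recomputed each batch
weights = torch.tensor([1.0, 1.0, 2.0, 1.0, 3.0])

loss = F.binary_cross_entropy_with_logits(logits, labels, weight=weights)
```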
Also, if you are using the different, but related loss class, BCELoss
(which you generally shouldn’t be using), you will have to use weight
because, for whatever reason, BCELoss doesn’t have a pos_weight
argument for its constructor.
Yes (although, as you note below, it does appear to be possible to
modify the pos_weight property after BCEWithLogitsLoss has
been constructed). But, again, my preference is to use the same
pos_weight for the whole dataset, rather than recalculating it for each batch.
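That is, construct the criterion once and reuse it. A sketch, with made-up counts standing in for your dataset's statistics:

```python
import torch

# Hypothetical dataset-wide label counts (replace with your own)
n_neg, n_pos = 700, 300
criterion = torch.nn.BCEWithLogitsLoss(
    pos_weight=torch.tensor([n_neg / n_pos]))

# The same criterion object is then reused for every batch:
for _ in range(3):  # stand-in for iterating over trainloader
    output = torch.randn(5)                    # stand-in model output
    label = torch.randint(0, 2, (5,)).float()  # stand-in labels
    loss = criterion(output, label)
```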
I don’t see this discussed in the documentation, but it does appear to
work for me (using pytorch 0.3.0, and weight rather than pos_weight).
I would probably avoid doing it this way, because I'm not sure that
it's officially supported.