I am slightly confused about using weighted BCEWithLogitsLoss. My input data shape is 1 x 52 x 52 x 52 (a 3D volume), the label for each volume is either 0 or 1, and I am using a batch size of 5. So, for each batch, the input is 5 x 1 x 52 x 52 x 52 and the label is 1 x 5. The way I am calculating weights is:

My question is: should I calculate weights for each batch or per dataset? Also, assuming my label per batch is [0, 0, 1, 0, 1], weight_0 is 0.4, and weight_1 is 0.6, will the weight tensor passed to nn.BCEWithLogitsLoss be [0.4, 0.4, 0.6, 0.4, 0.6]?

I prefer calculating the weight based on the entire dataset,
rather than on a per-batch basis, although it shouldn't matter
if your batches are of reasonable size. The point is that the
weights aren't magic numbers that have to be just right. They
are approximate numbers that are used to (partially) account
for having a significantly unbalanced dataset.
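As a rough sketch of what a single, dataset-level weighting could look like (here via the pos_weight constructor argument that comes up further down in this thread; `train_labels` is a toy stand-in for your full label set):

```python
import torch

# Toy stand-in for the labels of the entire training set.
train_labels = torch.tensor([0., 0., 1., 0., 1., 1., 0., 0.])

n_pos = (train_labels == 1).sum().item()
n_neg = (train_labels == 0).sum().item()
pos_weight = torch.tensor([n_neg / n_pos])  # up-weights the rarer positive class

# Constructed once, reused for every batch.
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```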

This would be reasonable, but the disadvantage is that you would
have to construct a new instance of BCEWithLogitsLoss for
every batch, because your weight tensor depends on the batch.

I am assuming here that by "weight tensor" you mean the tensor
you pass into BCEWithLogitsLoss's constructor as its named weight argument, e.g.:
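The example seems to have been lost here; a minimal sketch of the distinction, using the 0.4/0.6 weights from the question above (the pos_weight form is what the follow-up below ends up discussing, since it can be computed once for the whole dataset):

```python
import torch

# weight: per-element weights whose shape must match the batch, so the
# criterion would have to be rebuilt for every batch:
#   torch.nn.BCEWithLogitsLoss(weight=torch.tensor([0.4, 0.4, 0.6, 0.4, 0.6]))

# pos_weight: a single weight for the positive class, expressed relative
# to the negative class, so it depends only on the overall class ratio:
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor([0.6 / 0.4]))
```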

You can now use the same criterion loss object, constructed
once, over and over again for each batch.

I would also note that if your relative weights are, in fact, 0.4
and 0.6, your dataset isn't really very unbalanced, and I probably
wouldn't bother using weights in the loss function.

As an aside, your shapes look a little confused. I assume that
"5 x 1 x 52 x 52 x 52" is the shape of the input to your model
(not your loss function). The shape of your label is probably [5]
(although it could be [5, 1]), but a shape of [1, 5] would be
wrong.
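To make the shape point concrete, a small sketch (assuming a model that maps each one-channel volume to a single logit):

```python
import torch

batch = torch.randn(5, 1, 52, 52, 52)        # input to the model, not the loss
labels = torch.tensor([0., 0., 1., 0., 1.])  # shape [5], float, one label per volume

# Stand-in for the model's output: one logit per sample, shape [5].
logits = torch.randn(5)

criterion = torch.nn.BCEWithLogitsLoss()
loss = criterion(logits, labels)             # both arguments have shape [5]
```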

I see what you mean (took me a while to figure out). So rather than passing the weight argument (which is the rescaling weight of each batch element) to the loss function, we pass the pos_weight argument (which is the weight of positive examples). In what scenarios would you want to use one or the other? Also, if I am calculating the pos_weight argument for each batch, don't I still have to instantiate the loss function for each batch to pass the pos_weight argument? Will something like this work:

criterion = torch.nn.BCEWithLogitsLoss()
for img, label in trainloader:
    # Assuming I calculate 'pos_weights' for each batch
    weights = torch.FloatTensor([count_of_lbl0 / count_of_lbl1])
    criterion.pos_weight = weights
    output = model(img)
    loss = criterion(output, label)

If your sample weight only depends on the class of the sample,
you can use either. I tend to think that pos_weight is a little more
convenient.

If your sample weight depends on something other than the sample's
class (for example, if you're "hard mining" and want to weight "hard"
samples more heavily), then you would have to use weight and
provide per-sample weights.
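A sketch of the per-sample weight case, using the [0.4, 0.4, 0.6, 0.4, 0.6] example from the question (note that because the weight tensor is tied to the batch, the criterion has to be reconstructed each time):

```python
import torch

labels = torch.tensor([0., 0., 1., 0., 1.])
# Per-sample weights; here derived purely from the class (0.4 / 0.6),
# but they could just as well come from a hard-mining score instead.
weights = torch.where(labels == 1., torch.tensor(0.6), torch.tensor(0.4))

criterion = torch.nn.BCEWithLogitsLoss(weight=weights)  # rebuilt per batch
logits = torch.randn(5)
loss = criterion(logits, labels)
```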

Also, if you are using the different, but related, loss class, BCELoss
(which you generally shouldn't be using), you will have to use weight
because, for whatever reason, BCELoss doesn't have a pos_weight
argument for its constructor.
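For reference, the two losses compute the same quantity once you apply the sigmoid yourself; BCEWithLogitsLoss just fuses the sigmoid in a more numerically stable way (which is why it's generally preferred):

```python
import torch

logits = torch.randn(5)
targets = torch.tensor([0., 0., 1., 0., 1.])

with_logits = torch.nn.BCEWithLogitsLoss()(logits, targets)
plain = torch.nn.BCELoss()(torch.sigmoid(logits), targets)
# Same value, but BCEWithLogitsLoss avoids computing log(sigmoid(x)) directly.
```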

Yes (although, as you note below, it does appear to be possible to
modify the pos_weight property after BCEWithLogitsLoss has
been constructed). But, again, my preference is to use the same
pos_weight for the whole dataset, rather than calculating it for each
batch.

I don't see this discussed in the documentation, but it does appear to
work for me (using pytorch 0.3.0, and weight rather than pos_weight).

I would probably avoid doing it this way because I'm not sure that
it's officially supported.