BCEWithLogitsLoss with weights causes RuntimeError

Hello,
First of all, let me say that I have seen several issues related to this problem; however, I cannot get it to work for my use case.

I am using a ResNet with a single output and am therefore trying to use BCEWithLogitsLoss for binary classification (1 or 0). Here are some code snippets:

THE LOSS FUNCTION

...
loss_fn = nn.BCEWithLogitsLoss(weight=weight)
...

WEIGHT TENSOR

tensor([7.2039, 0.5373], dtype=torch.float64)
weight size torch.Size([2])

MODEL

class Net2(nn.Module):

    def __init__(self):
        super().__init__()
        self.network = models.resnet152(pretrained=False)
        num_ftrs = self.network.fc.in_features
        self.network.fc = nn.Linear(num_ftrs, 1)
    def forward(self, xb):
        return self.network(xb)

TRAINING SNIPPET

        prediction_logist = model(input).squeeze()
        predicted_probability = torch.sigmoid(prediction_logist) 
        prediction = torch.round(predicted_probability)
        target = target.type_as(prediction_logist)        
        loss = loss_fn(prediction_logist, target)

ERROR

return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)
RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 0

BATCH SIZE = 3

OUTPUT and TARGET/LABELS SIZE

torch.Size([3]) torch.Size([3])

The issue is obviously related to the weights being a tensor of size 2 while the target is a tensor of size 3, equal to the batch size. Things work with a batch size of 2 or without weights.

What do I need to do to work with an arbitrary batch size?

Thank you.
Giuseppe.

Unsqueeze the model output and target to have a shape of [batch_size, 1] and rerun your code.

Hello, thank you for your reply.
I added

        prediction_logist=prediction_logist.unsqueeze(1)
        target=target.unsqueeze(1)

before the loss function, and the sizes of the output and target are now torch.Size([3, 1]) torch.Size([3, 1]).

However, the issue has now moved to a different dimension and I get

RuntimeError: output with shape [3, 1] doesn't match the broadcast shape [3, 2]

I assume the broadcast shape has a 2 because of the weights dimension.

Thanks.
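
For reference, the shapes in both error messages follow from ordinary broadcasting rules. A minimal sketch, assuming PyTorch 1.8 or later (where torch.broadcast_shapes is available) and reusing the batch size of 3 and the weight size of 2 from the posts above:

```python
import torch

batch_size = 3
weight_shape = (2,)  # shape of the weight tensor from the original post

# Output reshaped to [3, 1]: the trailing 1 stretches to 2, so the
# element-wise loss would need shape [3, 2], which no longer matches [3, 1].
broadcast = torch.broadcast_shapes((batch_size, 1), weight_shape)
print(broadcast)

# Output of shape [3]: sizes 3 and 2 cannot be broadcast against each other.
try:
    torch.broadcast_shapes((batch_size,), weight_shape)
    compatible = True
except RuntimeError:
    compatible = False
print(compatible)
```

The first case corresponds to the "broadcast shape [3, 2]" message and the second to the original "size of tensor a (3) must match the size of tensor b (2)" message.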

You are right, I misread the weight argument as pos_weight.
Based on the docs, weight rescales the loss of each batch element and should thus have a size equal to the batch size.
I'm unsure, but you might want to use pos_weight instead, which is used to balance e.g. a class imbalance in the dataset and can be defined as pos_weight = nb_negative_examples/nb_positive_examples.
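
To make the difference concrete, here is a minimal sketch with made-up labels and random logits (the counts are illustrative and not taken from the dataset in this thread). It assumes a single-logit model, so pos_weight has one element, and it follows the pos_weight = nb_negative_examples/nb_positive_examples suggestion:

```python
import torch
import torch.nn as nn

# Hypothetical training labels: 6 negatives and 2 positives.
labels = torch.tensor([0., 0., 0., 1., 0., 0., 1., 0.])
nb_positive = labels.sum()
nb_negative = labels.numel() - nb_positive
pos_weight = (nb_negative / nb_positive).reshape(1)  # shape [1], value 3.0

loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# pos_weight has one entry per output unit, not per sample, so the same
# loss_fn works for any batch size.
for batch_size in (1, 3, 5):
    logits = torch.randn(batch_size, 1)  # stand-in for model output, [batch, 1]
    target = torch.randint(0, 2, (batch_size, 1)).float()
    loss = loss_fn(logits, target)       # 0-dim scalar (mean reduction)
    print(batch_size, loss.shape)
```

Because pos_weight is tied to the number of outputs and not to the batch, the size mismatch from the weight argument does not come up here.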

Hello,
Thanks for your update. It took me some time to validate things, as I am currently training the model.
In a nutshell this is what I have done:

#pos_weight = nb_negative_examples/nb_positive_examples
pos_weight=torch.tensor(posweight)
pos_weight = pos_weight.to(device)
loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
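
As a sanity check on what pos_weight does, the sketch below compares nn.BCEWithLogitsLoss against a hand-written version of the formula from the PyTorch documentation, in which only the positive-label term is multiplied by pos_weight. The logits, targets, and the pos_weight value of 3.0 are illustrative assumptions, not values from my training run:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[1.5], [-0.3], [0.2]])
target = torch.tensor([[1.0], [0.0], [1.0]])
pos_weight = torch.tensor([3.0])

builtin = nn.BCEWithLogitsLoss(pos_weight=pos_weight)(logits, target)

# loss = -[pos_weight * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
# log(1 - sigmoid(x)) equals logsigmoid(-x), which is numerically more stable.
manual = -(pos_weight * target * F.logsigmoid(logits)
           + (1 - target) * F.logsigmoid(-logits)).mean()

print(torch.allclose(builtin, manual))
```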

For the training loop I had to do a lot of squeeze and unsqueeze but things worked in the end.

modeloutput = model(input)
prediction_logist=modeloutput.squeeze()
predicted_probability = torch.sigmoid(prediction_logist) # calculate probabilities
predicted = torch.round(predicted_probability)
target = target.type_as(prediction_logist)

if datasize==1:
	prediction_logist = torch.tensor([prediction_logist])
	predicted= torch.tensor([predicted])

prediction_logist=prediction_logist.unsqueeze(1)
target=target.unsqueeze(1)

loss = loss_fn(modeloutput, target)
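
The special case for a batch of one arises because squeeze() with no argument removes every size-1 dimension, including the batch dimension. A possible simplification, sketched here with a stand-in linear layer instead of the ResNet (the layer, the input size, and the pos_weight value are placeholders), is to leave the model output at [batch_size, 1] and only reshape the target:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)  # placeholder for the ResNet with a 1-unit head
loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([3.0]))

for batch_size in (1, 3):
    inputs = torch.randn(batch_size, 4)
    target = torch.randint(0, 2, (batch_size,))   # integer labels, [batch]

    logits = model(inputs)                        # [batch, 1]
    target = target.type_as(logits).unsqueeze(1)  # [batch, 1]
    loss = loss_fn(logits, target)
    loss.backward()                               # gradients still flow

    # Hard 0/1 predictions for metrics; squeeze(1) keeps the batch dimension.
    predicted = torch.round(torch.sigmoid(logits)).squeeze(1)
    print(logits.shape, target.shape, predicted.shape)
```

Note that torch.tensor([...]) builds a new tensor that is detached from the autograd graph, so a loss computed from such a tensor would not backpropagate into the model.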

Two things to highlight:

  1. I had to change the logic when the batch size is 1, as the dimension was not correct.
if datasize==1:
        prediction_logist = torch.tensor([prediction_logist])
        predicted= torch.tensor([predicted])

  2. I am impressed with the performance of this model compared to when I used cross entropy with 2 labels.
    Before, it took 200 epochs and I was not quite happy with the result, while now, with the BCE modification, I am getting good results even after 30 epochs. Why is that?

Thanks