Is there an example for multi-class multi-label classification in PyTorch?

Hello everyone.
How can I do multi-class multi-label classification in PyTorch? Is there a tutorial or an example somewhere that I can use?
I’d be grateful if anyone could help in this regard.
Thank you all in advance.

3 Likes

I know everything that’s there, and there is not a single word on multi-class multi-label classification!
Have you yourself even looked at it before suggesting it?

Thanks, I didn’t mean to be rude; I just genuinely wanted to know whether you had mistaken this for some other link.
Anyway, I appreciate your help.

Does creating different heads (e.g. 3 classifiers) count as multi-task learning? Each head has its own loss, the losses are summed, and the result is backpropagated.
If this is multi-task, what is a multi-label scenario? Is that just another name for multi-label classification?
And if so, what is multi-class multi-label classification?

Have a look at this post for a small example of multi-label classification.
You could use multi-hot encoded targets, nn.BCE(WithLogits)Loss, and an output layer returning [batch_size, nb_classes] (same as in multi-class classification).
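In case it helps, a minimal sketch of that setup could look like this (the linear model, the shapes, and the 0.5 threshold are just placeholder assumptions):

import torch
import torch.nn as nn

batch_size, in_features, nb_classes = 4, 16, 5

# placeholder model; any module ending in a layer with nb_classes outputs works
model = nn.Linear(in_features, nb_classes)

x = torch.randn(batch_size, in_features)
# multi-hot encoded targets: each sample can have several active classes
target = torch.randint(0, 2, (batch_size, nb_classes)).float()

criterion = nn.BCEWithLogitsLoss()
output = model(x)                  # logits of shape [batch_size, nb_classes]
loss = criterion(output, target)
loss.backward()

# at inference time, threshold the probabilities to get the predicted label set
preds = (torch.sigmoid(output) > 0.5).long()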

9 Likes

Thanks! Why did you use nn.BCEWithLogitsLoss() and not cross entropy?
Can’t we use a sigmoid and a normal cross entropy loss to get probabilities for all classes?

nn.CrossEntropyLoss uses the target to index the logits in your model’s output.
Thus it is suitable for multi-class classification use cases (exactly one valid class per sample in the target).

nn.BCEWithLogitsLoss on the other hand treats each output independently and is suitable for multi-label classification use cases.
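To make the difference in the expected targets concrete, here is a small sketch with made-up values:

import torch
import torch.nn as nn

logits = torch.randn(4, 3)                       # [batch_size, nb_classes]

# multi-class: the target holds exactly one class index per sample
ce_target = torch.tensor([0, 2, 1, 2])           # shape [4]
ce_loss = nn.CrossEntropyLoss()(logits, ce_target)

# multi-label: the target holds a 0/1 entry for each class and sample
ml_target = torch.tensor([[1., 0., 1.],
                          [0., 1., 0.],
                          [1., 1., 0.],
                          [0., 0., 1.]])         # shape [4, 3]
bce_loss = nn.BCEWithLogitsLoss()(logits, ml_target)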

8 Likes

Thanks a lot, as always. Then what about MultiLabelSoftMarginLoss? Shouldn’t we use that? (I know it’s simply sigmoid + BCE (Link).)
I guess back in 2017 there was an issue about its numerical instability; is that why you chose BCEWithLogitsLoss?

The link points to a legacy version of the loss.
This is the current implementation in the master branch.
The main difference is that the loss will be averaged over the feature dimension:

loss = loss.sum(dim=1) / input.size(1)  # only return N loss values

Here is an older post comparing both losses, which won’t run anymore due to the shape mismatch.

Here is the updated version:

import torch
import torch.nn as nn

x = torch.randn(10, 3)
y = torch.FloatTensor(10, 3).random_(2)

# double the loss for class 1
class_weight = torch.FloatTensor([1.0, 2.0, 1.0])
# double the loss for last sample
element_weight = torch.FloatTensor([1.0]*9 + [2.0]).view(-1, 1)
element_weight = element_weight.repeat(1, 3)

bce_criterion = nn.BCEWithLogitsLoss(weight=None, reduction='none')
multi_criterion = nn.MultiLabelSoftMarginLoss(weight=None, reduction='none')

bce_criterion_class = nn.BCEWithLogitsLoss(weight=class_weight, reduction='none')
multi_criterion_class = nn.MultiLabelSoftMarginLoss(weight=class_weight, reduction='none')

bce_criterion_element = nn.BCEWithLogitsLoss(weight=element_weight, reduction='none')
multi_criterion_element = nn.MultiLabelSoftMarginLoss(weight=element_weight, reduction='none')

bce_loss = bce_criterion(x, y)
multi_loss = multi_criterion(x, y)

bce_loss_class = bce_criterion_class(x, y)
multi_loss_class = multi_criterion_class(x, y)

bce_loss_element = bce_criterion_element(x, y)
multi_loss_element = multi_criterion_element(x, y)

print(torch.allclose(bce_loss.mean(1), multi_loss))
> True
print(torch.allclose(bce_loss_class.mean(1), multi_loss_class))
> True
print(torch.allclose(bce_loss_element.mean(1), multi_loss_element))
> True

Yes, and I think it could still be an issue, as logsigmoid is numerically more stable than log + sigmoid, since internally the LogSumExp trick will be applied, as seen here.
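A quick way to see this (illustrative values only):

import torch
import torch.nn.functional as F

x = torch.tensor([-200.0])

# naive sigmoid + log: sigmoid underflows to 0, so the log becomes -inf
print(torch.log(torch.sigmoid(x)))   # tensor([-inf])

# fused logsigmoid stays finite thanks to the internal LogSumExp trick
print(F.logsigmoid(x))               # tensor([-200.])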

7 Likes

When I try this, I get the following error:
RuntimeError: Boolean value of Tensor with more than one value is ambiguous

Here are my logits

tensor([[-2.8443, -3.3110, -2.5216,  ..., -2.7601, -3.0928, -2.9031],
        [-2.8533, -2.9637, -2.5839,  ..., -2.3841, -2.8846, -3.0366],
        [-2.8923, -3.2757, -2.6118,  ..., -2.4875, -2.7701, -3.1466],
        ...,
        [-2.9981, -3.2178, -2.5539,  ..., -2.7732, -3.0216, -2.8305],
        [-2.7969, -3.0189, -2.4602,  ..., -2.2811, -2.9239, -3.1404],
        [-2.8644, -2.9294, -2.5960,  ..., -2.4510, -2.8790, -2.9344]],
       grad_fn=<IndexBackward>)

and labels

tensor([[0, 0, 0,  ..., 0, 1, 0],
        [0, 0, 0,  ..., 0, 1, 0],
        [0, 0, 0,  ..., 1, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 1]])

Both are tensors of shape [batch_size, nb_classes]. What am I doing wrong?

2 Likes

Nvm. I was treating nn.BCEWithLogitsLoss as a function from torch.nn.functional and was calling nn.BCEWithLogitsLoss(logits, label). Fixed by changing it to nn.BCEWithLogitsLoss()(logits, label), in case anyone runs into the same thing.
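For anyone hitting the same error, a short sketch of the wrong vs. the fixed usage (the error presumably comes from the tensors being interpreted as constructor arguments such as weight):

import torch
import torch.nn as nn

logits = torch.randn(8, 5)
labels = torch.randint(0, 2, (8, 5)).float()

# wrong: logits/labels are passed as constructor arguments, which can raise
# "Boolean value of Tensor with more than one value is ambiguous"
# loss = nn.BCEWithLogitsLoss(logits, labels)

# fixed: construct the criterion first, then call it on (logits, labels)
criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, labels)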

3 Likes

Hi!

I am doing multi-label classification and came across your reply. Can you help explain why they are different? I also looked up the documentation, but it seems to suggest nn.BCEWithLogitsLoss(logits, label). Thank you!!!

torch.nn.functional.binary_cross_entropy_with_logits() is a function which calculates the loss directly:

torch.nn.functional.binary_cross_entropy_with_logits(logits, label)

whereas nn.BCEWithLogitsLoss() instantiates the module first and calls torch.nn.functional.binary_cross_entropy_with_logits() internally when its forward method is called:

criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
criterion(output, target)

For a more detailed explanation of the differences between torch.nn and torch.nn.functional, check out this thread.