Loss function for multilabel multiclass classification

I am working with a multilabel multiclass classification problem. The output of the neural network is a tensor of size ([batch size, number of labels, number of classes]). In my case, it is ([2, 6, 4]):

tensor([[[0.2287, 0.2657, 0.2634, 0.2423],
         [0.2663, 0.2563, 0.2132, 0.2642],
         [0.2989, 0.2733, 0.2064, 0.2215],
         [0.2235, 0.2659, 0.1971, 0.3135],
         [0.2344, 0.2361, 0.2968, 0.2328],
         [0.2639, 0.2191, 0.3216, 0.1954]],

        [[0.2287, 0.2657, 0.2634, 0.2423],
         [0.2663, 0.2563, 0.2132, 0.2642],
         [0.2989, 0.2733, 0.2064, 0.2215],
         [0.2235, 0.2659, 0.1971, 0.3135],
         [0.2344, 0.2361, 0.2968, 0.2328],
         [0.2639, 0.2191, 0.3216, 0.1954]]], grad_fn=<SoftmaxBackward0>)

The target size is ([2, 6]) for 2 images and 6 labels. Each label can take a value from 0 to 3, like

tensor([[0, 0, 2, 1, 3, 0],
        [0, 1, 3, 1, 2, 0]])

I am trying to use torch.nn.CrossEntropyLoss(), but it is throwing a shape mismatch error. What is the best way to handle this situation? Do I have to use a one-hot encoded target?

I’m confused as to why the target does not have a batch size dimension. Are all inputs supposed to produce the same output labels? If so, you would need to repeat the values.

Additionally, CrossEntropyLoss would not be the best fit for this use case, as it is intended for cases where the model predicts a single class (out of many) for each example. You might want to try a different loss function or consider formulating your task differently.

You can use one-hot encoding for your target labels to calculate the loss, and take the argmax of the predicted output to calculate accuracy.
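
For the accuracy part, a minimal sketch could look like this (the output and target tensors here are just random placeholders with the same shapes as in your question):

import torch

output = torch.rand(2, 6, 4).softmax(dim=-1)   # (batch, num_labels, num_classes)
target = torch.randint(0, 4, (2, 6))           # (batch, num_labels)

pred = output.argmax(dim=-1)                   # one predicted class per label, shape (2, 6)
accuracy = (pred == target).float().mean()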

The output I showed here is just a demo. The target size is (batch size, num labels), which is (2, 6).

Sorry, I misread this. You would need to convert your labels to a one-hot encoding at the minimum to fix the shape mismatch:

>>> torch.nn.functional.one_hot(labels.reshape(-1, 6))
tensor([[[1, 0, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 1, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 1],
         [1, 0, 0, 0]],

        [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 1],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [1, 0, 0, 0]]])
>>> torch.nn.functional.one_hot(labels.reshape(-1, 6)).shape
torch.Size([2, 6, 4])

https://pytorch.org/docs/stable/generated/torch.nn.functional.one_hot.html
If there is still an error, I think you could reshape both the labels and the model output to match the output here, which would then have shape (batch_size*6, 4).

Thanks. My task is multilabel multiclass, as you can see. There are 6 labels, and each label can take any value from 0 to 3, since 4 is the number of classes. Will BCELoss work in that case?

Sorry, I believe I misunderstood your original question; reading it again, it seems that each example has six labels, but for each label only one value out of four is possible. I’ve updated my most recent post to attempt to account for this, but I think you could treat it as something like a multiclass classification problem where the output is [6*batch size, 4] and use CrossEntropyLoss. This should be fine as long as the weight for your 6 labels per example is the same.
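
Something like this sketch shows the reshape I have in mind (the logits and targets are random stand-ins; note that CrossEntropyLoss expects raw logits rather than softmax probabilities, since it applies log_softmax internally):

import torch
import torch.nn as nn

batch_size, num_labels, num_classes = 2, 6, 4

logits = torch.randn(batch_size, num_labels, num_classes, requires_grad=True)
target = torch.randint(0, num_classes, (batch_size, num_labels))

criterion = nn.CrossEntropyLoss()

# flatten the label dimension into the batch dimension:
# (batch*num_labels, num_classes) logits vs. (batch*num_labels,) class indices
loss = criterion(logits.reshape(-1, num_classes), target.reshape(-1))
loss.backward()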


I did try to use the [6*batch size, 4] target shape. It is working with CrossEntropyLoss. However, after a few epochs, the loss is not decreasing.

I would check whether the loss is decreasing on a trivially small dataset (e.g., train and test on just the first batch of data repeatedly) to verify that the gradient calculation and optimizer steps are working correctly.
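
As a rough sketch of that sanity check (everything here is a placeholder: a toy linear model, one fixed random batch, and arbitrary hyperparameters), the idea is just to loop over the same batch and watch the loss head towards zero:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(16, 6 * 4))   # toy stand-in model
inputs = torch.randn(8, 16)                                  # one fixed batch
target = torch.randint(0, 4, (8, 6))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(200):
    optimizer.zero_grad()
    logits = model(inputs).reshape(-1, 6, 4)                 # (batch, num_labels, num_classes)
    loss = criterion(logits.reshape(-1, 4), target.reshape(-1))
    loss.backward()
    optimizer.step()
    if step % 50 == 0:
        print(step, loss.item())                             # should keep dropping toward zero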
