What kind of loss is better to use in multilabel classification?

Greetings,
I have encountered a similar problem, in which the expected output consists of multiple labels, each of which can take the value -1, 0 or 1.

Hence the output could look like

[1 0 -1 0 1]

Could

nn.BCEWithLogitsLoss()

work in this case?

I don't think nn.BCEWithLogitsLoss would yield the expected behavior here, as the model output should contain logits that indicate whether each class is active or not.
Could you explain a bit what the values [-1, 0, 1] stand for?

First, apologies for the late response.

It is a multilabel sentiment analysis task, in which I thought of using:
-1: negative for a label/aspect of the sentiment
0: neutral for a label/aspect of the sentiment
+1: positive for a label/aspect of the sentiment

Could these values be understood as 3 different classes?
If so, it seems you are working on a multi-class classification (one active class per sample), not a multi-label classification (zero, one or multiple active classes per sample). If that's the case, you could encode these class labels as [0, 1, 2] and use nn.CrossEntropyLoss.
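
A minimal sketch of that re-encoding, assuming the labels are stored as integers in {-1, 0, +1} as in the example vector from the question (the variable names are only illustrative):

```python
import torch

# Sentiment labels as originally encoded: -1 = negative, 0 = neutral, +1 = positive.
raw_labels = torch.tensor([1, 0, -1, 0, 1])

# nn.CrossEntropyLoss expects integer class indices in [0, nClass - 1],
# so shift every label by one: -1 -> 0, 0 -> 1, +1 -> 2.
class_indices = (raw_labels + 1).long()

print(class_indices.tolist())  # [2, 1, 0, 1, 2]
```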

Yes, each sentiment can be mapped onto [0,1,2], but there are multiple aspects associated with a given review.

Hence, each of the aspect classes has a value between these three [0,1,2].

E.g., given a review of a movie theater:
The movie was good, but the air conditioning in the theater was terrible. Even the attending staffs weren't hospitable.
Output:
Movie:Positive
Theater:Negative
Service:Negative

If I understand the example correctly, you could have 3 classes, which are either positive or negative, thus a multi-label classification. The negative output could be understood as e.g. class0 and the positive as class1.
I don't really understand this explanation:

What would be represented by the third values and what by the values "between" these integers?

Yes @ptrblck, you clearly understood the problem statement. Here I have 14 classes, each of which can have positive, negative, or neutral (3rd value) reviews, e.g. "the movie was not so good, but I can't say it was bad". Hence the three sub-classes [0,1,2].

Thanks for the update.
I think you could use e.g. 14 different "heads", each predicting 3 classes.
However, I don't know what would work best for this type of model, but maybe @KFrank might have a suggestion. :wink:
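
As a rough illustration of the "heads" idea, here is a minimal sketch. The `MultiHeadSentiment` class, the layer sizes, and the dummy feature vectors are all hypothetical, as is summing the per-head losses; none of this comes from a specific reference implementation.

```python
import torch
import torch.nn as nn


class MultiHeadSentiment(nn.Module):
    """Hypothetical model: a shared encoder plus one 3-class head per aspect."""

    def __init__(self, n_features=32, n_hidden=64, n_aspects=14, n_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(n_hidden, n_classes) for _ in range(n_aspects)]
        )

    def forward(self, x):
        h = self.encoder(x)
        # One logits tensor of shape [batch, n_classes] per head.
        return [head(h) for head in self.heads]


torch.manual_seed(0)
model = MultiHeadSentiment()
criterion = nn.CrossEntropyLoss()

x = torch.randn(8, 32)                  # dummy batch of 8 feature vectors
target = torch.randint(0, 3, (8, 14))   # one class index per sample and aspect

logits_per_head = model(x)
# Head i is scored against target column i; the per-head losses are summed.
loss = sum(
    criterion(logits, target[:, i]) for i, logits in enumerate(logits_per_head)
)
loss.backward()
```

Each head gets its own `nn.CrossEntropyLoss` call here, so the per-aspect losses could also be weighted differently if some aspects matter more.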

@ptrblck TBH I am a fairly new practitioner. Can you point me to any example where we can plug loss functions into the "heads" as well as the prediction classes?

Hello Indranil and @ptrblck!

Of the building-blocks I'm familiar with, I think that CrossEntropyLoss
and its "K-dimensional case" could work with this use case.

You have three classes, 0 = negative, 1 = neutral, and 2 = positive,
and you have 14 "channels" (that I am purposely not calling "classes"),
"Movie", "Theater", "Service", "Snack Bar", "Price", etc.

Your target would have shape [nBatch, nChannel = 14] and
consist of the class labels, {0, 1, 2}. Your input (your model
output) would have shape [nBatch, nClass = 3, nChannel = 14],
and for each sample in a batch and for each channel would consist
of a set of three logits that give a "raw score" for each of the three
classes.
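
The shapes described above can be sketched as follows. The random tensors merely stand in for a real model output and real labels, and the batch size of 4 is an arbitrary assumption:

```python
import torch
import torch.nn as nn

n_batch, n_class, n_channel = 4, 3, 14

torch.manual_seed(0)
# Stand-in for the model output: three logits per sample and per channel.
logits = torch.randn(n_batch, n_class, n_channel)
# Integer class labels in {0, 1, 2}; note there is no class dimension.
target = torch.randint(0, n_class, (n_batch, n_channel))

# "K-dimensional case" of CrossEntropyLoss: the class dimension is dim 1.
loss = nn.CrossEntropyLoss()(logits, target)

# Predicted class per sample and channel: argmax over the class dimension.
predicted = logits.argmax(dim=1)

print(logits.shape, target.shape, predicted.shape)
```

With the default `reduction='mean'`, the loss is averaged over every (sample, channel) pair, so all 14 channels contribute equally.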

Structurally, this all fits. One minor issue: We normally think of
classes as not being numerical or ordered - misclassifying a bird
as a fish is no better or worse than misclassifying it as a reptile.
Here it makes sense to say that misclassifying "negative" as "positive"
is a bigger error than misclassifying it as "neutral" - a bit of
information that is not captured by CrossEntropyLoss. You could
conceivably use MSELoss as your loss function to capture the
ordered structure of your three classes, although my gut tells me
this wouldn't work as well. (If you wanted to get fancy - at the cost
of an extra hyperparameter - you could try adding some MSELoss
to your CrossEntropyLoss, but that would probably be an
overcomplication.)
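
One possible way to sketch the "some MSELoss added to CrossEntropyLoss" idea is shown below. This is only an assumption about how the two could be combined: the helper `ce_plus_mse`, the choice to feed the softmax-expected class value into the MSE term, and the default `mse_weight` are all hypothetical and not prescribed anywhere in this thread.

```python
import torch
import torch.nn.functional as F


def ce_plus_mse(logits, target, mse_weight=0.1):
    """Hypothetical combined loss; mse_weight is the extra hyperparameter.

    logits: [nBatch, nClass, nChannel] raw scores
    target: [nBatch, nChannel] integer class labels
    """
    ce = F.cross_entropy(logits, target)

    n_class = logits.shape[1]
    probs = logits.softmax(dim=1)
    # Class indices 0..nClass-1 treated as ordered numeric values (assumption).
    class_values = torch.arange(
        n_class, dtype=probs.dtype, device=probs.device
    ).view(1, n_class, 1)
    # Expected class value per sample and channel, in [0, nClass - 1].
    expected = (probs * class_values).sum(dim=1)
    mse = F.mse_loss(expected, target.to(probs.dtype))

    return ce + mse_weight * mse


torch.manual_seed(0)
logits = torch.randn(4, 3, 14, requires_grad=True)
target = torch.randint(0, 3, (4, 14))

loss = ce_plus_mse(logits, target)
loss.backward()
```

The MSE term penalizes predicting "positive" for a "negative" target more than predicting "neutral", which is the ordering information that plain cross entropy ignores.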

Best.

K. Frank

@KFrank Thank you for your response.

Correct me if I am wrong in my understanding.
Did you mean that the output should have dimensions [nBatch, nChannel = 14], where nChannel could be a one-hot coded vector with values in [0,1,2]?
And that I should apply something like CrossEntropyLoss(MSELoss())?

Hello Indranil!

No, what I said is correct.

Note that CrossEntropyLoss takes an input (the output of your
model) and a target that have different shapes.

The input has a class dimension, of size nClass. (This set of
nClass values consists of raw-score logits that tell you how
likely your model thinks each of the classes is.) The target, on
the other hand, does not have a class dimension, and the number
of classes shows up in your target because each target value is
an integer class label that ranges from 0 to nClass - 1.
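
To make the difference concrete, here is a small sketch, assuming labels that happen to be stored one-hot, showing how they relate to the integer class labels described above (the tensor sizes are arbitrary):

```python
import torch
import torch.nn.functional as F

n_batch, n_class, n_channel = 2, 3, 14

torch.manual_seed(0)
# Target in the form described above: integer labels, no class dimension.
target = torch.randint(0, n_class, (n_batch, n_channel))

# A one-hot encoding adds a trailing class dimension of size nClass.
one_hot = F.one_hot(target, num_classes=n_class)

# argmax over that class dimension recovers the integer class labels.
recovered = one_hot.argmax(dim=-1)

print(target.shape, one_hot.shape, recovered.shape)
```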

My comment above applies to CrossEntropyLoss - the structure
is different for MSELoss. CrossEntropyLoss should be your go-to
loss function for this kind of multi-class classification problem. (I think
that it's legitimate to consider exploring MSELoss, but I also think that
doing so is likely to be an unhelpful complication.)

Best.

K. Frank

Yes, @KFrank, I am not contradicting what you said. I just wanted to understand.

@ptrblck
Is there any PyTorch implementation of the propensity loss
from this paper?
https://www.researchgate.net/publication/339565700_Approaches_for_the_Improvement_of_the_Multilabel_Multiclass_Classification_with_a_Huge_Number_of_Classes

Hello @Jaideep_Valani, @ptrblck,

Have you found any PyTorch implementation of Propensity Loss?