Greetings,
I have encountered a similar problem, in which the expected output can be multi-label and the values can be -1, 0, or 1.
Hence the output could look like
[1 0 -1 0 1]
Could nn.BCEWithLogitsLoss() work in this case?
I don't think this approach would yield the expected behavior using nn.BCEWithLogitsLoss, as the output should contain logits which indicate whether each class is active or not.
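For reference, here is a minimal sketch of the multi-label setup that nn.BCEWithLogitsLoss expects, where each target entry is 0 (inactive) or 1 (active); the sizes are just placeholders:

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
logits = torch.randn(8, 5)                     # model output: one logit per class
targets = torch.randint(0, 2, (8, 5)).float()  # each class is independently active (1) or inactive (0)
loss = criterion(logits, targets)
```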
Could you explain a bit what the values [-1, 0, 1] stand for?
Firstly, apologies for the late response.
It is a multilabel sentiment analysis task.
In which I thought of using:
-1: as negative for a label/aspect of the sentiment
0: as neutral for a label/aspect of the sentiment
+1: as positive for a label/aspect of the sentiment
Could these values be understood as 3 different classes?
If so, it seems you are working on a multi-class classification (one active class per sample), not a multi-label classification (zero, one, or multiple active classes per sample). If that's the case, you could encode these class labels as [0, 1, 2] and use nn.CrossEntropyLoss.
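For example, a minimal sketch of that setup (batch size and shapes are just assumptions):

```python
import torch
import torch.nn as nn

# map {-1, 0, 1} -> {0, 1, 2} and treat the task as a 3-class classification
criterion = nn.CrossEntropyLoss()
logits = torch.randn(8, 3)          # [batch_size, num_classes]
labels = torch.randint(0, 3, (8,))  # integer class labels in {0, 1, 2}
loss = criterion(logits, labels)
```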
Yes, each sentiment can be mapped onto [0, 1, 2], but there are multiple aspects associated with a given review.
Hence, each aspect takes one of these three values [0, 1, 2].
I.e., given a review on movie theaters:
The movie was good, but the air conditioning in the theater was terrible. Even the attending staff weren't hospitable.
Output:
Movie:Positive
Theater:Negative
Service:Negative
If I understand the example correctly, you could have 3 classes, which are either positive or negative, thus a multi-label classification. The negative output could be understood as e.g. class0 and the positive as class1.
I don't really understand this explanation:
What would be represented by the third value, and what by the values "between" these integers?
Yes @ptrblck, you clearly understood the problem statement. Here I have 14 classes, which can have positive, negative, as well as neutral (3rd value) reviews, e.g. "movie was not so good, but can't say it was bad",
hence three sub-classes [0, 1, 2].
Thanks for the update.
I think you could use e.g. 14 different "heads", each predicting 3 classes.
However, I don't know what would work best for this type of model, but maybe @KFrank might have a suggestion.
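A minimal sketch of what such a multi-head setup might look like (the layer sizes, feature dimension, and names are just assumptions):

```python
import torch
import torch.nn as nn

class MultiHeadSentiment(nn.Module):
    def __init__(self, in_features=128, num_aspects=14, num_classes=3):
        super().__init__()
        # shared encoder followed by one small "head" per aspect
        self.encoder = nn.Sequential(nn.Linear(in_features, 64), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(64, num_classes) for _ in range(num_aspects)]
        )

    def forward(self, x):
        features = self.encoder(x)
        # returns one [batch_size, num_classes] logit tensor per aspect
        return [head(features) for head in self.heads]

model = MultiHeadSentiment()
criterion = nn.CrossEntropyLoss()
x = torch.randn(8, 128)                 # dummy input features
targets = torch.randint(0, 3, (8, 14))  # one class label in {0, 1, 2} per aspect
outputs = model(x)
# sum (or average) the per-head losses before calling backward
loss = sum(criterion(out, targets[:, i]) for i, out in enumerate(outputs))
loss.backward()
```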
@ptrblck TBH I am a fairly new practitioner, can you point me to any example where we can plug in loss functions to "heads" as well as the prediction classes?
Hello Indranil and @ptrblck!
Of the building blocks I'm familiar with, I think that CrossEntropyLoss and its "K-dimensional case" could work for this use case.
You have three classes, 0 = negative, 1 = neutral, and 2 = positive,
and you have 14 "channels" (that I am purposely not calling "classes"):
"Movie", "Theater", "Service", "Snack Bar", "Price", etc.
Your target would have shape [nBatch, nChannel = 14] and consist
of the class labels, {0, 1, 2}. Your input (your model output)
would have shape [nBatch, nClass = 3, nChannel = 14], and for each
sample in a batch and for each channel would consist of a set of
three logits that give a "raw score" for each of the three classes.
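A minimal sketch of this shape layout (batch size assumed):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
nBatch, nClass, nChannel = 8, 3, 14
logits = torch.randn(nBatch, nClass, nChannel)         # raw scores per class, per channel
target = torch.randint(0, nClass, (nBatch, nChannel))  # class labels {0, 1, 2} per channel
loss = criterion(logits, target)
```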
Structurally, this all fits. One minor issue: we normally think of
classes as not being numerical or ordered (misclassifying a bird
as a fish is no better or worse than misclassifying it as a reptile).
Here it makes sense to say that misclassifying "negative" as "positive"
is a bigger error than misclassifying it as "neutral", a bit of
information that is not captured by CrossEntropyLoss. You could
conceivably use MSELoss as your loss function to capture the
ordered structure of your three classes, although my gut tells me
this wouldn't work as well. (If you wanted to get fancy, at the cost
of an extra hyperparameter, you could try adding some MSELoss
to your CrossEntropyLoss, but that would probably be an
overcomplication.)
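If you did want to experiment with that idea, one speculative way to add an ordering-aware term (the weighting and the soft class index below are purely my own assumptions, not an established recipe) could be:

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, target, mse_weight=0.1):
    # logits: [nBatch, nClass = 3, nChannel = 14], target: [nBatch, nChannel = 14]
    ce = F.cross_entropy(logits, target)
    probs = F.softmax(logits, dim=1)                 # per-channel class probabilities
    class_values = torch.arange(3.0).view(1, 3, 1)   # 0 = negative, 1 = neutral, 2 = positive
    expected = (probs * class_values).sum(dim=1)     # soft "class index" per channel
    mse = F.mse_loss(expected, target.float())
    return ce + mse_weight * mse
```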
Best.
K. Frank
@KFrank Thank you for your response.
Correct me if I am wrong in my understanding.
You meant to say the output should have dimensions [nBatch, nChannel = 14], where nChannel could be a one-hot coded vector having its values in [0, 1, 2]?
And that I should apply something like CrossEntropyLoss(MSELoss())?
Hello Indranil!
No, what I said is correct.
Note that CrossEntropyLoss takes an input (the output of your
model) and a target that have different shapes.
The input has a class dimension, of size nClass. (This set of
nClass values consists of raw-score logits that tell you how
likely your model thinks each of the classes is.) The target, on
the other hand, does not have a class dimension, and the number
of classes shows up in your target because each target value is
an integer class label that ranges from 0 to nClass - 1.
My comment above applies to CrossEntropyLoss; the structure
is different for MSELoss. CrossEntropyLoss should be your go-to
loss function for this kind of multi-class classification problem. (I think
that it's legitimate to consider exploring MSELoss, but I also think that
doing so is likely to be an unhelpful complication.)
Best.
K. Frank
Yes, @KFrank, I am not contradicting what you said. I just wanted to understand.
@ptrblck
Is there any implementation in PyTorch of the loss in this paper (propensity loss)?
https://www.researchgate.net/publication/339565700_Approaches_for_the_Improvement_of_the_Multilabel_Multiclass_Classification_with_a_Huge_Number_of_Classes
Hello @Jaideep_Valani, @ptrblck,
Have you found any PyTorch implementation of the propensity loss?