Single-label and multi-label text classification with an LSTM

Hello,

I have a problem where I would like to predict a single class, e.g. “d” [001000], and a multi-label class, e.g. [“d”, “z”] [101000], at the same time in a classifier with an LSTM. In other words, my final network should be able to predict both single-label and multi-label classes.

I would like to know the best approach to this problem, because so far I have only seen classifiers that are either single-label or multi-label.

EXAMPLE: a classifier with an LSTM

Sequence 1 --> Target 1
[“a”, “b”, “c”] = “d”
[000001], [000010], [000100] = [001000]

Sequence 2 --> Target 2
[“a”, “b”, “y”] = [“d”, “z”]
[000001], [000010], [010000] = [101000]

Many Thanks

It’s hard to understand what exactly you’re trying to do here; could you explain a bit more?
Are you trying to create n binary classifiers?

Hi @Abhilash_Srivastava, I have updated the description in more detail; I hope it is clearer now.
I don’t want n binary classifiers, only one classifier using an LSTM that is capable of both single-label and multi-label classification.

Hello Hüseyin!

Some clarifying comments:

A multi-label, multi-class classifier should be thought of as n binary
classifiers that all run together in a single network in a single pass.

The predicted output is a set of predictions (logits / probabilities) for
a class-“0” binary classifier (yes vs. no), a class-“1” binary classifier
(yes vs. no), and so on.
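
For example, here is a minimal sketch (the number of classes is made up
for illustration) of how one pass over n_class outputs yields n_class
independent yes/no decisions:

```python
import torch

n_class = 6                          # hypothetical number of classes
logits = torch.randn(1, n_class)     # one network pass: n_class raw scores
probs = torch.sigmoid(logits)        # per-class "yes" probability
predicted = probs > 0.5              # independent yes/no decision per class
```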

Not commenting specifically on LSTM:

Single-label and multi-label classifiers are somewhat different. How
do you propose to build one that can do both? How would you tell
it that you want a single-label vs. multi-label output?

(One possibility is that your network has separate outputs for both
the single-label and multi-label prediction, and you train on a loss
that is a sum of a single-label and multi-label loss function.)

One thing you can do that does make sense is to have the upstream
“backbone” of your network be pre-trained to find features that are
relevant for your classes, and then build two separate networks, for
each of which you train a couple of downstream layers: one set for
the single-label case, and a second set for the multi-label case.
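
A rough sketch of that layout, assuming a shared LSTM backbone feeding
two separate downstream heads (the class names, layer sizes, and
vocabulary size here are all hypothetical):

```python
import torch
import torch.nn as nn

n_class, embed_dim, hidden_dim = 6, 32, 64   # hypothetical sizes

class TwoHeadLSTM(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        # shared "backbone" (could be pre-trained and optionally frozen)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # two separate downstream heads, one per task
        self.single_head = nn.Linear(hidden_dim, n_class)  # single-label logits
        self.multi_head = nn.Linear(hidden_dim, n_class)   # multi-label logits

    def forward(self, x):
        emb = self.embed(x)                # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(emb)       # h_n: (1, batch, hidden_dim)
        features = h_n[-1]                 # final hidden state as features
        return self.single_head(features), self.multi_head(features)

model = TwoHeadLSTM(vocab_size=100)
tokens = torch.randint(100, (4, 3))        # batch of 4 sequences, length 3
single_logits, multi_logits = model(tokens)
```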

Best.

K. Frank

Hi Frank,

Thank you for your proposals.

(One possibility is that your network has separate outputs for both
the single-label and multi-label prediction, and you train on a loss
that is a sum of a single-label and multi-label loss function.)

This means that instead of computing n binary classifications, I should have two branches (two output layers), where one branch computes the multi-label output loss and the other computes the single-label output loss. Combining the two losses then gives me the total loss. Just asking for confirmation :slight_smile:

Thanks

Hello Hüseyin!

I think what you propose is correct, but I wouldn’t say it quite this
way.

Let’s say you have nClass classes, and you want to perform both
single-label and multi-label multi-class classification.

In order to have two “branches” you could have a final Linear layer
with 2 * nClass outputs. The first nClass of those would be your
single-label branch, and would be fed into CrossEntropyLoss with
a single integer class label (per sample in your batch) as the target
for CrossEntropyLoss.

The second set of nClass output values would be your multi-label
branch, and would be fed into BCEWithLogitsLoss with a set of
nClass per-class probabilities (which could just be 0.0 and 1.0
labels) as the target for BCEWithLogitsLoss.

Yes, then sum the two losses together to get your total_loss, call
total_loss.backward(), and optimize as usual.

You will likely want to perform a weighted sum of your two losses to
balance the relative performance of your two types of classification,
as dictated by your specific needs.
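
Putting the above together, here is a self-contained sketch of one
training step (the sizes, the loss weights, and the stand-in head layer
are illustrative assumptions, not a prescribed implementation):

```python
import torch
import torch.nn as nn

n_class, hidden_dim, batch = 6, 64, 4             # hypothetical sizes

# stand-in for the network's final Linear layer with 2 * nClass outputs
head = nn.Linear(hidden_dim, 2 * n_class)
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)

features = torch.randn(batch, hidden_dim)         # pretend LSTM features
single_target = torch.randint(n_class, (batch,))  # integer class labels
multi_target = torch.randint(2, (batch, n_class)).float()  # 0.0 / 1.0 labels

output = head(features)                  # (batch, 2 * n_class)
single_logits = output[:, :n_class]      # first nClass: single-label branch
multi_logits = output[:, n_class:]       # second nClass: multi-label branch

loss_single = nn.CrossEntropyLoss()(single_logits, single_target)
loss_multi = nn.BCEWithLogitsLoss()(multi_logits, multi_target)

# weighted sum; the weights are a free choice that balances the two tasks
w_single, w_multi = 1.0, 1.0
total_loss = w_single * loss_single + w_multi * loss_multi

optimizer.zero_grad()
total_loss.backward()
optimizer.step()
```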

Good luck.

K. Frank