Multi-label text classification

Er_Hall · December 9, 2019, 6:23pm

Hi all,

Can someone explain the various strategies for solving multi-label text classification problems with deep learning models?

Is it right to “convert” the problem into a multiclass classification problem? Here is what I mean.

For example, if I have 3 labels and an instance can carry any one, two, or all three of them, I can convert the problem into a multiclass classification problem with 7 classes:
(A), (B), (C), (AB), (AC), (BC), (ABC).

Is that right? If not, how should I handle it?
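
To make the conversion above concrete, here is a minimal sketch of how each label combination could be mapped to a single class index. The names `powerset_classes`, `class_index`, and `to_multiclass` are made up for illustration and do not come from any library.

```python
from itertools import combinations

labels = ["A", "B", "C"]

# Every non-empty subset of the labels becomes one multiclass target.
# Subsets are enumerated by size: A, B, C, AB, AC, BC, ABC.
powerset_classes = [
    frozenset(combo)
    for size in range(1, len(labels) + 1)
    for combo in combinations(labels, size)
]

# Map each label combination to an integer class index.
class_index = {subset: i for i, subset in enumerate(powerset_classes)}


def to_multiclass(label_set):
    """Return the class index for one instance's set of labels."""
    return class_index[frozenset(label_set)]


print(len(powerset_classes))        # number of multiclass targets
print(to_multiclass({"A", "C"}))    # index of the (AC) class
```

With n labels this mapping produces 2^n - 1 classes, so the class count grows quickly as labels are added, and any combination that never appears in the training data cannot be learned.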

vdw · January 2, 2020, 7:27am

The problem I see with this approach is that the 7 classes are not independent. Admittedly, I cannot give a solid, “scientific-y” explanation of why this causes issues.

Intuitively, I would go with a “one-vs-rest” approach – have 3 binary classifiers, one for each label A, B, and C:

Should the text be labelled with A? Yes/No
Should the text be labelled with B? Yes/No
Should the text be labelled with C? Yes/No

You only need to adjust your training data for each of the 3 classifiers accordingly.
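
As a rough sketch of what adjusting the training data could look like, the snippet below derives one list of binary targets per label from a multi-label dataset. The toy `dataset` and the helper `binary_targets` are hypothetical and only illustrate the idea; the classifiers themselves are left out.

```python
labels = ["A", "B", "C"]

# Hypothetical toy dataset: each item is (text, set of labels).
dataset = [
    ("text one", {"A"}),
    ("text two", {"A", "C"}),
    ("text three", {"B", "C"}),
]


def binary_targets(dataset, label):
    """Return 1 for every instance tagged with `label`, else 0."""
    return [1 if label in tags else 0 for _, tags in dataset]


# One target vector per label; each one would train its own binary classifier
# on the same texts.
per_label_targets = {label: binary_targets(dataset, label) for label in labels}

for label, targets in per_label_targets.items():
    print(label, targets)
```

Each classifier sees every text; only the Yes/No target changes. In a neural network the same idea is often implemented as a single model with one sigmoid output per label, trained with binary cross-entropy, though which setup fits best depends on your data.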