Multilabel classification with no label examples

What is the best way to model a multi-label classification task in which some examples have no labels?
For example, we have the following classes: [A, B, C], and each instance is assigned to one or more of them. However, we also have some examples without any of these labels, designated as non-findings.
Let’s say the label vectors for annotators 1, 2, and 3 look like this:
A: [1, 0, 0]
B: [0, 1, 0]
C: [0, 0, 1]
AB: [1, 1, 0]
AC: [1, 0, 1]

Non-finding: [0, 0, 0, 0]
How should I formulate the BCEWithLogitsLoss?

Hi Farhad!

You would label “non-finding” as [0, 0, 0], with three yes-no labels,
one for each of your three classes.

To rephrase this a little more precisely: “our instances are assigned
to zero or one or more than one of them.”

To clarify a little further: If this were a single-label, three-class problem,
[A, B, C], but a given instance could also be “none of the above,” then
you would add a fourth class, [none, A, B, C], and perform a four-class
classification.

But for a multi-label, three-class problem, you would just use the
three classes, [A, B, C] because, with multiple labels, [0, 0, 0]
suffices to indicate “none of the above.”
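
As a minimal sketch of the multi-label setup described above (the logits are made-up values standing in for a model's raw outputs, and the 0.0 logit threshold is an assumption, not something fixed by the loss), BCEWithLogitsLoss can be fed an all-zeros target row for a non-finding sample just like any other row:

```python
import torch
import torch.nn as nn

# Float multi-hot targets over the three classes [A, B, C].
targets = torch.tensor([
    [1.0, 0.0, 0.0],  # A
    [1.0, 1.0, 0.0],  # A and B
    [0.0, 0.0, 0.0],  # non-finding: none of the classes is present
])

# Hypothetical raw model outputs (logits), one per class, with no sigmoid applied.
logits = torch.tensor([
    [ 2.0, -1.5, -3.0],
    [ 1.0,  0.5, -2.0],
    [-2.5, -1.0, -4.0],
])

# BCEWithLogitsLoss applies the sigmoid internally and averages over all
# elements by default, so each class is treated as its own yes-no problem.
criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, targets)

# Assumed decision rule: logit > 0 (probability > 0.5) means "label present".
# A row that comes out all zeros is a predicted non-finding.
preds = (logits > 0).long()
```

Note that the targets need to be floating-point tensors with the same shape as the logits.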

Best.

K. Frank

Hi Frank,
Thanks for the reply. I think the second option should be the correct one for my multilabel classification problem. In this case, how can I measure the performance, particularly for the no-finding labels?

Hi Farhad!

There is no single measure that tells you everything you want to know
about multilabel-classification performance. See this brief overview in
Wikipedia’s “Multi-label classification” article:

Statistics and evaluation metrics

Different use cases will care more or less about different kinds of errors,
and, so, will prefer different performance measures.

If you care about not missing “no finding” cases, you could look at an
accuracy given by the percentage of true “no finding” cases that were
predicted (completely) correctly.
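
One possible way to compute this, as a sketch: the helper name `no_finding_recall` is hypothetical, and it assumes the predictions have already been thresholded to 0/1 values with the same shape as the targets.

```python
import torch

def no_finding_recall(preds, targets):
    """Fraction of true "no finding" rows that were predicted as all zeros."""
    true_nf = (targets == 0).all(dim=1)  # rows with no true labels
    pred_nf = (preds == 0).all(dim=1)    # rows with no predicted labels
    n_true = true_nf.sum().item()
    if n_true == 0:
        return float("nan")              # undefined without any "no finding" rows
    return (true_nf & pred_nf).sum().item() / n_true

# Made-up example: two true "no finding" rows, one of them predicted correctly.
example_targets = torch.tensor([[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 1]])
example_preds = torch.tensor([[0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0]])
nf_recall = no_finding_recall(example_preds, example_targets)
print(nf_recall)  # 0.5
```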

If you really care about not incorrectly predicting “not no finding” cases
as being “no finding,” you could look at a rather weird sort of “accuracy”
given by the percentage of “not no finding” cases that were predicted to
have some labels even if all of the predicted labels were wrong. (Of
course if this were your use case you would likely be better off training
a simple “no finding” vs. “not no finding” binary classifier.)
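
A sketch of that second measure, under the same assumptions (hard 0/1 predictions, and a hypothetical helper name `finding_detection_rate`): a row counts as a hit whenever at least one label is predicted, whether or not it is the right one.

```python
import torch

def finding_detection_rate(preds, targets):
    """Fraction of rows with at least one true label that also get
    at least one predicted label, regardless of which labels they are."""
    has_true = (targets != 0).any(dim=1)  # "not no finding" rows
    has_pred = (preds != 0).any(dim=1)    # rows predicted to have some label
    n_true = has_true.sum().item()
    if n_true == 0:
        return float("nan")               # undefined without any labeled rows
    return (has_true & has_pred).sum().item() / n_true

# Made-up example: three labeled rows. The first is predicted with the wrong
# label (still a hit), the second correctly, and the third as "no finding".
labeled_targets = torch.tensor([[1, 0, 0], [0, 1, 1], [0, 0, 1], [0, 0, 0]])
labeled_preds = torch.tensor([[0, 1, 0], [0, 1, 1], [0, 0, 0], [0, 0, 0]])
detection_rate = finding_detection_rate(labeled_preds, labeled_targets)
```

Here two of the three labeled rows receive some predicted label, so the rate is 2/3.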

Good luck.

K. Frank