MultiClass-MultiTarget Classification CheXpert

Ivan_A · October 28, 2021, 6:49pm

Good morning everyone. I’m working with the CheXpert dataset, which contains 14 classes (‘No Finding’, ‘Enlarged Cardiomediastinum’, ‘Cardiomegaly’, ‘Lung Opacity’, ‘Lung Lesion’, ‘Edema’, ‘Consolidation’, ‘Pneumonia’, ‘Atelectasis’, ‘Pneumothorax’, ‘Pleural Effusion’, ‘Pleural Other’, ‘Fracture’, ‘Support Devices’). Each class can take one of three labels: 0 (negative), 1 (positive), or 2 (uncertain).

I don’t know how to do the one-hot encoding correctly. Can someone give me some advice? I’m using BCEWithLogitsLoss as the loss function. Thank you very much.

tom · October 28, 2021, 7:45pm

You could map 2 to a made-up probability (0.5 or maybe the fraction positive / (positive + negative)).
Alternatively, you could mask the uncertain labels (i.e. the network can predict what it wants there).
Then you have one target and one label per class, just like BCEWithLogitsLoss wants.
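A minimal sketch of both suggestions, in case it helps. The all-zero `logits` tensor is only a stand-in for a real network output, and the variable names are made up for illustration. Option A replaces the uncertain label with a soft target of 0.5; option B computes the per-element loss and zeroes it out wherever the label is uncertain.

```python
import torch
import torch.nn as nn

# One image, 14 classes: 0 = negative, 1 = positive, 2 = uncertain
labels = torch.tensor([[0., 0., 1., 1., 0., 2., 0., 2., 0., 0., 0., 0., 0., 1.]])
logits = torch.zeros(1, 14)  # hypothetical stand-in for the network output

# Option A: map "uncertain" to a made-up probability (here 0.5)
soft_target = torch.where(labels == 2, torch.full_like(labels, 0.5), labels)
loss_soft = nn.BCEWithLogitsLoss()(logits, soft_target)

# Option B: mask the uncertain entries so they contribute nothing to the loss
mask = (labels != 2).float()
target = labels * mask  # uncertain entries become 0, but they are masked out anyway
per_element = nn.BCEWithLogitsLoss(reduction="none")(logits, target)
loss_masked = (per_element * mask).sum() / mask.sum().clamp(min=1)
```

Dividing by `mask.sum()` rather than by the total number of elements keeps the loss scale comparable between images with many and with few uncertain labels; that is a design choice, not something BCEWithLogitsLoss requires.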

Best regards

Thomas

Ivan_A · November 2, 2021, 8:07pm

Thank you very much for your answer Thomas V.

The article [1] describes several ways to deal with this problem, including:

U-ones, where the values of Uncertain (2) are converted to ones (1).

U-zeros, where the values of Uncertain (2) are converted to zeros (0).

U-Multiclass, where Uncertain (2) is treated as a third class of its own.
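The three policies can be sketched as a small label-mapping helper. The function name `apply_policy` and the `UNCERTAIN` constant are hypothetical, and the 0/1/2 coding is the one used in this thread rather than anything prescribed by the dataset.

```python
import torch

UNCERTAIN = 2  # label coding used in this thread: 0 negative, 1 positive, 2 uncertain

def apply_policy(labels: torch.Tensor, policy: str) -> torch.Tensor:
    """Map integer labels of shape (batch, 14) according to an uncertainty policy."""
    if policy == "ones":
        # U-ones: uncertain becomes positive; float targets for BCEWithLogitsLoss
        return torch.where(labels == UNCERTAIN, torch.ones_like(labels), labels).float()
    if policy == "zeros":
        # U-zeros: uncertain becomes negative; float targets for BCEWithLogitsLoss
        return torch.where(labels == UNCERTAIN, torch.zeros_like(labels), labels).float()
    if policy == "multiclass":
        # U-Multiclass: keep 0/1/2 as class indices for a 3-way classifier per class
        return labels.long()
    raise ValueError(f"unknown policy: {policy}")

labels = torch.tensor([[0, 0, 1, 1, 0, 2, 0, 2, 0, 0, 0, 0, 0, 1]])
u_ones = apply_policy(labels, "ones")
u_zeros = apply_policy(labels, "zeros")
u_multi = apply_policy(labels, "multiclass")
```

Note that only the first two policies produce targets that fit BCEWithLogitsLoss directly; the third keeps three states per class and so needs a 3-way output for each of the 14 classes.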

For U-Multiclass, I build the target with the following one-hot encoding code.

labels = torch.tensor([0., 0., 1., 1., 0., 2., 0., 2., 0., 0., 0., 0., 0., 1.])
labels = labels.type(torch.int64)

labels = labels.unsqueeze(0)
target = torch.zeros(labels.size(0), 14).scatter_(1, labels, 1.)

After training, the results are good for U-zeros and U-ones but not for U-Multiclass. (Note that the test images do not contain any Uncertain (2) labels.)
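Since the test set has no Uncertain labels, one possible way to train and score a 3-state model is sketched below. It assumes a hypothetical network that outputs three logits per class, i.e. a tensor of shape (batch, 14, 3), which is not what the code in this post produces. At test time the uncertain logit is dropped and the softmax is taken over negative and positive only. As far as I recall, the CheXpert paper [1] evaluates its multi-class model in a similar way, but treat that as an assumption to check against the paper.

```python
import torch
import torch.nn.functional as F

batch, num_obs, num_states = 4, 14, 3
logits = torch.randn(batch, num_obs, num_states)  # hypothetical model output
targets = torch.randint(0, num_states, (batch, num_obs))  # class indices 0/1/2

# Training: cross_entropy expects the class dimension in position 1,
# so (batch, 14, 3) is permuted to (batch, 3, 14); targets stay (batch, 14).
loss = F.cross_entropy(logits.permute(0, 2, 1), targets)

# Testing: ignore the "uncertain" logit (index 2) and renormalise over
# negative (index 0) and positive (index 1) only.
p_positive = F.softmax(logits[..., :2], dim=-1)[..., 1]  # shape (batch, 14)
```

The resulting `p_positive` can then be compared against the binary test labels in the same way as the sigmoid outputs of the U-zeros and U-ones models.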

The following figure shows the results obtained for 5 of the 14 classes.

[Figure: Screenshot from 2021-11-02 15-00-55]

If we look at the One Hot Encoding tensor,

labels = [0., 0., 1., 1., 0., 2., 0., 2., 0., 0., 0., 0., 0., 1.]
One_Hot_encoding_labels = [1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]

It only records that three different labels (0, 1, 2) occur, but it does not represent the label of each pathology, so is the one-hot encoding done correctly?

[1] Jeremy Irvin, Pranav Rajpurkar, Michael Ko, Yifan Yu, Silviana Ciurea-Ilcus, Chris Chute, Henrik Marklund, Behzad Haghgoo, Robyn Ball, Katie Shpanskaya, Jayne Seekins, David A. Mong, Safwan S. Halabi, Jesse K. Sandberg, Ricky Jones, David B. Larson, Curtis P. Langlotz, Bhavik N. Patel, Matthew P. Lungren, Andrew Y. Ng. CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison. arXiv, 2019.

tom · November 3, 2021, 3:48am

I think something is messed up with your one-hot encoding.
How about

labels = torch.tensor([0., 0., 1., 1., 0., 2., 0., 2., 0., 0., 0., 0., 0., 1.], dtype=torch.int64)
target = torch.nn.functional.one_hot(labels)

or

labels = torch.tensor([0, 0, 1, 1, 0, 2, 0, 2, 0, 0, 0, 0, 0, 1], dtype=torch.int64)
target = torch.nn.functional.one_hot(labels)
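To make the difference concrete, here is a small comparison. The `scatter_` call from earlier in the thread uses the label values themselves as column indices into a (1, 14) tensor, so it only marks columns 0, 1 and 2. `one_hot` instead gives one row of three entries per class. Passing `num_classes=3` explicitly is my addition: without it, the number of columns is inferred from the largest label present, so a sample with no uncertain labels would come out with only two columns.

```python
import torch
import torch.nn.functional as F

labels = torch.tensor([0, 0, 1, 1, 0, 2, 0, 2, 0, 0, 0, 0, 0, 1])

# One row per pathology, one column per state (negative, positive, uncertain)
target = F.one_hot(labels, num_classes=3)  # shape (14, 3)

# The earlier scatter_ call: label values index into the 14 columns,
# so only columns 0, 1 and 2 are ever set.
scattered = torch.zeros(1, 14).scatter_(1, labels.unsqueeze(0), 1.)

# scatter_ gives the same result as one_hot when the output has shape (14, 3)
# and the index has shape (14, 1).
via_scatter = torch.zeros(14, 3).scatter_(1, labels.unsqueeze(1), 1.)

print(target.shape)           # torch.Size([14, 3])
print(target.sum(dim=0))      # tensor([9, 3, 2])
```

A (14, 3) target like this no longer matches a 14-logit output, so the model would need three outputs per class for this encoding to be usable.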