How can I get better performance from a multi-label image classifier?

JoshP · November 4, 2020, 11:13am

Hi, I am trying to create a CNN to classify images with multiple labels.

Labels are of the form:

[0., 0., 0., 0., 0., 0., 0., 0., 0.5000, 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 0.]

where 0 represents no appearance of the category, 0.5 represents an appearance that is hard to identify and 1 represents a clear appearance of the category in the image.
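To make the label format concrete, here is a minimal sketch of how such a soft target vector could be built. The helper `encode_labels` and the `{class_index: confidence}` input format are hypothetical, not taken from the original post. The loss at the end is also an assumption: `nn.BCEWithLogitsLoss` is one common choice that accepts targets anywhere in [0, 1], but the thread does not say which loss is actually used.

```python
import torch
from torch import nn

NUM_CLASSES = 20  # length of the label vector shown above


def encode_labels(annotations, num_classes=NUM_CLASSES):
    """Build a soft multi-label target from {class_index: confidence} pairs.

    Hypothetical helper: confidence is 1.0 for a clear appearance and 0.5 for
    a hard-to-identify one; absent classes stay at 0.
    """
    target = torch.zeros(num_classes)
    for class_index, confidence in annotations.items():
        target[class_index] = confidence
    return target


# Reproduces the example vector: class 8 is hard to identify, 14 and 18 are clear.
example = encode_labels({8: 0.5, 14: 1.0, 18: 1.0})

# Assumption: a binary cross-entropy loss on raw logits, which allows soft targets.
criterion = nn.BCEWithLogitsLoss()
logits = torch.zeros(1, NUM_CLASSES)  # stand-in for a model output
loss = criterion(logits, example.unsqueeze(0))
```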

The data is imbalanced:

I have tried training with all categories and also with the ‘person’ category removed, but the model performs fairly poorly in both cases. With all categories it nearly always places the highest probability on ‘person’; with ‘person’ removed it tends to place the highest probability on the same few categories regardless of the image:

I loosely adapted the network layers from a tutorial for binary classification, but I think the design is not optimal. I am hoping someone can give me some advice on a better network design. This is what I am currently using:

and this is my training loop:

I am also unsure of the best way to evaluate the model on the validation set. Currently I calculate the Euclidean distance between each output and its label vector and then take the mean distance over the validation set, but I feel there should be a better way for multi-label data.
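For reference, the validation metric described above could be sketched as follows. The function name `mean_euclidean_distance` is hypothetical, and the code assumes the outputs and labels are batched tensors of shape `(batch, num_classes)`; it is not the original evaluation code, which is not shown in the thread.

```python
import torch


def mean_euclidean_distance(outputs, labels):
    """Mean over the batch of the L2 distance between output and label vectors."""
    per_sample = (outputs - labels).norm(p=2, dim=1)  # one distance per image
    return per_sample.mean()


# Tiny worked example: distances are 0 and 5, so the mean is 2.5.
outputs = torch.tensor([[0.0, 0.0], [3.0, 4.0]])
labels = torch.zeros(2, 2)
distance = mean_euclidean_distance(outputs, labels)
```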

anujd9 · November 4, 2020, 3:44pm

Hi. There are a couple of things you can use here:

If you want to keep using this model, you can try data augmentation with transforms, so that the model trains on more images and their variations. Additionally, you should increase the number of filters in your Conv2d layers, which should help the model learn more types of underlying features and train better. For example, increase out_channels in the first Conv2d layer from 12 to something like 64, and similarly for the rest of the layers. The usual convention is to make these powers of 2, i.e. 16, 32, 64, 128 and so on.
Instead of using this architecture, you can use a pre-trained network and apply transfer learning. That should give you pretty good accuracy.
Additionally, play around with different optimizers like Adam, Adadelta, RMSprop, etc.

I hope this helps

JoshP · November 5, 2020, 8:17am

Hi, thanks for the advice. I increased the Conv2d layer sizes, set the first layer's kernel size to 3, and set the optimiser to Adam, which seems to have improved performance.
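A rough sketch of what these changes might look like is below. The original network is not shown in this thread, so the class `WiderCNN`, the number of layers, the pooling, the input size, and the learning rate are all assumptions; only the wider Conv2d layers, the kernel size of 3, and the Adam optimiser come from the discussion.

```python
import torch
from torch import nn


class WiderCNN(nn.Module):
    """Hypothetical small CNN with wider Conv2d layers (powers of 2)."""

    def __init__(self, num_classes=20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),  # first layer: 64 filters, 3x3
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),  # makes the head independent of image size
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))  # one raw logit per category


model = WiderCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is an assumed value

# Shape check on a random batch of two 64x64 RGB images.
logits = model(torch.randn(2, 3, 64, 64))
```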

I agree that transfer learning would yield better results for this task, but I wanted to try to implement and train my own network.

anujd9 · November 5, 2020, 2:17pm

I am glad that this helped. If you are all good, can you please mark the above comment as the solution so that others with the same issue can also get some help from it? Thanks