I am trying to retrain a classifier for Pascal VOC 2012 based on VGG11. The pretrained network loads fine, and I have a fully connected layer mapping the 4096-dim feature vector to my 20 classes. As a first test, I trained on a randomly generated dataset, where an image of random noise is mapped to [1 0 … 0 1] and a white image is mapped to [0 1 … 1 0].
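For reference, the synthetic test looks roughly like the sketch below. It is a simplified stand-in rather than my exact code: random tensors take the place of the VGG11 features, the two complementary multi-hot targets are chosen only for illustration, and `nn.BCEWithLogitsLoss` with plain SGD is an assumption about the loss and optimizer (`nn.MultiLabelSoftMarginLoss` could be dropped in the same way).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

NUM_CLASSES = 20    # Pascal VOC 2012 object classes
FEATURE_DIM = 4096  # size of the feature vector feeding the new layer

# New head: one independent logit per class.
head = nn.Linear(FEATURE_DIM, NUM_CLASSES)

# Stand-in features for the two synthetic images (noise, white).
# In the real setup these would come out of the pretrained VGG11 layers.
features = torch.randn(2, FEATURE_DIM)

# Illustrative complementary multi-hot targets, one row per image.
target_a = torch.zeros(NUM_CLASSES)
target_a[0] = 1.0
target_a[-1] = 1.0
targets = torch.stack([target_a, 1.0 - target_a])

criterion = nn.BCEWithLogitsLoss()  # assumed multi-label loss
optimizer = torch.optim.SGD(head.parameters(), lr=0.01)

losses = []
for _ in range(20):
    optimizer.zero_grad()
    loss = criterion(head(features), targets)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Each of the 20 outputs is treated as its own yes/no decision, so an image can carry several labels at once.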
This part works fantastically well with the multi-label losses in PyTorch. However, in Pascal VOC the person class is overrepresented. As a result, the network outputs [ 1 0 … 0 0 ], essentially classifying every image as containing a person and no other class. In both of the above cases I calculated accuracy as suggested in “Calculating accuracy for multi-label classification” in the forums, and that worked well.
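To make the failure mode concrete, here is a minimal pure-Python sketch. The labels are made-up toy data with three classes instead of 20, where class 0 plays the role of "person"; none of the numbers come from Pascal VOC. The helper names `positive_rate_per_class` and `elementwise_accuracy` are hypothetical, and the accuracy is a plain element-wise match, which may differ in detail from the forum recipe.

```python
def positive_rate_per_class(targets):
    """Fraction of samples in which each class is present."""
    num_samples = len(targets)
    num_classes = len(targets[0])
    return [sum(row[c] for row in targets) / num_samples
            for c in range(num_classes)]


def elementwise_accuracy(predictions, targets):
    """Fraction of individual label entries that match the target."""
    correct = 0
    total = 0
    for pred_row, target_row in zip(predictions, targets):
        for p, t in zip(pred_row, target_row):
            correct += int(p == t)
            total += 1
    return correct / total


# Made-up toy labels: class 0 ("person") dominates.
toy_targets = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
]

# A collapsed network that always answers "person only".
constant_predictions = [[1, 0, 0] for _ in toy_targets]

rates = positive_rate_per_class(toy_targets)
accuracy = elementwise_accuracy(constant_predictions, toy_targets)
print("positive rate per class:", rates)
print("accuracy of the constant prediction:", round(accuracy, 3))
```

On this toy data the constant "person only" prediction already matches 14 of the 18 label entries, which is why always predicting the dominant class is such an easy solution for the network to fall into.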
I have searched the forums but found nothing about this problem. What would I need to do to simply train the network on the imbalanced Pascal VOC 2012 data?