I am trying to build a multi-label classification model on the Pascal VOC 2007 dataset, which has 20 classes. My model's accuracy seemed good at first, but I later found that the model had learned to predict values close to 0 for every class. Binary cross entropy (with logits) rewards predicting the absence of classes too, and since a given image has far more absent classes than present ones, I suspect the loss goes down when the model predicts that none of the classes are in the picture.
I tried giving more weight to the present classes, but it didn't work well. Everybody seems happy with plain binary cross entropy losses, so is it just me having this problem? And if it is, can someone explain how BCE loss overcomes it, because to me it seems logical for the model to collapse to all-zeros in this case. Also, how can I overcome this?
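For reference, here is a minimal sketch of how per-class weighting is usually wired up in PyTorch (assuming that is the framework in use) via the `pos_weight` argument of `BCEWithLogitsLoss`. The targets below are a random toy batch standing in for real VOC labels, with each positive weighted by its negative-to-positive count ratio so that predicting all-zeros is no longer the cheap minimum.

```python
import torch
import torch.nn as nn

# Toy multi-label setup: 8 samples, 20 classes (like Pascal VOC 2007).
# Each row of `targets` is a multi-hot vector; most entries are 0,
# mimicking the "mostly absent classes" imbalance described above.
torch.manual_seed(0)
num_classes = 20
targets = (torch.rand(8, num_classes) < 0.1).float()
logits = torch.randn(8, num_classes, requires_grad=True)

# Per-class pos_weight = (#negatives / #positives) in the batch:
# positives then contribute proportionally more to the loss. In
# practice these counts would come from the whole training set.
pos_counts = targets.sum(dim=0)
neg_counts = targets.shape[0] - pos_counts
pos_weight = neg_counts / pos_counts.clamp(min=1)  # avoid division by zero

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
loss = criterion(logits, targets)
loss.backward()
print(loss.item())
```

Note that `pos_weight` multiplies only the positive term of the loss, so it specifically penalizes missed detections rather than rescaling everything; if plain manual reweighting "didn't work well", it may be worth checking that the weight was applied to the positive term only and computed per class.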