Handling an unbalanced dataset


I’m training SSD with ResNet as the base network for an object detection task. I’m currently running into a problem because many of my input images contain no objects at all (just background or unlabeled objects). My dataset contains about 21,000 images, and only 17% of them contain an object of interest.

I’m facing a strong overfitting issue. Do you guys have any idea how to solve it? I’ve tried several approaches, but let’s hear your opinion first :)

Thanks a lot!

Could you describe your dataset more? I’m going to assume that by “background or not labeled objects”, you mean images that are labeled as “no object”.

Overfitting can have several causes. I would suggest balancing your dataset so that the number of background images is comparable to the number of images containing objects. However, you may then not have enough data left to train on. In that case, you could start from a model that was pretrained on a similar task and fine-tune it on your training set.

That would be my immediate thought, but there are other ways of tackling this issue.
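
To make the balancing suggestion concrete, here is a minimal sketch in plain Python. It assumes each sample is a pair of an image id and a list of boxes, with an empty list meaning a background image. The function name `balance_backgrounds` and the parameter `bg_per_object` are hypothetical and only for illustration; adapt them to however your annotations are actually stored.

```python
import random

def balance_backgrounds(samples, bg_per_object=1.0, seed=0):
    """Keep every image that has boxes and subsample the background images.

    samples: list of (image_id, boxes) pairs; boxes is an empty list for
             background images (assumed layout, not a real library format).
    bg_per_object: how many background images to keep per object image.
    """
    with_objects = [s for s in samples if len(s[1]) > 0]
    backgrounds = [s for s in samples if len(s[1]) == 0]

    # Never ask for more background images than are available.
    n_keep = min(len(backgrounds), int(round(len(with_objects) * bg_per_object)))

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    kept = rng.sample(backgrounds, n_keep)

    balanced = with_objects + kept
    rng.shuffle(balanced)
    return balanced

# Toy dataset: 100 images, 17 of which contain one box.
toy = [(i, [(0, 0, 10, 10)] if i < 17 else []) for i in range(100)]
balanced = balance_backgrounds(toy, bg_per_object=1.0)
```

Setting `bg_per_object` above 1.0 keeps more background images, which may be worth trying if the model starts producing false positives on empty images.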

They are labeled, but there are no objects in them. My model should output no boxes in these cases. For example, let’s assume my dataset contains K images and only a small percentage of them actually have cats in them. The rest of the dataset contains random images of other objects, animals, etc. My model should correctly predict the box where the cat is. Is this description helpful? @ayalaa2
Thanks :)