How to deal with missing labels?

I have created a large fashion dataset (pictures downloaded from shopping websites). For each picture I know at least one item of clothing that appears in it, but usually the person is wearing several items. For example, I might have a picture of a man wearing a t-shirt, a belt, jeans, socks and shoes, and this picture has only the label “t-shirt”. Another picture might show a woman in a blouse and skirt and be labelled “skirt”. The blouse is present but not labelled.

So I have a lot of pictures labelled “jeans” or “pants”, but at least as many pictures contain jeans or pants and carry a different label, such as “belt”, “shirt”, “tie”, “blouse” or “jacket”. Of course, there are also many pictures that don’t contain any of those items.

My goal is to train a model that can identify all of the clothing items present in the image.

Can anyone suggest strategies for dealing with this case where many labels are missing?

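You might want to look at segmentation (or object detection), which predicts a region for each item in the image rather than a single label for the whole picture.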

https://www.kaggle.com/code/stpeteishii/torchvision-object-detection-segmentation-sample
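
Separately from segmentation, here is a minimal sketch of one possible way to avoid penalising the model for items that are present but unlabelled, assuming you train a multi-label classifier with one sigmoid output per clothing item. The idea is to treat every missing label as "unknown" rather than "absent" and to leave unknown entries out of the loss. The class list, the `encode` and `masked_bce` helpers and the example probabilities are all hypothetical, and the code is plain Python only to keep it self-contained.

```python
import math

# Hypothetical class list, built from the items named in the question.
CLASSES = ["t-shirt", "belt", "jeans", "socks", "shoes", "blouse", "skirt"]


def encode(known_labels):
    """Return (targets, mask); mask is 1 only where the label is actually known."""
    targets = [1.0 if c in known_labels else 0.0 for c in CLASSES]
    # Assumption: only the positive labels are known, everything else is unknown.
    mask = list(targets)
    return targets, mask


def masked_bce(probs, targets, mask, eps=1e-7):
    """Binary cross-entropy averaged over the known entries only."""
    total, count = 0.0, 0.0
    for p, t, m in zip(probs, targets, mask):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += m * -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
        count += m
    return total / max(count, 1.0)


# The picture of the woman in a blouse and skirt, labelled only "skirt".
targets, mask = encode({"skirt"})
# Hypothetical model outputs: high scores for both the blouse and the skirt.
probs = [0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9]
loss = masked_bce(probs, targets, mask)
```

Here the high score for the unlabelled blouse does not contribute to the loss at all. Note that with positives only, this loss can be minimised by predicting every item in every picture, so you would still need some trusted negatives (for example a small, fully annotated subset) with their mask entries set to 1 and their targets set to 0.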