How to define the labels for multi-class segmentation in the TorchVision Object Detection Finetuning Tutorial

I am trying the Object Detection Finetuning tutorial, which is very nice, smooth and helpful. I think there is a small bug in the labels, as they should match the documented "labels (Int64Tensor[N]): the label for each bounding box", or more plausibly, "labels (Int64Tensor[N]): the label for each object". The code works well with the Penn-Fudan dataset because it has only one object class, i.e. person. If I am correct, then

labels = torch.ones((num_objs,), dtype=torch.int64)

should be replaced with the following:

labels = torch.as_tensor(obj_ids, dtype=torch.int64)

in

__getitem__ of class PennFudanDataset(object)

num_objs is defined as num_objs = len(obj_ids), while obj_ids is created via:

obj_ids = np.unique(mask)
# first id is the background, so remove it
obj_ids = obj_ids[1:]

so I assume your suggestion should work for your described use case.
Would you mind creating an issue here and describing the problem?
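
To make the difference between the two variants concrete, here is a minimal sketch on a toy mask. It assumes the mask is instance-encoded the way the Penn-Fudan masks are (0 for background, one distinct positive value per object); the toy array itself is made up for illustration. NumPy is used to keep it self-contained; in `__getitem__` the result would still be converted with `torch.as_tensor(..., dtype=torch.int64)`.

```python
import numpy as np

# Toy instance mask (made up): 0 is background, each positive value is one object.
mask = np.array([
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 2],
    [3, 3, 0, 0, 2],
])

obj_ids = np.unique(mask)
obj_ids = obj_ids[1:]            # first id is the background, so remove it
num_objs = len(obj_ids)

# Variant from the tutorial: every object gets class 1.
labels_tutorial = np.ones((num_objs,), dtype=np.int64)

# Proposed variant: the instance ids themselves are used as labels.
labels_proposed = obj_ids.astype(np.int64)

print(labels_tutorial)   # [1 1 1]
print(labels_proposed)   # [1 2 3]
```

Under that assumption, the proposed variant assigns a different label to every object in the image, so it only yields class labels if the mask values encode classes rather than instances.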

Now I think what I am proposing is also incorrect, as labels should probably contain the class ID (label ID) of each object, repeated once for every instance of that class in the mask. For example, with two classes, 3 objects of the first class and 5 objects of the second, one might need the following structure (background is dropped, of course).

[1 1 1; 2 2 2 2 2]

I guess such a labeling structure is used for evaluation with pycocotools. I have tried both the original form and the one I posted above, and they both work, but I am concerned that both are incorrect for my specific problem, which has 60 different classes.
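
One way to get a label per object, sketched under explicit assumptions: the dataset provides an instance mask (one id per object) and, separately, a class mask giving the class id of each pixel. Neither `class_mask` nor the helper `labels_from_masks` comes from the tutorial or from torchvision; they are hypothetical names used only for illustration, and the real source of class information depends on how the 60-class dataset is annotated.

```python
import numpy as np

def labels_from_masks(instance_mask, class_mask):
    """Hypothetical helper: return (obj_ids, labels) with one class id per instance.

    Assumes 0 is background in instance_mask and that class ids start at 1.
    """
    obj_ids = np.unique(instance_mask)
    obj_ids = obj_ids[obj_ids != 0]                  # drop the background
    labels = []
    for obj_id in obj_ids:
        # Class ids of all pixels that belong to this instance.
        classes = class_mask[instance_mask == obj_id]
        values, counts = np.unique(classes, return_counts=True)
        labels.append(values[np.argmax(counts)])     # majority vote
    return obj_ids, np.asarray(labels, dtype=np.int64)

# Toy data (made up): three instances, the first two of class 1, the third of class 2.
instance_mask = np.array([
    [1, 1, 0, 2],
    [0, 0, 0, 2],
    [3, 3, 3, 0],
])
class_mask = np.array([
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [2, 2, 2, 0],
])

obj_ids, labels = labels_from_masks(instance_mask, class_mask)
print(obj_ids)  # [1 2 3]
print(labels)   # [1 1 2]
```

The result has one entry per object, in the same order as the masks and boxes built from `obj_ids`, and a class id can repeat when several objects share a class. In torchvision's detection models class 0 is reserved for the background, so with 60 object classes the labels would presumably run from 1 to 60 and the model would be built with 61 classes.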

Hence, before reporting an issue, I am going to edit the title of the question and see if someone else has come up with an idea of how to deal with the labels.