How to define the labels for multi-class segmentation in TORCHVISION OBJECT DETECTION FINETUNING TUTORIAL

I am trying the Object Detection Finetuning tutorial, which is very nice, smooth and helpful. I think there is a little bug in the labels, as they should mimic "labels (Int64Tensor[N]): the label for each bounding box", or more plausibly, "labels (Int64Tensor[N]): the label for each object". Clearly, the code works well with the Fudan dataset as it only has one object class, i.e. person. If I am correct, then

labels = torch.ones((num_objs,), dtype=torch.int64)

should be replaced with the following:

labels = torch.as_tensor(obj_ids, dtype=torch.int64)

in

__getitem__ of class PennFudanDataset(object)

num_objs is defined as num_objs = len(obj_ids), while obj_ids is created via:

obj_ids = np.unique(mask)
# first id is the background, so remove it
obj_ids = obj_ids[1:]

so I assume your suggestion should work for your described use case.
Would you mind creating an issue here and describing the problem? :slight_smile:
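
For reference, here is a quick toy check (with made-up mask values) of what obj_ids holds and how the two labelling options differ:

import numpy as np
import torch

# toy instance mask: 0 is background, 1 and 2 are two object instances
mask = np.array([[0, 0, 1],
                 [0, 1, 1],
                 [2, 2, 0]])

obj_ids = np.unique(mask)   # array([0, 1, 2])
obj_ids = obj_ids[1:]       # drop the background -> array([1, 2])

print(torch.ones((len(obj_ids),), dtype=torch.int64))   # tensor([1, 1])  (the tutorial's labels)
print(torch.as_tensor(obj_ids, dtype=torch.int64))      # tensor([1, 2])  (labels from obj_ids)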


Now I think what I am proposing is also incorrect, as labels should probably contain the object_id or label_id, repeated once for each instance of that class in the mask. For example, with two classes, 3 objects of the first class and 5 objects of the second class, one might need the following structure (background is dropped, of course):

[1 1 1; 2 2 2 2 2]

I guess such a labeling structure is used for evaluation with pycocotools. I have tried both the original form and the one I posted above, and they both work, but I am concerned that they are both incorrect for my specific problem, which has 60 different classes.
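
If that is right, a minimal sketch of building such a labels tensor from a per-object class id list (the ids and counts are just the example above):

import torch

# 3 instances of class 1 and 5 instances of class 2, background dropped
per_object_class = [1, 1, 1, 2, 2, 2, 2, 2]
labels = torch.as_tensor(per_object_class, dtype=torch.int64)
print(labels)   # tensor([1, 1, 1, 2, 2, 2, 2, 2]): one entry per object, N = 8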

Hence, before reporting an issue, I am going to edit the title of the question and see if someone else has come up with an idea of how to deal with the labels.

Hi,
Have you solved this issue?
I can't understand what the data should look like if I have several objects of one class.

  • boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with values of x between 0 and W and values of y between 0 and H
  • labels (Int64Tensor[N]): the class label for each ground-truth box
    So the model should take N boxes of size 4, where N is the number of classes. I can't understand how to pass several objects of one class. Please help.
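
For reference, in the format quoted above N is the number of ground-truth boxes in the image, so several objects of one class simply repeat the same label. A made-up sketch:

import torch

# two objects of class 1 and one object of class 3 in the same image (all values made up)
target = {
    "boxes": torch.tensor([[ 10.,  20., 100., 200.],
                           [ 50.,  60., 150., 250.],
                           [200., 210., 300., 310.]], dtype=torch.float32),  # shape [N, 4], N = 3
    "labels": torch.tensor([1, 1, 3], dtype=torch.int64),                    # shape [N]
}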

This worked for me:

        # instances are encoded as different colors
        obj_ids = np.unique(mask)[1:]  # first id is the background, so remove it
        masks = mask == obj_ids[:, None, None]  # split the color-encoded mask into a set of binary masks

        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)                       
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])

        # convert everything into torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)      
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])

So what happened to this part of the code?

    masks = torch.as_tensor(masks, dtype=torch.uint8)

    image_id = torch.tensor([idx])
    area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
    # suppose all instances are not crowd
    iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

    target = {}
    target["boxes"] = boxes
    target["labels"] = labels
    target["masks"] = masks
    target["image_id"] = image_id
    target["area"] = area
    target["iscrowd"] = iscrowd

    if self.transforms is not None:
        img, target = self.transforms(img, target)

    return img, target

My bad not to post the rest (although I think your code snippet is correct).

        # instances are encoded as different colors
        obj_ids = np.unique(mask)[1:] # first id is the background, so remove it     
        masks = mask == obj_ids[:, None, None]  # split the color-encoded mask into a set of binary masks
        num_objs = len(obj_ids)                       
        boxes = []
        # find the boxes
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])

        # convert everything into torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)      
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        
        target = {}
        target["boxes"] = boxes
        target["labels"] = torch.as_tensor(obj_ids, dtype=torch.int64) # corrected by Rawi
        target["masks"] = torch.as_tensor(masks, dtype=torch.uint8) #uint8
        target["image_id"] = torch.tensor([index]) 
        target["area"] = area
        target["iscrowd"] = torch.zeros((num_objs,), dtype=torch.int64) # suppose all instances are not crowd

        if self.transforms is not None:
              img, target = self.transforms(img, target)

        return img, target
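
If it helps, a quick way to sanity-check one sample from the dataset (the class name and constructor arguments below are placeholders for whatever you are using):

dataset = PennFudanDataset('PennFudanPed', None)  # placeholder class name, root path and transforms
img, target = dataset[0]

print(target["boxes"].shape, target["boxes"].dtype)    # expect torch.Size([N, 4]) torch.float32
print(target["labels"].shape, target["labels"].dtype)  # expect torch.Size([N]) torch.int64
print(target["masks"].shape, target["masks"].dtype)    # expect torch.Size([N, H, W]) torch.uint8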

Thanks so much. I am going to try this now and let you know. Thanks again. By the way, I have about 38 classes excluding the background (black), and not all images contain all 38 classes. Does it still work?

Thanks.

Absolutely. This is the case in most instance segmentation datasets. Good luck.

Thanks a lot. You are a gem. Will update you.

Sorry to trouble you. Would you by any chance know why I am getting this error? Image below.
Steps I took:

  1. I used the code as you suggested.

  2. I called the function and passed in the path to the dataset and mask:
    dataset_sample = PersonDataset('C:/Users/LENOVO/measurement_model_dev/Train')

  3. I tried to print the first data sample using:
    img, target = dataset_sample[1]

I ended up with this error. See image.

Any help will be appreciated.

This is due to the NumPy version. It works fine for me; mine is

np.__version__  
Out[13]: '1.17.4'

So it seems np.where will not subscript bool types in the future (or even now, as is happening with you).

Now, you don't need to play with the numpy package. Just replace

masks = mask == obj_ids[:, None, None]

with

masks = (mask == obj_ids[:, None, None]).astype(int)

and you’ll be good to go.
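
As a quick check with a toy mask (values made up), the result should be an integer array with one binary mask per instance:

import numpy as np

mask = np.array([[0, 1, 1],
                 [2, 2, 0]])                          # toy mask with two instances
obj_ids = np.unique(mask)[1:]                         # array([1, 2])
masks = (mask == obj_ids[:, None, None]).astype(int)

print(masks.shape)   # (2, 2, 3): num_objs binary masks of shape (H, W)
print(masks.dtype)   # an integer dtype instead of bool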

I’ve tried it, and I got the same result.

Thanks a lot for all the help.

  1. I tried the code but I am having this error:
    AttributeError: 'bool' object has no attribute 'astype', as shown in the first image.

  2. I also downgraded numpy to version 1.17.4 but had the same error.

  3. I am thinking, should I even try to print the object or try to get the values (img or target) from img, target = dataset_sample[7]?

  4. From my understanding, this line of code

masks = mask == obj_ids[:, None, None]

turns the multi-colored mask image into a True/False binary mask, and then uses this line:

pos = np.where(masks[i])

to do selection based on the boolean value.

  5. I also did some reading on np.where here: numpy.where — NumPy v1.26 Manual

and though np.where() takes 3 inputs, I can't relate that to this:

pos = np.where(masks[i])

which seems to take in only one value (see the sketch after this list).

  6. My thought is to finish the entire code and see if I can train the model; maybe I am overthinking it by trying to see what the data is inside the return from:
    img, target = dataset_sample[7]
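
On point 5, a quick toy check of the single-argument form (assuming I read the docs right): with one argument, np.where behaves like np.nonzero and returns the row and column indices of the True pixels, which is where the min/max box coordinates come from.

import numpy as np

binary_mask = np.array([[False, True,  True],
                        [False, True,  False]])

pos = np.where(binary_mask)                   # single-argument form, same as np.nonzero
print(pos[0])                                 # row indices of True pixels: [0 0 1]
print(pos[1])                                 # column indices of True pixels: [1 2 1]

xmin, xmax = np.min(pos[1]), np.max(pos[1])   # 1, 2
ymin, ymax = np.min(pos[0]), np.max(pos[0])   # 0, 1
print([xmin, ymin, xmax, ymax])               # [1, 0, 2, 1]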

Any suggestions or help will be appreciated.

And I really do appreciate all the time taken to reply and help.

Thank you.

There was no need to downgrade numpy. Anyway, try this:

x=np.asarray([True, False]); 
print(x); x= x.astype(int); print(x)
[ True False]
[1 0]

It should work. I’ve managed to reproduce your errors using this naïve test:
A)

x=True; np.where(x[0])
Traceback (most recent call last):

  File "<ipython-input-42-5a36c5ff4d25>", line 1, in <module>
    x=True; np.where(x[0])

TypeError: 'bool' object is not subscriptable

B)

x=True; np.astype(x)
Traceback (most recent call last):

  File "<ipython-input-43-de5d8813d34d>", line 1, in <module>
    x=True; np.astype(x)

AttributeError: module 'numpy' has no attribute 'astype'

I think there's a problem with the type and size of your masks.
Could you please check them?
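
For example, something like this right after the masks are built (just a debugging sketch to drop into __getitem__):

# right after `masks = mask == obj_ids[:, None, None]` in __getitem__
print(type(mask), getattr(mask, 'shape', 'no shape attribute'))
print(type(masks), getattr(masks, 'shape', 'no shape attribute'))
# mask should be a numpy array of shape (H, W), and masks of shape (num_objs, H, W)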

Hmmm. I think I know where the issue is, based on your code explanation, but I don't know how to solve it yet.

The type of the mask is 'bool', but it should be an array of 'bool' values, like your
x = [True, False]
whereas in my case it is just a single 'bool'.

I get an error when I try to get the size with .size:

AttributeError: 'bool' object has no attribute 'size'

and that makes sense, as the type is bool and not an array.

I printed the image with pyplot, as seen in the image below.

Not sure what you are doing or how you are doing it. If you have a png image, you can get the mask from the alpha channel and then convert it to a numpy array. Something like:

mask = image.getchannel('A')  # get the mask from the alpha channel (png image): 255 for foreground, 0 otherwise
mask = np.asarray(mask, dtype='int')  # or uint8
mask = mask / 255  # scale to 0-1
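
A slightly more complete sketch of the same idea, assuming a PIL RGBA image (the file name is made up):

import numpy as np
from PIL import Image

image = Image.open('some_mask.png')   # made-up path; must be a png with an alpha channel
mask = image.getchannel('A')          # alpha channel: 255 for foreground, 0 otherwise
mask = np.asarray(mask, dtype='uint8')
mask = mask // 255                    # 0/1 mask
print(mask.shape, np.unique(mask))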

Hi,

Thanks a lot. I figured it out with your help. It was the images; they were not in the right format. I had to pre-process them using the tutorial here.

https://www.bulentsiyah.com/preprocessing-rgb-image-masks-to-segmentation-masks

Thanks again.

Hey,
Were you trying to do multi-class semantic segmentation or instance segmentation? If you were doing semantic segmentation, could you please explain what the purpose of getting the bounding boxes is?

Thanks.

Hi, my question is quite different.

I am using Detectron2 for model training with Mask R-CNN, and it needs the data in a COCO JSON file. I also want to add images without any ground truth to the training set, to reduce false detections. How can I do that?

Currently I am using the labelme tool for image labelling.