Fine-tuning FasterRCNN for multi-class examples

Hello,

I’m trying to fine-tune Faster R-CNN with the ResNet-50 backbone via Torchvision. Based on some references, I have arrived at a __getitem__ method that looks like the following:


import numpy as np
import torch
from PIL import Image


def __getitem__(self, index: int):

        file_name = self.file_names[index]
        records = self.data[self.data['file_name'] == file_name]
        
        image = np.array(Image.open(file_name), dtype=np.float32)
        image /= 255.0

        if self.transform:
            image = self.transform(image)  
            
        if self.mode != "test":
            boxes = records[['xmin', 'ymin', 'xmax', 'ymax']].values
            
            area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
            area = torch.as_tensor(area, dtype=torch.float32)

            labels = torch.ones((records.shape[0],), dtype=torch.int64)
            
            iscrowd = torch.zeros((records.shape[0],), dtype=torch.int64)
            
            target = {}

            # Boxes as an (N, 4) float32 tensor; unlike torch.stack on a list,
            # this also works for images with no boxes (N == 0)
            target['boxes'] = torch.as_tensor(boxes, dtype=torch.float32)
            target['labels'] = labels
            target['image_id'] = torch.tensor([index])
            target['area'] = area
            target['iscrowd'] = iscrowd

            return image, target, file_name
        else:
            return image, file_name

The line labels = torch.ones((records.shape[0],), dtype=torch.int64) gives every box the label 1, which assumes there is only a single object class; in Faster R-CNN, label 0 is reserved for the background.

I have a dataset with multiple classes, and I can’t figure out how to modify this and pass the one-hot encoded version to train the Faster R-CNN model in a multi-class scenario.
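For reference, torchvision's detection models do not take one-hot labels: target['labels'] is expected to be a 1-D int64 tensor holding one class index per box. If the annotations are already one-hot encoded, a minimal sketch of converting them is below. It assumes each row covers only the object classes (no background column), so column j maps to label j + 1; the helper name one_hot_to_labels is hypothetical.

```python
import torch


def one_hot_to_labels(one_hot: torch.Tensor) -> torch.Tensor:
    """Convert an (N, C) one-hot matrix over C object classes into the
    (N,) int64 label vector that torchvision detection models expect.

    Assumption: column j of `one_hot` is object class j + 1, because
    label 0 is reserved for the background.
    """
    # argmax picks the active column in each row; +1 shifts past background.
    return one_hot.argmax(dim=1).to(torch.int64) + 1


# Three boxes: the first is class 2, the second class 1, the third class 3.
one_hot = torch.tensor([[0.0, 1.0, 0.0],
                        [1.0, 0.0, 0.0],
                        [0.0, 0.0, 1.0]])
labels = one_hot_to_labels(one_hot)
print(labels)  # tensor([2, 1, 3])
```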

Any advice would be great,

Thank you!


Sir, if I have 21 classes, what do I need to write?

Has this issue been resolved? How did you fix it for multiple classes? Please let me know.

Object classes should be labelled sequentially, starting from 1. So, if you have 4 object classes, they should be numbered 1, 2, 3, and 4 in the labels variable mentioned by OP.
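A minimal sketch of that numbering, assuming each box has a class name available (for example a hypothetical class_name column in the annotation dataframe; the original only shows the box columns): build a name-to-label map that starts at 1, then look up each box's class instead of using torch.ones. The helper name build_class_map is illustrative only.

```python
def build_class_map(class_names):
    """Map each distinct object class name to an integer label starting at 1.

    Label 0 is left free because Faster R-CNN reserves it for the background.
    Sorting makes the mapping deterministic across runs.
    """
    return {name: idx for idx, name in enumerate(sorted(set(class_names)), start=1)}


# Hypothetical class names, one per annotated box.
box_classes = ["dog", "cat", "dog", "bird"]
class_to_label = build_class_map(box_classes)
labels = [class_to_label[name] for name in box_classes]

print(class_to_label)  # {'bird': 1, 'cat': 2, 'dog': 3}
print(labels)          # [3, 2, 3, 1]
```

Inside __getitem__, the torch.ones line would then become something like labels = torch.as_tensor([class_to_label[c] for c in records['class_name']], dtype=torch.int64). The map should be built once over the whole dataset, not per image, so the same class always gets the same label.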

However, when creating the FasterRCNN object, the documentation for the num_classes argument says that you have to account for the background, which, by default, is class number 0.

So, in our example with 4 object classes, we actually have to pass num_classes=5 to FasterRCNN, even though our dataset only contains annotations for 4 object classes.