Faster R-CNN fine-tuning using the WIDER FACE dataset

When I try to fine-tune the PyTorch Faster R-CNN following the official tutorial (https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html) with the WIDER FACE dataset, I get:

RuntimeError: input and output sizes should be greater than 0, but got input (H: 1024, W: 1) output (H: 800, W: 0)

Here is the Dataset __getitem__ function I used:

def __getitem__(self, idx):
    img_path = self.imgs[idx]
    ann_path = self.anns[idx]
    img_id = self.images_id[idx]

    img = Image.open(img_path).convert('RGB')
    img = np.asarray(img)
    img = img / 255.0
    img = np.moveaxis(img, 2, 0)  # HWC -> CHW
    img = torch.from_numpy(img).float()

    boxes, w, h = self.__extract_boxes__(ann_path)
    masks = np.zeros([h, w, len(boxes)], dtype='uint8')
    labels = list()
    for i in range(len(boxes)):
        box = boxes[i]

        row_s, row_e = box[1], box[3]
        col_s, col_e = box[0], box[2]

        masks[row_s:row_e, col_s:col_e, i] = 1
        labels.append(1)

    boxes    = torch.as_tensor(boxes, dtype=torch.float32)
    masks    = torch.as_tensor(masks, dtype=torch.uint8)
    labels   = torch.ones((len(labels),), dtype=torch.int64)
    image_id = torch.tensor([idx])
    area     = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])

    target = {}
    target['boxes'] = boxes
    target['labels'] = labels
    target['masks'] = masks
    target['image_id'] = image_id
    target['area'] = area

    return img, target

Could you print the shapes of all tensors you are using to compute the prediction as well as to calculate the loss?
While the output width seems to be 0, the input width also looks suspicious, as it's only a single pixel wide.
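Checking those shapes just means printing them right before the forward pass. A self-contained sketch with dummy tensors standing in for one dataset sample (all sizes and values here are made up; in practice img and target come from dataset[idx]):

```python
import torch

# Dummy stand-ins for a single sample; real values come from dataset[idx].
img = torch.rand(3, 1024, 768)  # expected layout: [C, H, W]
target = {
    'boxes': torch.tensor([[10., 20., 110., 220.]]),   # [N, 4] in (x1, y1, x2, y2)
    'labels': torch.ones((1,), dtype=torch.int64),      # [N]
    'masks': torch.zeros((1, 1024, 768), dtype=torch.uint8),  # [N, H, W]
}

print('img:', tuple(img.shape))
for k, v in target.items():
    print(f'{k}: {tuple(v.shape)}')
```

If the masks entry prints as (H, W, N) rather than (N, H, W), that mismatch is a likely culprit.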

@ptrblck
Thank you for your reply. I think I've solved the problem: the error was in the mask tensor.
The mask shape has to be [N, H, W], but in the code above it is arranged as [H, W, N].
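A minimal sketch of the fix, using two made-up boxes in (x1, y1, x2, y2) format on a small dummy image: either build the mask array channel-first ([N, H, W]) from the start, or permute an existing [H, W, N] array:

```python
import numpy as np
import torch

# Hypothetical example: two boxes in (x1, y1, x2, y2) format on a 6x8 image.
boxes = [(1, 1, 3, 4), (4, 2, 7, 5)]
h, w = 6, 8

# Option 1: allocate the masks channel-first, i.e. [N, H, W].
masks = np.zeros((len(boxes), h, w), dtype='uint8')
for i, (col_s, row_s, col_e, row_e) in enumerate(boxes):
    masks[i, row_s:row_e, col_s:col_e] = 1

masks = torch.as_tensor(masks, dtype=torch.uint8)
print(masks.shape)  # torch.Size([2, 6, 8])

# Option 2: permute an existing [H, W, N] array into [N, H, W].
masks_hwn = np.zeros((h, w, len(boxes)), dtype='uint8')
masks_nhw = torch.as_tensor(masks_hwn).permute(2, 0, 1)
print(masks_nhw.shape)  # torch.Size([2, 6, 8])
```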