Faster R-CNN fine-tuning using the WIDER FACE dataset

When I try to fine-tune the PyTorch Faster R-CNN following the official tutorial (https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html) with the WIDER FACE dataset, I get:

RuntimeError: input and output sizes should be greater than 0, but got input (H: 1024, W: 1) output (H: 800, W: 0)

Here is the Dataset __getitem__ function I used:

def __getitem__(self, idx):
    img_path = self.imgs[idx]
    ann_path = self.anns[idx]
    img_id = self.images_id[idx]

    img = Image.open(img_path).convert('RGB')
    img = np.asarray(img)
    img = img / 255.0
    img = np.moveaxis(img, 2, 0)  # HWC -> CHW
    img = torch.from_numpy(img).float()

    boxes, w, h = self.__extract_boxes__(ann_path)
    masks = np.zeros([h, w, len(boxes)], dtype='uint8')
    labels = list()
    for i in range(len(boxes)):
        box = boxes[i]

        row_s, row_e = box[1], box[3]
        col_s, col_e = box[0], box[2]

        masks[row_s:row_e, col_s:col_e, i] = 1
        labels.append(1)

    boxes    = torch.as_tensor(boxes, dtype=torch.float32)
    masks    = torch.as_tensor(masks, dtype=torch.uint8)
    labels   = torch.ones((len(labels),), dtype=torch.int64)
    image_id = torch.tensor([idx])
    area     = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])

    target = {}
    target['boxes'] = boxes
    target['labels'] = labels
    target['masks'] = masks
    target['image_id'] = image_id
    target['area'] = area

    return img, target

Could you print the shapes of all tensors you are using to compute the prediction as well as to calculate the loss?
While the output width seems to be 0, the input width also looks suspicious, as it's only a single pixel wide.
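Checking those shapes just means printing them right before the forward pass. A self-contained sketch with dummy tensors standing in for one dataset sample (all sizes and values here are made up; in practice img and target come from dataset[idx]):

```python
import torch

# Dummy stand-ins for a single sample; real values come from dataset[idx].
img = torch.rand(3, 1024, 768)  # expected layout: [C, H, W]
target = {
    'boxes': torch.tensor([[10., 20., 110., 220.]]),   # [N, 4] in (x1, y1, x2, y2)
    'labels': torch.ones((1,), dtype=torch.int64),      # [N]
    'masks': torch.zeros((1, 1024, 768), dtype=torch.uint8),  # [N, H, W]
}

print('img:', tuple(img.shape))
for k, v in target.items():
    print(f'{k}: {tuple(v.shape)}')
```

If the masks entry prints as (H, W, N) rather than (N, H, W), that mismatch is a likely culprit.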

@ptrblck
Thank you for your reply. I think I've solved the problem: the error was in the mask tensor.
The mask shape has to be [N, H, W], but in the code above it is arranged as [H, W, N].
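A minimal sketch of the fix, using two made-up boxes in (x1, y1, x2, y2) format on a small dummy image: either build the mask array channel-first ([N, H, W]) from the start, or permute an existing [H, W, N] array:

```python
import numpy as np
import torch

# Hypothetical example: two boxes in (x1, y1, x2, y2) format on a 6x8 image.
boxes = [(1, 1, 3, 4), (4, 2, 7, 5)]
h, w = 6, 8

# Option 1: allocate the masks channel-first, i.e. [N, H, W].
masks = np.zeros((len(boxes), h, w), dtype='uint8')
for i, (col_s, row_s, col_e, row_e) in enumerate(boxes):
    masks[i, row_s:row_e, col_s:col_e] = 1

masks = torch.as_tensor(masks, dtype=torch.uint8)
print(masks.shape)  # torch.Size([2, 6, 8])

# Option 2: permute an existing [H, W, N] array into [N, H, W].
masks_hwn = np.zeros((h, w, len(boxes)), dtype='uint8')
masks_nhw = torch.as_tensor(masks_hwn).permute(2, 0, 1)
print(masks_nhw.shape)  # torch.Size([2, 6, 8])
```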