Issue with stack and object detection dataloader

Hello,
I am confused by an error I am getting while implementing an object-detection dataloader. This dataloader returns an image (as a tensor) and a dictionary containing a tensor of bounding boxes and a tensor of labels. I wrote the following code (inspired by the TorchVision Object Detection Finetuning Tutorial in the PyTorch Tutorials 1.10.0+cu102 documentation):

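import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision.transforms import Normalize, ToTensor


class RCNNDataset(Dataset):

    def __init__(self, root_dir: str,
                 transforms=Normalize(mean=(0.92,), std=(0.15,)),
                 image_size: tuple = (1008, 888)):
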
        self.root_dir = root_dir
        self.label_dir = os.path.join(self.root_dir, "labels")
        self.image_dir = os.path.join(self.root_dir, "images")
        self.images = sorted(os.listdir(self.image_dir))
        self.labels = sorted(os.listdir(self.label_dir))
        self.to_tensor = ToTensor()
        self.transforms = transforms
        self.image_size = image_size

    def __len__(self):
        # Number of samples, not the length of the directory path string.
        return len(self.labels)
    
    def __getitem__(self, idx):

        image = Image.open(os.path.join(self.image_dir, self.images[idx]))
        if image.size != self.image_size:
            image = image.resize(size=self.image_size)

        image = self.to_tensor(image)

        if self.transforms is not None:

            image = self.transforms(image)

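        # ndmin=2 keeps a 2D array even when a file describes a single object.
        label_array = np.loadtxt(os.path.join(self.label_dir, self.labels[idx]), ndmin=2)
        # torchvision detection models expect int64 labels and float32 boxes.
        targets = {"labels": torch.from_numpy(label_array[:, -1]).to(torch.int64),
                   "boxes": torch.from_numpy(label_array[:, 0:4]).to(torch.float32)}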

        return image, targets

When I try to iterate through this dataloader (i.e. images, targets = next(iter(dataloader))), I get the following error:

RuntimeError: stack expects each tensor to be equal size, but got [47] at entry 0 and [46] at entry 1

As I understand it, the stack function expects my tensors to have the same shape for every element of the batch, i.e. the same number of objects to detect in every image. Is that correct? I am very confused by this.
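For reference, the error can be reproduced outside of any DataLoader. When no collate_fn is given, the default collate function calls torch.stack on the corresponding tensors of each sample, and torch.stack requires identical shapes. The sketch below is only an illustration: the two label tensors are placeholders whose sizes (47 and 46) are taken from the error message, and their contents are made up.

```python
import torch

# Placeholder label tensors for two images with 47 and 46 objects
# (sizes taken from the error message; the contents are made up).
labels_a = torch.zeros(47, dtype=torch.int64)
labels_b = torch.zeros(46, dtype=torch.int64)

try:
    # Stacking tensors of different lengths raises a RuntimeError.
    torch.stack([labels_a, labels_b])
    stack_error = None
except RuntimeError as err:
    stack_error = str(err)

print(stack_error)

# Tensors of identical shape stack without complaint.
stacked = torch.stack([labels_a, labels_a])
print(stacked.shape)
```

So the failure comes from batching samples that contain different numbers of objects, not from the dataset's __getitem__ itself.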

I would be grateful to anybody who could help me understand this :)

Actually, you don't need to create any tensors for the labels and boxes yourself. The collate_fn will create the tensors for those labels and boxes. Remove the torch.from_numpy() calls and try again.

I ran into the same issue this morning; the problem was that I had created tensors for the labels.

This still doesn't work. Even if I don't pass a dictionary of tensors, torch tries to use the default collate function (so torch.stack) and fails. I don't understand, because my code is basically the same as the RCNN example, and this example doesn't implement a custom collate function.

EDIT: the PyTorch example actually does implement a custom collate function that looks something like this:

def collate_fn(batch):
    return tuple(zip(*batch))  

Passing it to the loader, i.e. dataloader = DataLoader(dataset, batch_size=bsize, shuffle=True, drop_last=True, collate_fn=collate_fn), then seems to solve my problem.
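To see why this helps, here is a minimal sketch of what that collate function does to a batch. Plain Python strings and lists stand in for the image tensors and target tensors (the sample values are hypothetical, chosen only for illustration). The point is that zip(*batch) only regroups the samples into a tuple of images and a tuple of targets, without stacking anything, so each target can keep its own number of boxes.

```python
def collate_fn(batch):
    # Transpose a list of (image, target) pairs into
    # (tuple_of_images, tuple_of_targets) without stacking.
    return tuple(zip(*batch))


# Hypothetical samples: strings stand in for image tensors, and the
# targets hold a different number of boxes per image.
batch = [
    ("image_0", {"boxes": [[0, 0, 10, 10], [5, 5, 20, 20]], "labels": [1, 2]}),
    ("image_1", {"boxes": [[2, 2, 8, 8]], "labels": [1]}),
]

images, targets = collate_fn(batch)

print(images)                    # ('image_0', 'image_1')
print(len(targets[0]["boxes"]))  # 2
print(len(targets[1]["boxes"]))  # 1
```

With this collate function, images and targets come back as tuples of length batch_size rather than as batched tensors. That matches the input format of torchvision's detection models, which take a list of image tensors and a list of target dictionaries.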