Dataloader throws error when iterating

I have a custom dataset where inputs and labels are both tensors.

import torch as th
from torch.utils.data import Dataset

class atDataset(Dataset):
    def __init__(self, file_paths, seed, transform=None):
        self.file_paths = file_paths
        self.transform = transform
        self.seed = seed

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, idx):
        file_path = self.file_paths[idx]
        data_dict = th.load(file_path)
        th.manual_seed(self.seed[idx])
        # each file holds a single {data: label} dictionary,
        # so the first key is the data tensor and its value is the label
        _data = list(data_dict)[0]
        _noise = th.rand_like(_data)

        # random circular shift of up to 10 pixels along each spatial dim
        _roll_param = th.randint(-10, 10, (2,))
        _data = th.roll(_data, shifts=tuple(_roll_param), dims=(0, 1))
        _label = list(data_dict.values())[0]

        return _data + _noise, _label

The data is in “*.pt” files, and each file contains a single dictionary {data : label}.

The data is of shape 320x320, but the labels are of shape (n x 3), where n varies between files.
Iterating on the dataloader throws:
RuntimeError: stack expects each tensor to be equal size, but got [4497, 3] at entry 0 and [3281, 3] at entry 1.
How can I resolve this without resizing or padding the tensors?

It sounds like you need to implement a custom collate_fn(). You can see this thread for how people deal with batches of variable size: How to create a dataloader with variable-size input - #13 by pinocchio
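
As a minimal sketch of the idea (assuming the 320x320 inputs can still be stacked and that keeping the variable-length labels as a plain Python list is acceptable downstream; the names variable_label_collate and dataset are made up for illustration):

import torch as th
from torch.utils.data import DataLoader

def variable_label_collate(batch):
    # batch is a list of (data, label) tuples returned by __getitem__
    inputs = th.stack([item[0] for item in batch])  # all inputs are 320x320, so stacking works
    labels = [item[1] for item in batch]            # (n x 3) tensors with varying n, kept as a list
    return inputs, labels

# assuming `dataset` is an instance of your atDataset
loader = DataLoader(dataset, batch_size=4, collate_fn=variable_label_collate)

With this, the default call to torch.stack on the labels is skipped, so the size-mismatch error no longer occurs; you then iterate over the labels list inside your training loop instead of indexing a batched tensor.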