DataLoader stuck when adding shuffle or with a bigger dataset

Hello,
My program gets stuck when I increase my dataset size. I noticed that my dev_loader works fine; the only difference is that it uses shuffle=False, and when I removed shuffle from the train_loader it also worked fine.
Small dataset with/without shuffle: works.
Big dataset without shuffle: works.
Big dataset with shuffle: hangs.
Does anyone know how I can solve this?

import pathlib
import random

import h5py
from torch.utils.data import Dataset


class SliceData(Dataset):
    def __init__(self, root, transform, sample_rate=1):
        self.transform = transform

        self.examples = []
        files = list(pathlib.Path(root).iterdir())
        if sample_rate < 1:
            # Keep a random subset of the files.
            random.shuffle(files)
            num_files = round(len(files) * sample_rate)
            files = files[:num_files]
        for fname in sorted(files):
            # Open just long enough to read the slice count, then close the
            # handle; HDF5 handles left open here get inherited by forked
            # DataLoader workers, which can lead to hangs.
            with h5py.File(fname, 'r') as data:
                num_slices = data['data'].shape[2]
            # slice_idx rather than slice, to avoid shadowing the built-in.
            self.examples += [(fname, slice_idx) for slice_idx in range(num_slices)]

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, i):
        fname, slice_idx = self.examples[i]
        with h5py.File(fname, 'r') as data:
            img = data['data'][:, :, slice_idx, :]
            bvecs = data['bvecs'][:]
            return self.transform(img, bvecs, fname.name, slice_idx)

Shuffle does nothing but shuffle the indices that are going to be requested, so it seems you have trouble reading non-sequential data. Why don't you try changing your data format?
Besides, I don't usually use h5py, but it seems you are opening the file and reading from it on every __getitem__ call; if the file is very big, that may be problematic.
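For example (a minimal sketch of such a format change, untested against your data; the out_root directory and the file-naming scheme are made up here), you could do a one-off conversion that writes each slice out as its own .npy file, so that random access during training only ever touches one small file:

import pathlib

import h5py
import numpy as np

def convert_to_npy(root, out_root):
    # One-off conversion: dump every slice of every HDF5 file to its own
    # .npy file so any slice can be loaded independently and cheaply.
    out_root = pathlib.Path(out_root)
    out_root.mkdir(parents=True, exist_ok=True)
    for fname in sorted(pathlib.Path(root).iterdir()):
        with h5py.File(fname, 'r') as data:
            img = data['data'][:]      # slices along axis 2, as in the Dataset above
            bvecs = data['bvecs'][:]
        # bvecs are shared by all slices of a file, so save them once.
        np.save(out_root / f'{fname.stem}_bvecs.npy', bvecs)
        for s in range(img.shape[2]):
            np.save(out_root / f'{fname.stem}_slice{s:03d}.npy', img[:, :, s, :])

After that, __getitem__ reduces to two np.load calls, which are cheap and behave well with shuffled indices and multiple workers.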