Defining random and sequential dataloaders - is there a data leak?

Hi, I defined my two dataloaders this way, one random and one sequential, and I've started to suspect there may be a data leak (although not necessarily).

Could you take a quick look at it? Do the dataloaders seem to be doing what I intended, without data leaks?

Thank you in advance.

P.S. I've adapted some existing code, so this may not be the shortest or most elegant way to define dataloaders with this functionality. Inspecting the code as it is would be preferable, but suggestions for doing it differently are also welcome.

    import numpy as np
    import torch
    from torch.utils.data import Subset, SubsetRandomSampler, SequentialSampler

    #class PascalVOCLoader(data.Dataset):
    self.dataset = PascalVOCLoader(
        root_dir,
        augmentations,
        output_dim=224,
        mode='classification')

    # Creating data indices for training and validation splits:
    dataset_size = len(self.dataset)
    validation_split = 1 - ds_split
    random_seed = 42
    indices = list(range(dataset_size))
    split = int(np.floor(validation_split * dataset_size))
    if shuffle:
        np.random.seed(random_seed)
        np.random.shuffle(indices)
    train_indices, val_indices = indices[split:], indices[:split]

    self.seq_train_subset = Subset(self.dataset, train_indices)
    self.seq_test_subset = Subset(self.dataset, val_indices)

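    # Shuffled loaders: draw from the full dataset through split-specific random samplers.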
    random_train_sampler = SubsetRandomSampler(train_indices)
    random_valid_sampler = SubsetRandomSampler(val_indices)

    random_train_loader = torch.utils.data.DataLoader(
        self.dataset,
        batch_size=self.batch_size['train'],
        num_workers=0,
        sampler=random_train_sampler,
        collate_fn=my_collate)
    random_validation_loader = torch.utils.data.DataLoader(
        self.dataset,
        num_workers=0,
        batch_size=self.batch_size['test'],
        sampler=random_valid_sampler)

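    # In-order loaders: iterate the Subset views of the same split indices sequentially.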
    sequential_train_sampler = SequentialSampler(self.seq_train_subset)
    sequential_valid_sampler = SequentialSampler(self.seq_test_subset)

    sequential_train_loader = torch.utils.data.DataLoader(
        self.seq_train_subset,
        batch_size=self.batch_size['train'],
        num_workers=0,
        sampler=sequential_train_sampler)
    sequential_validation_loader = torch.utils.data.DataLoader(
        self.seq_test_subset,
        num_workers=0,
        batch_size=self.batch_size['test'],
        sampler=sequential_valid_sampler,
        collate_fn=my_collate)

I don't see any way this code would leak data between the training and validation splits, so I would assume it's fine.
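
For extra peace of mind, a quick sanity check (a minimal sketch, assuming `train_indices`, `val_indices`, and `dataset_size` are the variables built in your snippet) is to confirm the two index sets are disjoint and together cover the whole dataset:

    train_set, val_set = set(train_indices), set(val_indices)
    # A data leak would require some index to appear in both splits.
    assert train_set.isdisjoint(val_set), "train/val indices overlap"
    # Together the splits should cover every sample exactly once.
    assert len(train_set) + len(val_set) == dataset_size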
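
Regarding your P.S.: if you mainly need a reproducible, non-overlapping split, `torch.utils.data.random_split` gets you there with less code. This is only a sketch under your naming assumptions (`ds_split` as the train fraction, plus `self.dataset`, `self.batch_size`, and `my_collate` from your snippet), not a drop-in replacement:

    import torch
    from torch.utils.data import DataLoader, random_split

    train_len = int(ds_split * len(self.dataset))
    val_len = len(self.dataset) - train_len

    # random_split partitions the indices, so the subsets cannot overlap.
    train_subset, val_subset = random_split(
        self.dataset, [train_len, val_len],
        generator=torch.Generator().manual_seed(42))

    # shuffle=True gives the "random" loader, shuffle=False the sequential one.
    random_train_loader = DataLoader(
        train_subset, batch_size=self.batch_size['train'],
        shuffle=True, collate_fn=my_collate)
    sequential_train_loader = DataLoader(
        train_subset, batch_size=self.batch_size['train'],
        shuffle=False, collate_fn=my_collate)

The validation loaders would be built the same way from `val_subset`.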
