Access Dataset object inside DataLoader

Hello everyone, I want to know if we can access the Dataset object after the DataLoader object has been created.

Motivation: suppose we have a simple Dataset:

import torch

class ExampleDataset(torch.utils.data.Dataset):
    def __init__(self, data):
        self.data = data

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)

There is nothing tricky here: the dataset has no parameters we could modify to affect training.
What I want to do is this:

class TrickyDataset(torch.utils.data.Dataset):
    def __init__(self, data, aug_functions):
        self.data = data
        self.current_epoch = 0
        self.light_aug = aug_functions[0]
        self.medium_aug = aug_functions[1]
        self.hard_aug = aug_functions[2]

    def __getitem__(self, index):
        if self.current_epoch < 10:
            return self.light_aug(self.data[index])
        # ... etc

    def update_epoch(self, epoch):
        self.current_epoch = epoch

    def __len__(self):
        return len(self.data)

So, if we have a loader like

TrickyLoader = torch.utils.data.DataLoader(TrickyDataset(*params))

Can we modify the Dataset inside it during training? I want to change how the data is augmented as training progresses.

I don’t think you can, although you could just create a new DataLoader inside your training loop and use it.
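
For illustration, the "new DataLoader per epoch" idea could look roughly like the sketch below. ScaledDataset and its scale parameter are hypothetical stand-ins for a dataset with a real augmentation; only Dataset and DataLoader come from PyTorch.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ScaledDataset(Dataset):
    """Hypothetical dataset: multiplies each sample by a fixed scale."""

    def __init__(self, data, scale):
        self.data = data
        self.scale = scale

    def __getitem__(self, index):
        return self.data[index] * self.scale

    def __len__(self):
        return len(self.data)


data = torch.arange(4, dtype=torch.float32)
seen = []  # one batch per epoch, kept only so the effect can be inspected

for epoch in range(3):
    # Stand-in for "use a stronger augmentation in later epochs".
    scale = 1.0 if epoch < 2 else 10.0
    # Rebuild the loader so it wraps a dataset with the new setting.
    loader = DataLoader(ScaledDataset(data, scale), batch_size=4)
    for batch in loader:
        seen.append(batch)
```

Rebuilding the loader is simple and makes the per-epoch configuration explicit, at the cost of constructing a new Dataset and DataLoader every epoch.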

Actually, I was looking in the wrong place: I checked the docs and found nothing about my question.

But when I looked into the source code of DataLoader, I found that you can easily do what I described above by calling the method directly through the loader's dataset attribute:

dataloader = torch.utils.data.DataLoader(TrickyDataset(*params))
dataloader.dataset.update_epoch(epoch)

And now it works (I hope so, haven’t visualized it yet).
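
A minimal way to check this without visualizing anything is sketched below. EpochAwareDataset, switch_epoch, light and hard are made-up names for the test; the only thing taken from the approach above is calling update_epoch through dataloader.dataset. It assumes num_workers=0, so the loader reads from the very same dataset object.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class EpochAwareDataset(Dataset):
    """Hypothetical two-stage version of TrickyDataset."""

    def __init__(self, data, aug_functions, switch_epoch=2):
        self.data = data
        self.current_epoch = 0
        self.light_aug, self.hard_aug = aug_functions
        self.switch_epoch = switch_epoch  # assumed threshold, not from the post

    def __getitem__(self, index):
        if self.current_epoch < self.switch_epoch:
            return self.light_aug(self.data[index])
        return self.hard_aug(self.data[index])

    def update_epoch(self, epoch):
        self.current_epoch = epoch

    def __len__(self):
        return len(self.data)


def light(x):
    # Toy "augmentation" that is easy to recognize in the output.
    return x + 1


def hard(x):
    return x + 100


data = torch.zeros(4)
dataloader = DataLoader(
    EpochAwareDataset(data, (light, hard)), batch_size=4, num_workers=0
)

first_values = []
for epoch in range(4):
    # Update the dataset before iterating the loader for this epoch.
    dataloader.dataset.update_epoch(epoch)
    for batch in dataloader:
        first_values.append(batch[0].item())
```

One caveat, as far as I understand it: with num_workers > 0 each worker process gets its own copy of the dataset when iteration starts, so the update has to happen before the epoch's loop begins, and with persistent_workers=True the long-lived workers would probably not see the change at all.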
