Dataloader iterable

Dear PyTorch community,

I am working on an optimization algorithm. At each iteration this algorithm needs to draw a random sample from the dataloader, so I do not iterate over many epochs; instead I have a max-iteration variable (30000, for example). The easiest way to implement it would be to access the dataset the same way I access a list:

for i_data in range(max_iter):
    data = trainloader[i_data % len(trainloader)]

but the DataLoader object does not support indexing. Do you have a solution? I am working with the CIFAR10 dataset.


One solution I can think of is to use a DataLoader with batch_size set to 1 and shuffle set to True. This will most probably work.
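A minimal sketch of that suggestion, using a synthetic TensorDataset as a stand-in for CIFAR10 (the real code would use torchvision.datasets.CIFAR10; shapes and max_iter here are illustrative only). The iterator is re-created when exhausted, so shuffle=True gives a fresh random order on every pass:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for CIFAR10: 100 fake 3x32x32 images with labels.
images = torch.randn(100, 3, 32, 32)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# batch_size=1 with shuffle=True draws one random sample per step.
loader = DataLoader(dataset, batch_size=1, shuffle=True)

max_iter = 250  # e.g. 30000 in the original problem
it = iter(loader)
for i_data in range(max_iter):
    try:
        data, target = next(it)
    except StopIteration:
        # The loader is exhausted: restart it, which reshuffles
        # because shuffle=True.
        it = iter(loader)
        data, target = next(it)
    # ... one optimization step on (data, target) ...
```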

Thank you for your answer.

Actually, because I need to store some per-sample information (so I need to keep a fixed dataset), I would prefer to have a list and pick a random element from it. That way suits my problem better :-).
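If a plain list is acceptable, one way to get it is to materialize the dataset once and then sample with random.choice. A sketch, again with a synthetic stand-in for CIFAR10 (the real code would build the list from torchvision.datasets.CIFAR10 in the same way):

```python
import random

import torch
from torch.utils.data import TensorDataset

# Synthetic stand-in for CIFAR10.
images = torch.randn(100, 3, 32, 32)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# Materialize the dataset once so every iteration sees the same
# objects, which makes it easy to attach per-sample bookkeeping.
samples = [dataset[i] for i in range(len(dataset))]

max_iter = 250
for i_data in range(max_iter):
    data, target = random.choice(samples)
    # ... one optimization step on (data, target) ...
```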

There has been a long discussion on this; have a look at it (it might help):

Did you find a solution?

I also defined a DataLoader iterator on the built-in MNIST dataset.
I'm looking for a way to access batches randomly (i.e., by index).

Any idea?

I did not find a solution by indexing.

However, you can write a custom RandomSampler class to fit your needs. When you create your data loader, you simply pass it as an argument: sampler=CustomRandomSampler().


Is there an example for that?

Thank You.

Everything is here :

You can check the source code; it is not so hard. See the DataLoader and RandomSampler classes.

As you can imagine, I went through the documentation and couldn't find how to do so.
I need a way to sample a subset of a DataLoader object.

If there is an example how to do so, it would be great.


Could anyone please assist me with that?
See Dataloader iterable.

If you have a look at the RandomSampler class and understand it, you are almost done.

class RandomSampler(Sampler):
    """Samples elements randomly, without replacement.

    Arguments:
        data_source (Dataset): dataset to sample from
    """

    def __init__(self, data_source):
        self.data_source = data_source

    def __iter__(self):
        return iter(torch.randperm(len(self.data_source)).long())

    def __len__(self):
        return len(self.data_source)

How does it work? It samples your dataset according to a random permutation order (the __iter__ method). So how would you modify it to suit your needs?
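One way to modify it, as a minimal sketch: a custom sampler that draws the permutation once with a fixed seed, so the visiting order is known and repeatable across epochs. The class name and the seed handling are my own choices, not part of PyTorch:

```python
import torch
from torch.utils.data import DataLoader, Sampler, TensorDataset


class FixedPermutationSampler(Sampler):
    """Like RandomSampler, but draws the permutation once so the
    order can be inspected and reused across epochs."""

    def __init__(self, data_source, seed=0):
        self.data_source = data_source
        g = torch.Generator().manual_seed(seed)
        self.order = torch.randperm(len(data_source), generator=g)

    def __iter__(self):
        # Same order every time __iter__ is called.
        return iter(self.order.tolist())

    def __len__(self):
        return len(self.data_source)


dataset = TensorDataset(torch.arange(10).float())
sampler = FixedPermutationSampler(dataset, seed=0)
loader = DataLoader(dataset, batch_size=2, sampler=sampler)
```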

PS: Please proofread your posts before submitting; there were some typos in your previous one.

I’m new to Python so I might miss the point.

I’d like to have something like:

trainData, trainLabels = trainLoader.SampleByIdx(batchIdx)

Assuming SampleByIdx is my own defined sampler.
It seems the magic in RandomSampler happens at return iter(torch.randperm(len(self.data_source)).long()).
I just don’t understand where the samples are loaded in this line.

Thank You.

P. S.
I think I fixed typos, thank you for letting me know.

The samples are not loaded in that line. The RandomSampler class is just a tool used by the DataLoader class. As I said before, if you have a look at the DataLoader class, you will find this:

if batch_sampler is None:
    if sampler is None:
        if shuffle:
            sampler = RandomSampler(dataset)
        else:
            sampler = SequentialSampler(dataset)
    batch_sampler = BatchSampler(sampler, batch_size, drop_last)

So your dataloader will yield the data following this order (assuming you fall into the sampler = RandomSampler(dataset) branch):


This is how RandomSampler works. So you want to customize the RandomSampler class in order to control the order in which your data are loaded by your dataloader. Once that is done, you will know which samples you are currently working on when you enumerate through your dataloader.
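The idea above can be sketched end-to-end: a custom sampler that stores its permutation, so that while enumerating the dataloader you can recover which dataset indices each batch contains. This is a minimal illustration with a toy dataset; the class name and bookkeeping are my own:

```python
import torch
from torch.utils.data import DataLoader, Sampler, TensorDataset


class TrackedRandomSampler(Sampler):
    """Draws a random permutation and keeps it, so you can look up
    which dataset indices each batch was built from."""

    def __init__(self, data_source):
        self.data_source = data_source
        self.order = torch.randperm(len(data_source)).tolist()

    def __iter__(self):
        return iter(self.order)

    def __len__(self):
        return len(self.data_source)


# Toy dataset where sample i holds the value float(i), so the
# correspondence between batch contents and indices is visible.
dataset = TensorDataset(torch.arange(8).float())
sampler = TrackedRandomSampler(dataset)
loader = DataLoader(dataset, batch_size=2, sampler=sampler)

for i_batch, (batch,) in enumerate(loader):
    # Indices for this batch, recovered from the sampler's stored order.
    idx = sampler.order[i_batch * 2:(i_batch + 1) * 2]
    # batch contains dataset[i] for i in idx, in that order.
```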