I am currently dealing with a large image dataset. For each epoch I am using 10000 samples and a batch size of 128. In __getitem__ I am randomly sampling images from my dataset. What can I do so that the sampling after every epoch differs from the previous epoch and is not repeated?
If you create a DataLoader and specify shuffle=True, it will use a RandomSampler internally, so the samples are drawn in a new random order (and therefore grouped into different batches) in each epoch.
Here is a small example:
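import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten fake images with labels 0..9, so the printed targets reveal the sample order
data = torch.randn(10, 3, 224, 224)
target = torch.arange(10)
dataset = TensorDataset(data, target)
loader = DataLoader(dataset, batch_size=2, shuffle=True)

# First epoch
for data, target in loader:
    print(target)

# Second epoch: same samples, new random order
for data, target in loader:
    print(target)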
As you can see, the two loops (each representing an epoch) return the batches in a different order in each run.
Actually I don’t want to have a fixed dataset and then sample random batches. After every epoch, I want to construct a random dataset. Essentially what I want is that regardless of epochs, every batch randomly picks images from a given folder having millions of images.
Based on your description, it sounds rather as if you would like to just load images randomly.
If that’s the case, just swap TensorDataset for ImageFolder and the code should still work.
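If you also want each epoch to see only a fixed number of samples drawn fresh from the whole folder, one possible approach is to build a new sampler per epoch. The following is only a rough sketch under some assumptions: a small TensorDataset stands in for the real ImageFolder so the snippet runs without any image files, and make_epoch_loader, samples_per_epoch, and the small sizes are hypothetical names and values chosen for illustration (the question uses 10000 samples and a batch size of 128).

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# Stand-in for the real image folder (assumption): 1000 tiny fake "images".
# In practice this could be torchvision.datasets.ImageFolder(root, transform=...).
data = torch.randn(1000, 3, 8, 8)
target = torch.arange(1000)  # labels double as sample ids here
dataset = TensorDataset(data, target)

samples_per_epoch = 100  # hypothetical; the question uses 10000
batch_size = 16          # hypothetical; the question uses 128


def make_epoch_loader(dataset, samples_per_epoch, batch_size):
    """Hypothetical helper: a loader over a fresh random subset of the dataset."""
    # Pick samples_per_epoch distinct indices from the full dataset.
    indices = torch.randperm(len(dataset))[:samples_per_epoch].tolist()
    # SubsetRandomSampler yields exactly these indices, in random order.
    sampler = SubsetRandomSampler(indices)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)


epoch_targets = []
for epoch in range(2):
    # Rebuilding the loader each epoch draws a new subset of the data.
    loader = make_epoch_loader(dataset, samples_per_epoch, batch_size)
    seen = torch.cat([t for _, t in loader])
    epoch_targets.append(seen)
    print(f"epoch {epoch}: {len(seen)} samples, first ids {seen[:5].tolist()}")
```

Within one epoch no sample is repeated, but two epochs can still overlap by chance, since each subset is drawn independently. If you need strictly disjoint epochs, you would have to keep track of the indices already used.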