Using DataLoader to sample with replacement

I have a dataset defined in the format:

import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, N):
        self.N = N
        self.x = torch.rand(self.N, 10)
        self.y = torch.randint(0, 3, (self.N,))

    def __len__(self):
        return self.N

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

During training, I would like to sample batches of m training samples with replacement; e.g. the first iteration uses data indices [1, 5, 6], the second iteration uses data indices [12, 3, 5], and so on. So the total number of iterations is an input, rather than N/m.

Is there a way to use DataLoader to handle this? If not, is there any method other than something of the form

for i in range(num_iters):
    idx = np.random.choice(N, m, replace=False)

to implement this?

Yes, you could create a custom sampler, draw the indices using your custom logic, and let the DataLoader pass these indices to the Dataset.__getitem__. You might also want to check the BatchSampler as it can pass a batch of indices to __getitem__.
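A minimal sketch of that approach: the sampler below is passed to the DataLoader via its `batch_sampler` argument, so each iteration it yields a fresh list of m indices drawn uniformly at random (indices can repeat across iterations), and the number of iterations is an explicit input. The class name `ReplacementBatchSampler` and the sizes used at the bottom are made up for illustration; this is one possible custom sampler, not the only way to do it.

```python
import torch
from torch.utils.data import Dataset, DataLoader, Sampler

class MyDataset(Dataset):
    def __init__(self, N):
        self.N = N
        self.x = torch.rand(self.N, 10)
        self.y = torch.randint(0, 3, (self.N,))

    def __len__(self):
        return self.N

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

class ReplacementBatchSampler(Sampler):
    """Yields num_iters batches of m random indices each.

    Indices are drawn independently every batch, so the same sample
    can appear in multiple batches (sampling with replacement).
    """
    def __init__(self, dataset_len, m, num_iters):
        self.dataset_len = dataset_len
        self.m = m
        self.num_iters = num_iters

    def __iter__(self):
        for _ in range(self.num_iters):
            # Draw m indices uniformly at random from [0, dataset_len)
            yield torch.randint(0, self.dataset_len, (self.m,)).tolist()

    def __len__(self):
        return self.num_iters

# Hypothetical sizes for illustration
N, m, num_iters = 100, 3, 20
dataset = MyDataset(N)
loader = DataLoader(
    dataset,
    batch_sampler=ReplacementBatchSampler(len(dataset), m, num_iters),
)

batches = list(loader)
```

With `batch_sampler`, the DataLoader hands each yielded index list to the Dataset one index at a time through `__getitem__` and collates the results, so every batch has exactly m samples and the loop runs exactly `num_iters` times regardless of N. If plain uniform sampling with replacement is all you need, the built-in `torch.utils.data.RandomSampler(dataset, replacement=True, num_samples=num_iters * m)` combined with `batch_size=m` achieves the same effect without a custom class.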
