I have a dataset defined in the format:
import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, N):
        self.N = N
        self.x = torch.rand(self.N, 10)
        self.y = torch.randint(0, 3, (self.N,))

    def __len__(self):
        return self.N

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]
During training, I would like to sample batches of m training samples with replacement; e.g. the first iteration uses data indices [1, 5, 6], the second uses [12, 3, 5], and so on. So the total number of iterations is an input, rather than N/m.
Is there a way to use DataLoader to handle this? If not, is there any other method than something of the form

for i in range(num_iters):
    idx = np.random.choice(range(N), m, replace=True)

to implement this? (Note replace=True, since the sampling should be with replacement.)
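For reference, here is a runnable sketch of the manual-loop fallback described above, using torch.randint so the indices are drawn with replacement (N, m, and num_iters are placeholder values, not part of any API):

```python
import torch

N, m, num_iters = 100, 3, 5          # placeholder sizes
x = torch.rand(N, 10)                # features, as in MyDataset
y = torch.randint(0, 3, (N,))        # labels, as in MyDataset

for i in range(num_iters):
    # draw m indices uniformly WITH replacement
    idx = torch.randint(0, N, (m,))
    xb, yb = x[idx], y[idx]          # one batch of m samples
```

This avoids NumPy entirely and keeps everything in torch, but it bypasses the Dataset/DataLoader machinery, which is exactly what the question is trying to avoid.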