SubsetRandomSampler: how to get the same subset every epoch?

I train on 10% of my data. I use SubsetRandomSampler, but it seems to select a different subset each epoch. How can I keep the same subset for every epoch?

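subset_idx = torch.arange(1000)  # 10% of the data
my_sampler = SubsetRandomSampler(subset_idx)  # assumed; the sampler definition was not shown in the post
train_dl = DataLoader(MyDataset(X_train_Si, y_train_Si), batch_size=hparams['batch_size'], sampler=my_sampler, drop_last=False)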

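This shouldn’t be the case, as SubsetRandomSampler just uses a random permutation of the given indices, as seen here. Only the order of the samples changes between epochs, not the set of samples.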
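A minimal sketch of that behavior, assuming the sampler is built from the same torch.arange(1000) indices as in the question (the names sampler and epoch_indices are only illustrative). It draws the indices for three epochs and compares them as sets:

```python
import torch
from torch.utils.data import SubsetRandomSampler

# Same index range as in the question (assumed to be the 10% subset).
subset_idx = torch.arange(1000)
sampler = SubsetRandomSampler(subset_idx)

# Iterating the sampler once corresponds to one epoch of index draws.
# The sampler yields 0-dim tensors here, so convert them to plain ints.
epoch_indices = [[int(i) for i in sampler] for _ in range(3)]

# The order is reshuffled on every pass, but the set of indices is unchanged.
for indices in epoch_indices:
    print(indices[:5], set(indices) == set(range(1000)))
```

If the printed sets match while the first few indices differ, the subset is fixed and only the batch composition changes from epoch to epoch.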


I agree with you, but I selected a single batch and printed its values each epoch, and they weren’t the same. That’s why I was surprised.

Were you printing the data tensors of each batch and are you using any random transformations?
How did you verify that the subsets differ?

I don’t use any random transformations. The batch contains a single tensor, and for each epoch I do this:

I’m not sure I understand the use case completely.
If you are calling this line of code once per epoch, you would only get a single batch.
On the other hand, if you are calling it in each iteration, you are recreating the Iterator every time.
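The code from the post above is not shown, so the following is only a sketch under an assumption: that the batch was fetched with a pattern like next(iter(loader)). The toy TensorDataset, the subset of 10 samples, and the batch size are all hypothetical. It illustrates what recreating the iterator does:

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# Toy dataset: sample i holds the value i, so a batch reveals which indices were drawn.
dataset = TensorDataset(torch.arange(100))
sampler = SubsetRandomSampler(list(range(10)))  # restrict to the first 10 samples
loader = DataLoader(dataset, batch_size=5, sampler=sampler)

# Each call to iter(loader) builds a new iterator, which draws a new
# permutation of the subset, so the "first batch" can differ on every call.
first_batches = [next(iter(loader))[0].tolist() for _ in range(4)]

# Every batch stays inside the subset, even though its contents vary.
for batch in first_batches:
    print(sorted(batch))
```

With this pattern, a differing first batch does not mean the subset changed. It only means a fresh shuffle of the same subset was started.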

Could you try to iterate the DataLoader directly via:

for data, target in loader:

and check the results?
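One possible way to run that check is sketched below. It assumes a toy TensorDataset whose data values equal their own indices, and a subset of 100 samples. These are stand-ins for the MyDataset and the index range from the question:

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# Toy data: the value of each sample equals its index; targets are dummies.
dataset = TensorDataset(torch.arange(1000), torch.zeros(1000))
subset_idx = list(range(100))  # hypothetical 10% subset
loader = DataLoader(dataset, batch_size=32, sampler=SubsetRandomSampler(subset_idx))

seen_per_epoch = []
for epoch in range(3):
    seen = set()
    # Iterate the DataLoader directly so every batch of the epoch is visited.
    for data, target in loader:
        seen.update(data.tolist())
    seen_per_epoch.append(seen)

# Compare the samples seen in each epoch against the requested subset.
same_subset = all(seen == set(subset_idx) for seen in seen_per_epoch)
print(same_subset)
```

Collecting the samples into a set ignores the shuffled order, so the comparison only answers whether each epoch covered the same subset.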


Thanks, it works. I think the code I was using to extract a single batch was taking it from the whole dataset and was not restricted to the subset sampler. When I tried "for data, target in loader:", the subset is the same each epoch.