I’m doing some permutation testing experiments, that is, iterating n times over the model. I would thus like to randomly sample m samples from the total available samples. One way to do this is to define the loader as a function, something like:
def get_loader(dataset, my_sampler_size):
    sampler = torch.utils.data.SubsetRandomSampler(
        torch.randint(0, len(dataset), (my_sampler_size,))
    )  # e.g. SubsetRandomSampler(range(1000))
    return torch.utils.data.DataLoader(dataset, sampler=sampler)
Clearly, this function needs to be called before each iteration (permutation).
Is there a neater way to do this?
I think you could write a custom sampler by deriving from (or reusing) `SubsetRandomSampler` and passing a `length` argument to it. In the `__iter__` method, you could then draw a fresh random subset of that length.
Let me know if this would work for you or if I misunderstood your use case.
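A minimal sketch of such a sampler (a hypothetical `RandomSubsetSampler`, not part of the PyTorch API; recent versions of `DataLoader` accept any iterable of indices as a `sampler`):

```python
import random

class RandomSubsetSampler:
    """Hypothetical sampler: yields a fresh random subset of
    `num_samples` indices each time the DataLoader iterates it."""

    def __init__(self, dataset_len, num_samples):
        self.dataset_len = dataset_len
        self.num_samples = num_samples

    def __iter__(self):
        # draw a new subset (without replacement) on every pass
        return iter(random.sample(range(self.dataset_len), self.num_samples))

    def __len__(self):
        return self.num_samples
```

With this, a loader built once, e.g. `DataLoader(val_dataset, sampler=RandomSubsetSampler(len(val_dataset), 5000))`, would draw a different 5000-sample subset on every iteration, so no per-permutation rebuilding would be needed.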
My bad. What I’m trying to achieve is, for example, randomly selecting 5000 samples from a dataset that has 10000 samples. The sampler I showed above was working as intended.
val_loader = get_loader(val_dataset, 5000)
for jj in range(5):
    for ii, (images, target) in enumerate(val_loader):
        plt.imshow(images[0].permute(1, 2, 0)); plt.show()  # to see the first image of the batch
Out of some confusion, I thought I needed to put
val_loader = get_loader(val_dataset) inside the
for jj loop (could it be because I was doing this at 3 AM?!)
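One detail worth noting: `SubsetRandomSampler` only reshuffles the index set it was constructed with, so rebuilding the loader inside the `for jj` loop is what gives a fresh subset per permutation; keeping it outside reuses the same subset in a new order. A torch-free sketch of the difference, with `random.sample` standing in for the index generation:

```python
import random

dataset_len, m = 10000, 5000

# SubsetRandomSampler-style behaviour: the subset is fixed at
# construction; only the order changes between epochs.
fixed_subset = random.sample(range(dataset_len), m)
epoch_a = random.sample(fixed_subset, m)  # reshuffle of the same indices
epoch_b = random.sample(fixed_subset, m)
assert sorted(epoch_a) == sorted(epoch_b)  # identical subset every epoch

# Rebuilding the sampler per permutation draws a genuinely new subset.
perm_a = set(random.sample(range(dataset_len), m))
perm_b = set(random.sample(range(dataset_len), m))
assert perm_a != perm_b  # overwhelmingly likely to differ
```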
But your idea seems very legit, yet I wouldn’t bother passing both the `indices` and the `length`. Hence, maybe the original `SubsetRandomSampler` should receive/generate all the indices of the dataset when it is invoked in the DataLoader class, via `indices = torch.arange(0, len(self.dataset))`; the user then only passes the `length` to `SubsetRandomSampler`, that is, if the user did not pass the `indices`. Yet another, better idea is to pass the `length` of the sample one wants to draw as a complement to `shuffle`; that is, adding another parameter to the DataLoader; but if `length` is not passed, `shuffle` works on the whole dataset.
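That proposal could be prototyped today with a small helper (a hypothetical `make_subset_indices`, not part of the PyTorch API): generate all the indices when none are given, and fall back to the full dataset when `length` is omitted, which mimics `shuffle=True`:

```python
import random

def make_subset_indices(dataset_len, length=None, indices=None):
    """Hypothetical helper: build the index list for SubsetRandomSampler.

    - no `indices` given: use all of them (stands in for
      torch.arange(0, len(dataset)))
    - no `length` given: use the whole dataset, i.e. behave like
      shuffle=True over all samples
    """
    if indices is None:
        indices = range(dataset_len)
    if length is None:
        length = len(indices)
    # in real code, wrap the result:
    # SubsetRandomSampler(make_subset_indices(len(dataset), length=5000))
    return random.sample(list(indices), length)
```

Used as `DataLoader(dataset, sampler=SubsetRandomSampler(make_subset_indices(len(dataset), length=5000)))`, this keeps the one-parameter interface sketched above while leaving the existing `SubsetRandomSampler` untouched.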