I’m doing some permutation testing experiments, that is, iterating n times over the model. I would thus like to randomly sample m samples from the total available samples. One way to do this is to define the loader as a function, something like:
def get_loader(dataset, my_sampler_size):
    sampler = torch.utils.data.SubsetRandomSampler(
        torch.randint(0, len(dataset), (my_sampler_size,))
    )  # e.g. SubsetRandomSampler(range(1000))
    return torch.utils.data.DataLoader(dataset, sampler=sampler)
Clearly, this function needs to be called before each iteration (permutation).
Is there a neater way to do this?
I think you could write a custom sampler by deriving from (or reusing) `SubsetRandomSampler` and passing a `length` argument to it. In the `__iter__` method, you could then draw a fresh random subset of that length.
Let me know if this would work for you or if I misunderstood your use case.
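A minimal sketch of such a sampler (a hypothetical `RandomSubsetSampler`, not part of the PyTorch API; recent versions of `DataLoader` accept any iterable of indices as a `sampler`):

```python
import random

class RandomSubsetSampler:
    """Hypothetical sampler: yields a fresh random subset of
    `num_samples` indices each time the DataLoader iterates it."""

    def __init__(self, dataset_len, num_samples):
        self.dataset_len = dataset_len
        self.num_samples = num_samples

    def __iter__(self):
        # draw a new subset (without replacement) on every pass
        return iter(random.sample(range(self.dataset_len), self.num_samples))

    def __len__(self):
        return self.num_samples
```

With this, a loader built once, e.g. `DataLoader(val_dataset, sampler=RandomSubsetSampler(len(val_dataset), 5000))`, would draw a different 5000-sample subset on every iteration, so no per-permutation rebuilding would be needed.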
My bad. What I’m trying to achieve is, for example, randomly selecting 5000 samples from a dataset that has 10000 samples. The sampler I showed above was working as intended.
val_loader = get_loader(val_dataset, 5000)
for jj in range(5):
    for ii, (images, target) in enumerate(val_loader):
        plt.imshow(images[0].permute(1, 2, 0)); plt.show()  # to see the first image of the batch
Out of some confusion, I thought I needed to put
val_loader = get_loader(val_dataset) inside the
for jj loop (could it be because I was doing this at 3 AM?!)
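One detail worth noting: `SubsetRandomSampler` only reshuffles the index set it was constructed with, so rebuilding the loader inside the `for jj` loop is what gives a fresh subset per permutation; keeping it outside reuses the same subset in a new order. A torch-free sketch of the difference, with `random.sample` standing in for the index generation:

```python
import random

dataset_len, m = 10000, 5000

# SubsetRandomSampler-style behaviour: the subset is fixed at
# construction; only the order changes between epochs.
fixed_subset = random.sample(range(dataset_len), m)
epoch_a = random.sample(fixed_subset, m)  # reshuffle of the same indices
epoch_b = random.sample(fixed_subset, m)
assert sorted(epoch_a) == sorted(epoch_b)  # identical subset every epoch

# Rebuilding the sampler per permutation draws a genuinely new subset.
perm_a = set(random.sample(range(dataset_len), m))
perm_b = set(random.sample(range(dataset_len), m))
assert perm_a != perm_b  # overwhelmingly likely to differ
```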
But your idea seems very legit, yet I wouldn’t bother passing both the `indices` and the `length`. Hence, maybe the original `SubsetRandomSampler` should receive/generate all the indices of the dataset when it is invoked in the DataLoader class, via `indices = torch.arange(0, len(self.dataset))`; the user then only passes the `length` to `SubsetRandomSampler`, that is, if the user did not pass the `indices`. Yet another, better idea is to pass the `length` of the sample one wants to draw as a complement to `shuffle`; that is, adding another parameter to the DataLoader; but if `length` is not passed, `shuffle` works on the whole dataset.
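That proposal could be prototyped today with a small helper (a hypothetical `make_subset_indices`, not part of the PyTorch API): generate all the indices when none are given, and fall back to the full dataset when `length` is omitted, which mimics `shuffle=True`:

```python
import random

def make_subset_indices(dataset_len, length=None, indices=None):
    """Hypothetical helper: build the index list for SubsetRandomSampler.

    - no `indices` given: use all of them (stands in for
      torch.arange(0, len(dataset)))
    - no `length` given: use the whole dataset, i.e. behave like
      shuffle=True over all samples
    """
    if indices is None:
        indices = range(dataset_len)
    if length is None:
        length = len(indices)
    # in real code, wrap the result:
    # SubsetRandomSampler(make_subset_indices(len(dataset), length=5000))
    return random.sample(list(indices), length)
```

Used as `DataLoader(dataset, sampler=SubsetRandomSampler(make_subset_indices(len(dataset), length=5000)))`, this keeps the one-parameter interface sketched above while leaving the existing `SubsetRandomSampler` untouched.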