How to use SubsetRandomSampler at run time

I’m doing some permutation-testing experiments, that is, iterating n times over the model. I would thus like to randomly sample m samples from the total available samples on each iteration. One way to do this is to define the loader as a function, something like:

import torch
from torch.utils.data import DataLoader, SubsetRandomSampler

def get_loader(dataset, my_sampler_size):
    # Note: torch.randint samples with replacement, so duplicate indices are possible;
    # use torch.randperm(len(dataset))[:my_sampler_size] for distinct samples.
    indices = torch.randint(0, len(dataset), (my_sampler_size,)).tolist()
    loader = DataLoader(
            dataset,
            batch_size=100,
            num_workers=4,
            sampler=SubsetRandomSampler(indices),  # e.g. SubsetRandomSampler(range(1000))
            shuffle=False,   # must stay False when a sampler is provided
            pin_memory=True)
    return loader

Clearly, this function needs to be called before each iteration (permutation).
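
For instance, a minimal sketch of the resulting permutation loop (n_permutations and the evaluation body are placeholders):

    for jj in range(n_permutations):
        loader = get_loader(dataset, my_sampler_size=5000)  # fresh random subset each permutation
        for images, target in loader:
            ...  # evaluate the model on this subset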

Is there a neater way to do this?

I think you could write a custom sampler by deriving from or reusing SubsetRandomSampler and passing a length argument to it.
In the __iter__ method, you could use:

torch.randperm(len(self.indices), generator=self.generator).tolist()[:self.length]
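
For example, a rough sketch of that approach (the class name LimitedSubsetRandomSampler is made up; this assumes a PyTorch version where SubsetRandomSampler accepts a generator argument):

    import torch
    from torch.utils.data import SubsetRandomSampler

    class LimitedSubsetRandomSampler(SubsetRandomSampler):
        # Draws a fresh random subset of size `length` on every __iter__ call,
        # so each epoch sees a new subset without rebuilding the DataLoader.
        def __init__(self, indices, length, generator=None):
            super().__init__(indices, generator=generator)
            self.length = length

        def __iter__(self):
            perm = torch.randperm(len(self.indices), generator=self.generator).tolist()[:self.length]
            return iter([self.indices[i] for i in perm])

        def __len__(self):
            # in case length exceeds the number of available indices
            return min(self.length, len(self.indices))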

Let me know if this would work for you or if I misunderstood your use case.

My bad. What I’m trying to achieve is, for example, randomly selecting 5000 samples from a dataset that has 10000 samples. The sampler I showed above was working as intended.

'''unit-test'''
import matplotlib.pyplot as plt

val_loader = get_loader(val_dataset, 5000)  # e.g. 5000 of the 10000 samples
for jj in range(5):
    print('Iteration:', jj)
    for ii, (images, target) in enumerate(val_loader):
        if ii < 2:
            plt.imshow(images[ii, :].permute(1, 2, 0)); plt.show()  # to see the image
            print(target[ii].item())
            break

Due to some confusion, I thought I needed to put val_loader = get_loader(val_dataset, 5000) inside the for jj loop (could it be because I was doing this at 3 AM? :slight_smile: )

But your idea seems very legit, yet I wouldn’t bother passing both the indices and the length. Maybe the original SubsetRandomSampler should instead receive/generate all the indices of the dataset when it is invoked by the DataLoader, via indices = torch.arange(0, len(self.dataset)); the user would then only pass the length to SubsetRandomSampler, that is, if the user did not pass the indices. Yet another, better idea is to pass the number of samples one wants as a complement to `shuffle`; that is, adding another parameter to the DataLoader called `length`, where if `length` is not passed, shuffle works on the whole dataset.
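
Neither of those parameters exists in today's DataLoader, of course. A user-side helper in the same spirit might look like this (make_loader is a made-up name, and the behavior is just my reading of the proposal):

    import torch
    from torch.utils.data import DataLoader, SubsetRandomSampler

    def make_loader(dataset, length=None, **kwargs):
        # If `length` is given, draw that many distinct indices at random;
        # otherwise fall back to shuffling the whole dataset.
        if length is None:
            return DataLoader(dataset, shuffle=True, **kwargs)
        indices = torch.randperm(len(dataset))[:length].tolist()
        return DataLoader(dataset, sampler=SubsetRandomSampler(indices), **kwargs)

    # e.g. make_loader(val_dataset, length=5000, batch_size=100)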