SubsetRandomSampler is not random, right?

The way I expected the random sampler to work is that one enters the number of samples needed, and the resulting sample is uniformly distributed over all classes (i.e. balanced data), which does not seem to be the case. A simple example:

from torch.utils.data import SubsetRandomSampler

sample_idx = [1, 2, 3]
my_sampler = SubsetRandomSampler(sample_idx)
print("indices in my_sampler are:", my_sampler.indices)

output is:
indices in my_sampler are: [1, 2, 3]

Hence, nothing is random here.

As you can see in the implementation here, this variable just stores the list of possible indices that you provided. The randomization happens when the sampler is actually used: each time it is iterated over, it returns those indices in a random order.
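To see the randomization, iterate over the sampler instead of reading its `indices` attribute. A minimal sketch, assuming a reasonably recent PyTorch install; the variable names `first_pass` and `second_pass` are only illustrative:

```python
from torch.utils.data import SubsetRandomSampler

sample_idx = [1, 2, 3]
my_sampler = SubsetRandomSampler(sample_idx)

# .indices is just the stored list; it is not shuffled in place.
print("stored indices:", my_sampler.indices)

# Each iteration over the sampler yields the stored indices in a
# freshly drawn random order (sampling without replacement).
first_pass = list(my_sampler)
second_pass = list(my_sampler)
print("first pass: ", first_pass)
print("second pass:", second_pass)
```

The two passes contain the same indices and will usually, though not always, differ in order. A DataLoader that is given this sampler iterates over it in the same way.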


Clearly, this is just a random permutation (or shuffling) of the indices.
It does nothing to balance the classes, so one might still need to control the dataset balance separately.
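If class balance is the goal, one possible approach is `WeightedRandomSampler` with per-sample weights set to the inverse class frequency. This is only a sketch under assumptions: the `labels` tensor below is made-up toy data, and inverse-frequency weighting is one common choice rather than the only one.

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Hypothetical labels for an imbalanced dataset: 8 samples of class 0, 2 of class 1.
labels = torch.tensor([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# Weight every sample by the inverse frequency of its class,
# so each class carries the same total weight.
class_counts = torch.bincount(labels)
sample_weights = 1.0 / class_counts[labels].float()

# Draw with replacement so the rare class can be oversampled.
balanced_sampler = WeightedRandomSampler(
    weights=sample_weights, num_samples=1000, replacement=True
)

drawn = torch.tensor(list(balanced_sampler))
drawn_counts = torch.bincount(labels[drawn], minlength=2)
print("draws per class:", drawn_counts.tolist())  # roughly equal counts
```

To restrict the draw to a subset of indices as well, the weights of all excluded samples can be set to zero, since zero-weight samples are never drawn.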