How does WeightedRandomSampler work?

mderakhshani · September 29, 2017, 7:41pm

Hi,
I wrote the code below to understand how WeightedRandomSampler works.

import torch
from torch.utils.data.sampler import Sampler
from torch.utils.data import TensorDataset as dset

inputs = torch.randn(100,1,10)
target = torch.floor(3*torch.rand(100))
trainData = dset(inputs, target)

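num_sample = 3
batch_size = 10  # not defined in the original post; 10 is inferred from the reported "10 sampled, 4 iterations"
weight = [0.2, 0.3, 0.7]
sampler = torch.utils.data.sampler.WeightedRandomSampler(weight, batch_size)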
trainLoader = torch.utils.data.DataLoader(trainData, num_sample , shuffle=False, sampler=sampler)

for i, (inp, tar) in enumerate(trainLoader):
    print(inp.size())

I have 100 instances in my fake dataset. When I run the code above, only 10 of them are sampled from the dataset, and the loop runs for 4 iterations! Could you please help me understand how it works?

Best

smth · September 29, 2017, 9:56pm

It's a very small function; have a look at how it works: https://github.com/pytorch/pytorch/blob/master/torch/utils/data/sampler.py#L73-L90

mderakhshani · September 29, 2017, 10:00pm

Hey @smth. Thanks for your response. I understand WeightedRandomSampler now, but I still don't understand the logic of enumerate(trainLoader). How many times does the for loop execute?
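
For anyone landing here with the same question: the loop length follows from the sampler, not from the dataset size. The sampler yields num_samples indices, and the DataLoader groups them into batches, so the loop runs ceil(num_samples / batch_size) times. Below is a minimal sketch, assuming the undefined `batch_size` in the original post was 10 (that value is a guess that matches the reported numbers); the variable names are illustrative.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

inputs = torch.randn(100, 1, 10)
targets = torch.floor(3 * torch.rand(100))
dataset = TensorDataset(inputs, targets)

# One weight per index: with three weights, only indices 0, 1 and 2 can be drawn.
weights = [0.2, 0.3, 0.7]
num_samples = 10  # assumption: the value of the undefined `batch_size` in the post
sampler = WeightedRandomSampler(weights, num_samples, replacement=True)

# The DataLoader batch size is 3 (called `num_sample` in the post).
loader = DataLoader(dataset, batch_size=3, sampler=sampler)

drawn = list(sampler)                      # the indices the sampler yields in one pass
num_batches = sum(1 for _ in loader)       # how many times the for loop runs
expected_batches = math.ceil(num_samples / 3)
print(len(drawn), num_batches, expected_batches)
```

So with 10 sampled indices and a batch size of 3, the loop runs 4 times, and the last batch holds a single element. The other 97 instances are never visited because the weight list only covers the first three indices.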

quazi · January 14, 2019, 2:45pm

I've been looking at the code in DataLoader and WeightedRandomSampler, and I can’t see how it takes class labels into account. The code comment says “weights (sequence) : a sequence of weights, not necessary summing up to one”, which is not very helpful for someone who’s trying to learn torch. It looks like weights is a list of weights per data point in the dataset we are drawing from, NOT a weight per class (which I initially, maybe carelessly, assumed). If that’s the case, you’d have to write code that computes this weight per data point and somehow “attach” that weight to the data point, e.g. a text data point becomes (sentence, label, weight for sampling), UNLESS some order is implied on the dataset before we can use WeightedRandomSampler.

While the code is fairly straightforward, the semantics around WeightedRandomSampler are not clear at all.
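
A rough sketch of the per-data-point reading described above: the weight at position i applies to dataset index i, so nothing needs to be attached to the samples themselves as long as the weights are built in dataset order. The toy data and the inverse-frequency weighting are assumptions chosen for illustration, not something the docs prescribe.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Imbalanced toy data: 90 samples of class 0, 10 samples of class 1.
labels = torch.cat([torch.zeros(90), torch.ones(10)]).long()
features = torch.randn(100, 10)
dataset = TensorDataset(features, labels)

# Per-class weights (inverse frequency), then expanded to one weight per data point.
class_counts = torch.bincount(labels)          # number of samples in each class
class_weights = 1.0 / class_counts.float()     # rarer class gets the larger weight
sample_weights = class_weights[labels]         # entry i is the weight of dataset index i

sampler = WeightedRandomSampler(
    sample_weights, num_samples=len(sample_weights), replacement=True
)
loader = DataLoader(dataset, batch_size=20, sampler=sampler)

# Collect the labels actually drawn in one pass over the loader.
drawn_labels = torch.cat([y for _, y in loader])
print(torch.bincount(drawn_labels, minlength=2))
```

With these weights each class carries the same total weight, so the drawn labels should come out roughly balanced even though the underlying data is 90/10.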

lbugnon · January 31, 2019, 2:25am

Totally agreed. I ran into this today, and it is counter-intuitive.

cuixing158_1 · March 1, 2019, 4:18am

Your understanding may be correct. Please refer here; perhaps it helps.

Albert_Christianto · December 15, 2019, 2:07pm

Hi, I want to train a network using 3 datasets. I have created my custom dataset class, and now I need to load the 3 datasets with different ratios in 1 batch. Does anyone have sample code for this application?
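
One possible approach, sketched under assumptions: concatenate the three datasets with ConcatDataset and give every sample in dataset i the weight ratio_i / len(dataset_i), so each dataset contributes its target share in expectation. The dataset sizes, the `ratios` values, and the source-id tensor are hypothetical placeholders; individual batches will only match the ratios approximately, not exactly.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Three toy datasets of different sizes; the second tensor records which dataset
# each sample came from, so the mix can be inspected afterwards.
sizes = [1000, 200, 50]
datasets = [
    TensorDataset(torch.randn(n, 4), torch.full((n,), i, dtype=torch.long))
    for i, n in enumerate(sizes)
]
combined = ConcatDataset(datasets)  # indices run through dataset 0, then 1, then 2

# Hypothetical target share of each dataset in the sampled stream.
ratios = [0.5, 0.3, 0.2]

# Per-sample weights in the same order as the concatenated indices.
weights = torch.cat([torch.full((n,), r / n) for r, n in zip(ratios, sizes)])

sampler = WeightedRandomSampler(weights, num_samples=10_000, replacement=True)
loader = DataLoader(combined, batch_size=100, sampler=sampler)

# Measure the share of each source dataset over one pass.
sources = torch.cat([src for _, src in loader])
shares = torch.bincount(sources, minlength=3).float() / sources.numel()
print(shares)
```

If every batch must contain an exact count from each dataset, a custom batch sampler would be needed instead; this sketch only controls the mix on average.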

Alexander_Soare · November 8, 2020, 3:57pm

I agree that the docs are not specific enough here. I just spent the last 30 minutes figuring out that the weights are meant to be specified at a data point level, not a class level. In fact, I spent some time trying to figure out how WeightedRandomSampler knows what my class labels are.

Pcamellon · November 5, 2021, 2:00pm

Hi! Any update on this?

ptrblck · November 5, 2021, 11:26pm

The documentation claims:

Samples elements from [0,..,len(weights)-1] with given probabilities (weights).

I agree that this might not be a sufficiently detailed explanation for some users, and I’m sure PRs are welcome to improve the docs.
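
As a small illustration of that sentence from the docs, the sketch below draws many indices and compares their empirical frequencies with the normalized weights. The weights are the ones from the original post; the sample count and the seed are arbitrary choices.

```python
import torch
from torch.utils.data import WeightedRandomSampler

torch.manual_seed(0)

weights = [0.2, 0.3, 0.7]  # need not sum to one
sampler = WeightedRandomSampler(weights, num_samples=60_000, replacement=True)

# The sampler yields integer indices in [0, len(weights) - 1], not data points.
indices = torch.tensor(list(sampler))

# Empirical frequency of each index versus weight / sum(weights).
freqs = torch.bincount(indices, minlength=len(weights)).float() / indices.numel()
expected = torch.tensor(weights) / sum(weights)
print(freqs)
print(expected)
```

The two printed tensors should be close to each other: index i is drawn with probability weights[i] / sum(weights), and the DataLoader then uses each drawn index to fetch dataset[i].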