Behaviour of WeightedRandomSampler

I’m trying to implement stratified sampling in my dataset and calculate the weights for each sample in the dataset with the following function:

import numpy as np

def make_weights_for_balanced_classes(labels):
    unique_labels, counts = np.unique(labels, return_counts=True)
    weight_per_class = np.sum(counts) / counts
    weights = [0] * len(labels)
    for i, val in enumerate(labels):
        weights[i] = weight_per_class[np.where(unique_labels == val)[0]]
    return weights

sampler = WeightedRandomSampler(weights, len(weights))
dataloader = DataLoader(dataset, batch_size=128, sampler=sampler)

But when I enumerate the dataloader, the following error occurs in my custom dataset's __getitem__:

TypeError: list indices must be integers or slices, not list

Is it normal to get a list of indices instead of a single index when using WeightedRandomSampler? Should I change my dataset to accept a list of indices whenever I use WeightedRandomSampler?

Which line of code is throwing this error?
The sampler should return a single index, and it works fine in this dummy code snippet:

dataset = TensorDataset(torch.arange(10).float())

weights = [0.1] * 10
sampler = WeightedRandomSampler(weights, len(weights))
loader = DataLoader(dataset, sampler=sampler)

for x in loader:
    print(x)
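To make the expected behaviour explicit, you can also iterate the sampler on its own rather than through a DataLoader (a quick sanity check, not from the original post): with a flat 1D weights sequence, each element it yields is a plain Python int, which is what a standard __getitem__ expects.

```python
from torch.utils.data import WeightedRandomSampler

# Flat (1D) per-sample weights -- one scalar per sample
weights = [0.1] * 10
sampler = WeightedRandomSampler(weights, num_samples=len(weights))

# Iterating the sampler directly yields one integer index at a time
indices = list(sampler)
assert all(isinstance(i, int) for i in indices)
assert all(0 <= i < len(weights) for i in indices)
```

If the weights accidentally have an extra dimension (e.g. each entry is a 1-element array instead of a scalar), the sampler no longer behaves this way, which is what the error above points to.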

Thanks for the reply. After seeing your code, I realized I had one extra dimension in my weights, which was why my dataset's __getitem__ was getting a list of indices instead of a single index at a time.
Had to change
weights[i] = weight_per_class[np.where(unique_labels == val)[0]]
to
weights[i] = weight_per_class[np.where(unique_labels == val)[0][0]]
in my stratified sampling function and it works now.
Thank you.
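Putting the fix in context, here is a runnable sketch of the corrected function (the example labels are made up for illustration): np.where(...)[0] returns a 1-element array, so the extra [0] is needed to extract the scalar position and keep each weight a plain float.

```python
import numpy as np

def make_weights_for_balanced_classes(labels):
    # Count occurrences of each class
    unique_labels, counts = np.unique(labels, return_counts=True)
    # Inverse-frequency weight per class: rarer classes get larger weights
    weight_per_class = np.sum(counts) / counts
    weights = [0.0] * len(labels)
    for i, val in enumerate(labels):
        # np.where(...)[0] is a 1-element array; the trailing [0] extracts
        # the scalar index, so weights[i] is a float, not a 1-element array
        weights[i] = weight_per_class[np.where(unique_labels == val)[0][0]]
    return weights

labels = ["cat", "dog", "dog", "dog"]
w = make_weights_for_balanced_classes(labels)
# w -> [4.0, 1.333..., 1.333..., 1.333...], one scalar per sample
```

As an aside, the per-sample lookup loop can also be replaced by a single vectorized indexing step using the return_inverse output of np.unique, which yields the same flat weights without any explicit loop.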