Class imbalance with WeightedRandomSampler

I have a dataset with two classes and a severe class imbalance.

I tried to use the weighted sampler to equalize the classes for training, but only one class showed up during training. What did I do wrong?

My dataset consists of individual CSV files, not images.

import numpy as np
import pandas as pd
import torch
from torch.utils.data import DataLoader
from torchvision.datasets import DatasetFolder

def csv_loader(path: str) -> torch.Tensor:
    data = np.array(pd.read_csv(path, header=None))
    sample = torch.from_numpy(data)
    return sample

train_dataset = DatasetFolder(root=train_dir, loader=csv_loader, extensions=".csv")

batch_size = config.MODEL_PARAM['BATCH_SIZE']
weights = [0.5, 0.5]
sampler = torch.utils.data.sampler.WeightedRandomSampler(weights, batch_size)
train_data_loader = DataLoader(
    train_dataset, batch_size=batch_size,
    sampler=sampler, num_workers=4, drop_last=True
)

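The weights tensor should contain a weight for each sample in your dataset, not only the class weights. With weights = [0.5, 0.5] the sampler only ever draws indices 0 and 1, and since DatasetFolder orders its samples by class, both of those most likely belong to the first class. Also note that the second argument is num_samples, the number of indices drawn per epoch, not the batch size.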
Have a look at this example which shows a dummy use case.
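
To make the difference concrete, here is a minimal sketch on made-up data (it is not the linked example). The toy targets tensor, the 90/10 class split, and all variable names are assumptions for illustration only. It contrasts a two-element weight list, which can only ever yield indices 0 and 1, with per-sample weights, which should draw both classes at roughly equal rates.

```python
import torch
from torch.utils.data import WeightedRandomSampler

torch.manual_seed(0)

# Hypothetical toy labels: 90 samples of class 0 followed by 10 of class 1.
targets = torch.cat([
    torch.zeros(90, dtype=torch.long),
    torch.ones(10, dtype=torch.long),
])

# Two weights: the sampler only knows about indices 0 and 1,
# and both of those indices hold class 0 in this toy dataset.
wrong_sampler = WeightedRandomSampler([0.5, 0.5], num_samples=32, replacement=True)
wrong_indices = list(wrong_sampler)
wrong_classes = targets[wrong_indices]

# One weight per sample: the inverse frequency of that sample's class.
class_counts = torch.bincount(targets)
samples_weight = (1.0 / class_counts.double())[targets]
sampler = WeightedRandomSampler(samples_weight, num_samples=1000, replacement=True)

# Fraction of draws that landed on the minority class (class 1).
drawn_classes = targets[list(sampler)]
minority_fraction = drawn_classes.float().mean().item()
print(set(wrong_indices), minority_fraction)
```

With per-sample weights the minority fraction should land near one half, even though class 1 makes up only a tenth of the data.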


That’s very helpful. Here’s the solution with a DatasetFolder:

from torch.utils.data import WeightedRandomSampler

def sample_weight(data_folder):
    # Count how many samples belong to each class.
    targets = np.array(data_folder.targets)
    class_sample_count = np.bincount(targets, minlength=len(data_folder.classes))
    # Weight each class by the inverse of its frequency.
    weight = 1.0 / class_sample_count
    # Give every sample the weight of its class.
    samples_weight = torch.from_numpy(weight[targets]).double()
    # Draw as many indices per epoch as there are samples (with replacement).
    sampler = WeightedRandomSampler(samples_weight, len(samples_weight))
    return sampler

train_dataset = DatasetFolder(root=train_dir, loader=csv_loader, extensions=".csv")
train_sample_weight = sample_weight(train_dataset)
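
The returned sampler still has to be handed to the DataLoader through its sampler argument, and shuffle should be left unset because the two options are mutually exclusive. The following is a rough sketch of that last step, assuming a small in-memory TensorDataset as a stand-in for the DatasetFolder (the 950/50 split, the feature shape, and the variable names are made up). It counts the classes seen over one epoch as a sanity check.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Stand-in for the DatasetFolder: 950 samples of class 0 and 50 of class 1.
features = torch.randn(1000, 4)
targets = torch.cat([
    torch.zeros(950, dtype=torch.long),
    torch.ones(50, dtype=torch.long),
])
train_dataset = TensorDataset(features, targets)

# Same idea as sample_weight(): inverse class frequency, one weight per sample.
class_sample_count = torch.bincount(targets)
samples_weight = (1.0 / class_sample_count.double())[targets]
train_sampler = WeightedRandomSampler(samples_weight, len(samples_weight))

# Pass the sampler to the DataLoader; do not set shuffle=True alongside it.
train_data_loader = DataLoader(
    train_dataset, batch_size=100, sampler=train_sampler, drop_last=True
)

# Count how often each class appears over one epoch.
epoch_counts = torch.zeros(2, dtype=torch.long)
for _, batch_targets in train_data_loader:
    epoch_counts += torch.bincount(batch_targets, minlength=2)
print(epoch_counts)
```

Because the sampler draws with replacement, minority samples are repeated within an epoch, so the two counts should come out roughly equal rather than 950 to 50.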