Does PyTorch have an undersampler?

I am dealing with extremely imbalanced data (neg 99%, pos 1%).

I used a WeightedRandomSampler (WRS) to solve this problem.
Below is the WRS code I used.

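        import torch
        from torch.utils.data import DataLoader, WeightedRandomSampler
        # Label each sample: positive if its target contains a 1, negative otherwise
        targets = []
        neg, pos = 0, 0
        for _, target in traindataset:
            label = int(target.max() == 1)
            targets.append(label)
            pos += label
            neg += 1 - label
        # pos_weight (e.g. for BCEWithLogitsLoss); not used by the sampler
        pos_weight = torch.tensor(neg / pos)
        targets = torch.tensor(targets, dtype=torch.long)
        class_sample_count = [neg, pos]
        # Per-class weight 1/count, then look up each sample's weight by its label
        weights = 1.0 / torch.tensor(class_sample_count, dtype=torch.float)
        samples_weights = weights[targets]
        sampler = WeightedRandomSampler(samples_weights, num_samples=len(samples_weights), replacement=True)
        data_loader = DataLoader(traindataset, batch_size=batch_size, shuffle=False, sampler=sampler, num_workers=0, pin_memory=False)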

With WRS, the training accuracy exceeded 90% within 4 epochs, but the validation accuracy never exceeded 10%.

To find the cause, I inspected the training images in each batch drawn with WRS and confirmed that the positive samples were heavily duplicated, because the data is so extremely imbalanced.
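One way to quantify this duplication is to draw one epoch of indices from the sampler and count how many distinct positives actually appear. The sketch below assumes binary integer labels (0 = neg, 1 = pos); the helper `epoch_duplication` and the 990/10 toy labels are hypothetical, chosen only to mimic the 99%/1% split.

```python
import torch
from torch.utils.data import WeightedRandomSampler


def epoch_duplication(labels, seed=0):
    """Draw one epoch with 1/class_count weights (as in the code above) and
    return (number of positive draws, number of distinct positives drawn)."""
    labels = torch.as_tensor(labels, dtype=torch.long)
    class_weights = 1.0 / torch.bincount(labels).double()   # per-class weight 1/count
    sample_weights = class_weights[labels]                  # per-sample weight
    gen = torch.Generator().manual_seed(seed)
    sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels),
                                    replacement=True, generator=gen)
    drawn = torch.tensor(list(sampler))                     # indices for one epoch
    pos_drawn = drawn[labels[drawn] == 1]                   # keep only positive draws
    return pos_drawn.numel(), pos_drawn.unique().numel()


# Hypothetical labels with the question's 99% / 1% split
labels = [0] * 990 + [1] * 10
pos_draws, distinct_pos = epoch_duplication(labels)
# Roughly half of the 1000 draws are positives, spread over at most 10 distinct images
print(pos_draws, distinct_pos)
```

With balanced class weights, each draw is positive with probability 1/2, so the handful of positive images must each be repeated many times per epoch, which matches the overfitting described above.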

Because of this overfitting, I wonder whether there is a sampler that undersamples the neg data while leaving the pos data intact.
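A possible sketch of such a sampler, assuming the binary labels are available as a list or tensor: keep every positive index and draw a fresh random subset of negatives without replacement each epoch. `UnderSampler` and `neg_ratio` are hypothetical names; this is a custom subclass of `torch.utils.data.Sampler`, not a built-in PyTorch class.

```python
import torch
from torch.utils.data import Sampler


class UnderSampler(Sampler):
    """Hypothetical undersampler: each epoch yields all positive indices plus a
    random subset of negatives (neg_ratio negatives per positive), shuffled."""

    def __init__(self, labels, neg_ratio=1.0, generator=None):
        labels = torch.as_tensor(labels)
        self.pos_idx = torch.nonzero(labels == 1, as_tuple=True)[0]
        self.neg_idx = torch.nonzero(labels == 0, as_tuple=True)[0]
        # Never ask for more negatives than exist
        self.num_neg = min(len(self.neg_idx), int(len(self.pos_idx) * neg_ratio))
        self.generator = generator

    def __len__(self):
        return len(self.pos_idx) + self.num_neg

    def __iter__(self):
        # Pick negatives without replacement, so no index repeats within an epoch
        perm = torch.randperm(len(self.neg_idx), generator=self.generator)
        neg = self.neg_idx[perm[: self.num_neg]]
        idx = torch.cat([self.pos_idx, neg])
        # Shuffle positives and negatives together
        idx = idx[torch.randperm(len(idx), generator=self.generator)]
        return iter(idx.tolist())
```

It would be passed to a DataLoader as `sampler=UnderSampler(labels, neg_ratio=3)` with `shuffle=False`. Because the negatives are redrawn every time the sampler is iterated, different negatives are seen across epochs while every positive appears exactly once per epoch.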

Were the batches still imbalanced after using the WeightedRandomSampler, or am I misunderstanding the sentence?

Under- or oversampling can be done by changing the weights for each sample.
E.g. if you are using 1/class_count_X as the weight for class X, the sampler tries to balance each batch.
However, you could of course scale these weights to force some undersampling (and reduce the num_samples argument).
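A minimal sketch of that suggestion, assuming binary labels: the per-sample weight is 1/class_count, the negative-class weight is additionally multiplied by a factor, and num_samples is reduced so that one epoch covers only part of the negatives. The names `undersampling_sampler` and `neg_scale` are hypothetical, and `replacement=False` is an extra choice (not from the answer above) that prevents any index, including a positive one, from appearing twice in an epoch.

```python
import torch
from torch.utils.data import WeightedRandomSampler


def undersampling_sampler(labels, neg_scale=1.0, num_samples=None, generator=None):
    """Hypothetical helper: 1/class_count weights with the negative-class weight
    scaled by neg_scale, and a reduced num_samples per epoch."""
    labels = torch.as_tensor(labels, dtype=torch.long)
    class_count = torch.bincount(labels, minlength=2).double()
    class_weights = 1.0 / class_count
    class_weights[0] *= neg_scale                 # < 1 shifts draws toward positives
    sample_weights = class_weights[labels]        # per-sample weight
    if num_samples is None:
        num_samples = 2 * int(class_count[1])     # e.g. room for every positive plus as many negatives
    # Without replacement, num_samples must not exceed the number of non-zero weights
    return WeightedRandomSampler(sample_weights, num_samples=num_samples,
                                 replacement=False, generator=generator)
```

Lowering neg_scale pushes the draws toward the positives, while num_samples controls how many samples (and therefore how many negatives) each epoch contains; a new subset is drawn every time the DataLoader iterates over the sampler.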

I meant that a lot of duplicates were produced because there were too few positive samples to oversample from.
With WRS, the class imbalance problem was solved well!


I was able to undersampling as much as I wanted through this part!

Thanks a lot!