Hi all, I’m a newbie to machine learning and just taking my first steps.
I have an imbalanced dataset, and I’m using WeightedRandomSampler to compensate for that.
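For reference, this is roughly how the sampler is set up (a simplified sketch with made-up class counts and labels, not my actual data):

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Illustrative numbers only: largest class is ~6x the smallest, like in my dataset.
class_counts = torch.tensor([600, 300, 200, 150, 120, 100])
class_weights = 1.0 / class_counts.float()          # inverse-frequency class weights

# One class index per sample in the dataset (placeholder labels).
labels = torch.randint(0, len(class_counts), (class_counts.sum().item(),))
sample_weights = class_weights[labels]              # per-sample weight

sampler = WeightedRandomSampler(weights=sample_weights,
                                num_samples=len(sample_weights),
                                replacement=True)
# loader = DataLoader(my_dataset, batch_size=32, sampler=sampler)
```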
I suspected there might be an issue with the WRS, so I had my dataset write debug data to a file so I could analyse the sampling. It turned out that the WRS does compensate for the imbalance just fine, but at a price: a roughly 6-fold reduction in data diversity, meaning it effectively only uses a sixth of the entire dataset. That factor of 6 is no coincidence, because it is also the ratio between the class with the most samples and the class with the fewest. A simpler check than the debug file is shown below.
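This is essentially the same check I was doing via the debug file, just expressed directly on the sampler (sketch, assuming the setup above): count how many distinct dataset indices actually get drawn in one pass.

```python
# Draw one epoch's worth of indices from the sampler and count the unique ones.
drawn = torch.tensor(list(sampler))
unique_fraction = drawn.unique().numel() / len(sample_weights)
print(f"unique samples drawn this epoch: {unique_fraction:.1%}")
```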
Could it be that this is some sort of bug in WeightedRandomSampler? Like I said, I’m just a newbie, and this is my first model, so it’s possible that I made a mistake somewhere.