Hi all, I’m a newbie to machine learning and just taking my first steps.
I have an imbalanced dataset, and I’m using WeightedRandomSampler to compensate for that.
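For reference, this is roughly how the sampler is set up (a simplified sketch with made-up class counts and labels, not my actual data):

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Illustrative numbers only: largest class is ~6x the smallest, like in my dataset.
class_counts = torch.tensor([600, 300, 200, 150, 120, 100])
class_weights = 1.0 / class_counts.float()          # inverse-frequency class weights

# One class index per sample in the dataset (placeholder labels).
labels = torch.randint(0, len(class_counts), (class_counts.sum().item(),))
sample_weights = class_weights[labels]              # per-sample weight

sampler = WeightedRandomSampler(weights=sample_weights,
                                num_samples=len(sample_weights),
                                replacement=True)
# loader = DataLoader(my_dataset, batch_size=32, sampler=sampler)
```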
I suspected there might be an issue with the WRS, so I had my dataset write debug data to a file so I could analyse the sampling. It turned out that the WRS does compensate for the imbalance just fine, but at a price: a roughly 6-fold reduction in data diversity, meaning it effectively only uses a sixth of the entire dataset. That factor of 6 is no coincidence, because it is also the ratio between the class with the most samples and the class with the fewest. A simpler check than the debug file is shown below.
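This is essentially the same check I was doing via the debug file, just expressed directly on the sampler (sketch, assuming the setup above): count how many distinct dataset indices actually get drawn in one pass.

```python
# Draw one epoch's worth of indices from the sampler and count the unique ones.
drawn = torch.tensor(list(sampler))
unique_fraction = drawn.unique().numel() / len(sample_weights)
print(f"unique samples drawn this epoch: {unique_fraction:.1%}")
```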
Could it be that this is some sort of bug in WeightedRandomSampler? Like I said, I’m just a newbie, and this is my first model, so it’s possible that I made a mistake somewhere.