Create skewed dataset


I’m currently looking into ways to test NNs and wondered whether there is a way to create datasets with skewed label distributions from a folder structure / dataset that has the same number of pictures for every label.

I looked into conditionally including images in the DataLoader based on their file names, but didn’t find a way to do that.

Any help or idea on how to tackle this would be highly appreciated.

I’ve worked on a tutorial for imbalanced training some time ago (and would need to update it to the latest PyTorch version) by creating an artificially imbalanced CIFAR10 dataset. You can find the code here. Maybe you could use it as a starter.
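
For anyone who just wants the general idea, here is a minimal sketch of one way to do it. It is not the tutorial code. It assumes you can get a list of integer class labels for your dataset (one per sample) and subsamples the indices per class. The helper name `skewed_indices` and the `keep_fraction` argument are made up for this example.

```python
import random
from collections import Counter, defaultdict


def skewed_indices(targets, keep_fraction, seed=0):
    """Return sorted sample indices, keeping only a fraction of each class.

    targets: sequence of class labels, one per sample.
    keep_fraction: dict mapping class label -> fraction in [0, 1] to keep;
        classes that are not listed are kept in full.
    (Hypothetical helper for illustration, not part of PyTorch.)
    """
    rng = random.Random(seed)

    # Group the sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(targets):
        by_class[label].append(idx)

    # Randomly keep the requested share of each class.
    kept = []
    for label, indices in by_class.items():
        frac = keep_fraction.get(label, 1.0)
        n_keep = int(round(frac * len(indices)))
        kept.extend(rng.sample(indices, n_keep))
    return sorted(kept)


# Toy balanced label list: 3 classes with 100 samples each.
targets = [0] * 100 + [1] * 100 + [2] * 100
indices = skewed_indices(targets, {1: 0.5, 2: 0.1})
counts = Counter(targets[i] for i in indices)
print(counts)
```

With a torchvision `ImageFolder` dataset, the labels should be available as `dataset.targets`, and the returned indices could then be wrapped in `torch.utils.data.Subset(dataset, indices)` before passing it to a `DataLoader`. I haven't run that part here, so treat it as a starting point.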

Thank you! It’s a shame that it didn’t get merged.

I have to admit I’ve also forgotten to check its status, and I think I’ll have to ping someone (and update the code to PyTorch 1.0) to get it merged. :wink: