Correct use of WeightedRandomSampler?

I am wondering what the right way is to use a sampler like WeightedRandomSampler for imbalanced classification problems.

Specifically, I am unclear as to whether I should use sampling only during:

  1. Training
  2. Training + Validation
  3. Training + Validation + Testing (whereby each gets its own sampler to capture the distribution in its respective data set)

I’ve poked through several threads and noticed people using it for training only and training + validation.

Thanks for your time and help!

I think the common use case would be to use it during training only.
The validation dataset should act as a good proxy for the final model performance on the unseen test data. So I would try to keep the data distribution of the validation and test datasets as close as possible.
Since your test dataset is also imbalanced, using weighted sampling on it might not give you a proper signal of how your model would perform on real-world data (which should also be imbalanced).
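
To make the training-only setup concrete, here is a minimal sketch. The toy data, the 90/10 class split, the `make_split` helper, and the inverse-frequency weighting are all assumptions for illustration and not something taken from your code. The sampler is passed only to the training `DataLoader`, while the validation loader keeps the original imbalanced distribution.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Hypothetical helper: builds a toy binary dataset with the given class counts.
def make_split(n_major, n_minor):
    features = torch.randn(n_major + n_minor, 4)
    labels = torch.cat([
        torch.zeros(n_major, dtype=torch.long),
        torch.ones(n_minor, dtype=torch.long),
    ])
    return TensorDataset(features, labels)

train_ds = make_split(900, 100)  # assumed 90/10 imbalance
val_ds = make_split(180, 20)     # same imbalance as the "real" data

# One weight per training sample: the inverse frequency of its class.
train_labels = train_ds.tensors[1]
class_counts = torch.bincount(train_labels)
class_weights = 1.0 / class_counts.float()
sample_weights = class_weights[train_labels]

train_sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    replacement=True,  # needed so minority samples can be drawn repeatedly
)

# Sampler on the training loader only (sampler and shuffle=True are mutually exclusive).
train_loader = DataLoader(train_ds, batch_size=64, sampler=train_sampler)
# Validation loader is left untouched so it reflects the original distribution.
val_loader = DataLoader(val_ds, batch_size=64, shuffle=False)

# Collect the labels seen in one pass over each loader.
train_seen = torch.cat([y for _, y in train_loader])
val_seen = torch.cat([y for _, y in val_loader])
train_minority_frac = train_seen.float().mean().item()
val_minority_frac = val_seen.float().mean().item()
print(f"minority fraction, train batches: {train_minority_frac:.2f}")
print(f"minority fraction, val batches:   {val_minority_frac:.2f}")
```

With these weights the training batches should come out roughly balanced, while the validation batches keep the original 10% minority share. A test loader would be created the same way as `val_loader`, i.e. without a sampler.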

However, that’s my biased opinion so let’s wait for some other opinions. :)

Thank you for always responding. This makes a lot of sense!