How to Prevent Overfitting

Ben · June 14, 2017, 8:39am

I ran into a similar problem. It seems something is wrong in WeightedRandomSampler.

One more question: WeightedRandomSampler seems similar to the weight parameter in nn.CrossEntropyLoss. Which one do you suggest using? @smth

Thanks!

caihong1105 · June 22, 2017, 2:05pm

Can you share some code showing how you do background variation for images?

Thanks so much!

achaiah · June 22, 2017, 9:36pm

Unfortunately I can’t as it is pretty specific to my project. But a good way to approach it would be to use OpenCV or something similar as it has a ton of image manipulation algorithms.

Chahrazad · September 14, 2017, 6:22am

I also encountered the same problem as @wangg12: using the above code results in the training iteration running on a single batch, @smth. The docs are also not clear on how to use WeightedRandomSampler with DataLoader.

smth · September 14, 2017, 2:38pm

@Chahrazad all samplers are used in a consistent way.

You first create a sampler object and then pass it to the DataLoader. For example, let’s say you have 10 samples in your Dataset.

import torch

dataset_length = 10
epoch_length = 100  # each epoch sees 100 draws of samples
# Weights must be non-negative (torch.randn can return negative values); they need not sum to 1.
sample_probabilities = torch.rand(dataset_length)
weighted_sampler = torch.utils.data.sampler.WeightedRandomSampler(sample_probabilities, epoch_length)
torch.utils.data.DataLoader(...., sampler=weighted_sampler)
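For anyone wondering what iterating looks like afterwards, here is a minimal sketch. The toy TensorDataset, the batch size of 20 and the variable names are assumptions for illustration, not part of the snippet above. The loop is the same as with any other DataLoader; the sampler only decides which indices are drawn and how many per epoch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Toy dataset (hypothetical): 10 samples with 3 features each.
features = torch.arange(30, dtype=torch.float32).reshape(10, 3)
labels = torch.arange(10)
dataset = TensorDataset(features, labels)

# One non-negative weight per sample; they do not have to sum to 1.
sample_weights = torch.rand(len(dataset))

# Draw 100 indices per epoch, with replacement.
sampler = WeightedRandomSampler(sample_weights, num_samples=100, replacement=True)
loader = DataLoader(dataset, batch_size=20, sampler=sampler)

# Iterate exactly as with a plain DataLoader.
num_batches = 0
num_seen = 0
for batch_features, batch_labels in loader:
    num_batches += 1
    num_seen += batch_labels.numel()

print(num_batches, num_seen)
```

With 100 draws and a batch size of 20, one pass over the loader yields 5 batches, even though the dataset itself only holds 10 samples.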

mratsim · October 4, 2017, 8:27pm

Here is an example repo for a Kaggle competition. I experimented with data augmentation and weighted sampling.

Data augmentation primitives are here. They inherit from a “RandomOrder” object that composes transformations, and they are called there by a DataLoader.

mderakhshani · November 2, 2017, 1:43pm

Hi @smth, I have a question about WeightedRandomSampler. When you create a DataLoader with a weighted sampler, how do you iterate over the DataLoader? I mean the for loop used for iteration. It seems that we should draw samples from our DataLoader instead of iterating over it from start to end as a simple DataLoader does (when the sampler argument is None)! Could you please elaborate on this?

will_soon · May 6, 2018, 1:20am

I have a similar problem. Could you please show how you solved it?
Thanks.

surojit_sengupta · November 23, 2018, 11:07am

Hello Soumith,

Once we create a list of per-class sample counts as ‘class_sample_count’, how does the sampler figure out which count belongs to which class, and hence assign lower weights to the dominant classes?

Regards

PantherYan · November 26, 2018, 9:00am

I also have this concern.
Does this only work for single-label classification?
In multi-label problems, one sample can belong to several classes with different distributions. How can this be handled?

ptrblck · November 26, 2018, 12:46pm

You have to provide the weight for each sample.
Have a look at this small example.
Basically you are assigning the weights to each sample by using the target as an index.
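A minimal sketch of that indexing step, assuming a hypothetical two-class target tensor with a 90/10 imbalance and inverse-frequency weights (other weighting schemes are possible). The class counts are indexed by class id, so `class_weights[targets]` looks up the right weight for every sample.

```python
import torch
from torch.utils.data import WeightedRandomSampler

torch.manual_seed(0)

# Hypothetical imbalanced targets: 90 samples of class 0, 10 of class 1.
targets = torch.cat([torch.zeros(90, dtype=torch.long),
                     torch.ones(10, dtype=torch.long)])

# Position i of the count tensor holds the number of samples of class i.
class_sample_count = torch.bincount(targets)
# Inverse frequency: the rarer class gets the larger weight.
class_weights = 1.0 / class_sample_count.float()
# Use each sample's target as an index to get one weight per sample.
sample_weights = class_weights[targets]

sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(sample_weights),
                                replacement=True)

# Check how the drawn indices are distributed over the classes.
drawn_indices = torch.tensor(list(sampler))
drawn_class_counts = torch.bincount(targets[drawn_indices], minlength=2)
print(drawn_class_counts)
```

Each class carries the same total weight here, so the drawn indices should be split roughly evenly between the two classes.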

quazi · January 14, 2019, 3:04am

Not sure if this is new in PyTorch 1.0 (which is what I’m using), but shuffle and sampler are mutually exclusive…

Alex_Ge · September 10, 2019, 4:45pm

I am commenting in this thread years later because it is the first result that pops up in a Google search. @ptrblck has a much better answer and example in another post here, which worked wonders for me!
Do not use the per-class weights directly; you have to transform them into per-sample weights!
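To illustrate the pitfall, here is a small sketch with a made-up 90/10 target list. As far as I can tell, when the weight tensor has one entry per class, the sampler treats it as weights for samples 0 and 1 only, so every other index is never drawn.

```python
import torch
from torch.utils.data import WeightedRandomSampler

torch.manual_seed(0)

# Hypothetical targets: 90 samples of class 0 followed by 10 of class 1.
targets = torch.tensor([0] * 90 + [1] * 10)
class_weights = torch.tensor([1.0 / 90, 1.0 / 10])  # one entry per class

# Pitfall: a length-2 weight tensor means the sampler only knows indices 0 and 1.
wrong_sampler = WeightedRandomSampler(class_weights, num_samples=100, replacement=True)
wrong_indices = set(wrong_sampler)

# Intended: expand to one weight per sample before building the sampler.
right_sampler = WeightedRandomSampler(class_weights[targets], num_samples=100, replacement=True)
right_indices = set(right_sampler)

print(sorted(wrong_indices))
print(len(right_indices))
```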

Shubhankar · December 15, 2019, 6:23pm

This should be an example in the documentation. The documentation for samplers is not very coherent.

kavyajeetbora · January 5, 2020, 3:30pm

I get this error when I use WeightedRandomSampler together with shuffle=True:
ValueError: sampler option is mutually exclusive with shuffle

ptrblck · January 5, 2020, 8:25pm

As the error message states, you can either use shuffle=True, in which case RandomSampler will be used, or you could provide a sampler manually.

These options are mutually exclusive, so you cannot provide both.
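A short sketch of both cases, using a hypothetical 10-element TensorDataset and uniform weights: passing both arguments raises the ValueError at construction time, while passing only the sampler works, and the sampler already draws indices in random order.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy dataset and uniform weights (both assumptions for illustration).
dataset = TensorDataset(torch.arange(10, dtype=torch.float32))
sampler = WeightedRandomSampler(torch.ones(10), num_samples=10, replacement=True)

# Passing both sampler and shuffle=True fails when the DataLoader is built.
try:
    DataLoader(dataset, batch_size=2, sampler=sampler, shuffle=True)
    raised = False
except ValueError as err:
    raised = True
    print(err)

# Passing only the sampler is fine; leave shuffle at its default.
loader = DataLoader(dataset, batch_size=2, sampler=sampler)
num_batches = len(list(loader))
print(num_batches)
```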

milan_kalkenings · October 4, 2021, 5:03pm

Hi all, this tutorial helped me to understand the WeightedRandomSampler