How do I calculate batches to pull full dataset when using a random weighted sampler?

Hello,

From this forum, I learned how to use the random weighted sampler to pull more of the minority class than the majority class. How would I calculate the batches correctly to make sure all of my images are used?

Here is my code for the weighted sampler:

import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

batch_size = 10
class_sample_count = [46699, 17739]  # images per class, indexed by class label
# Per-class weight is the inverse of the class frequency
weights = 1.0 / torch.tensor(class_sample_count, dtype=torch.double)
# target holds the class index of every training image, so this gives one weight per image
sample_weights = weights[target]
sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    replacement=True)

# roof is the training dataset
train_loader = DataLoader(roof, batch_size=batch_size, sampler=sampler)

So with a batch size of 10, I would run 6444 batches per epoch to get all of the images through. How would I do it with a random weighted sampler?

You want to see all images at least once?
I’m not sure you can guarantee this, since you could assign a very small probability to one sample, which would make it very unlikely to be drawn.

Thank you. What if I want to see the positive class (the minority class) at least once? Is that essentially what it’s doing?

I don’t think you can get such guarantees.
But you can check the docs. The sampler draws the number of elements you asked for with the given probabilities, so in theory anything can happen. If you draw enough samples, though, it becomes very likely that every element with a non-negligible weight shows up at least once. Note that num_samples does not have to equal the length of sample_weights.
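To put rough numbers on this, here is a minimal sketch in plain Python (no PyTorch needed). It assumes the inverse-frequency weights and the class counts from the first post, and that draws are independent with replacement; the helper name prob_never_drawn is made up for illustration. It computes the number of batches per epoch and the chance that a given image is never drawn in one epoch.

```python
import math

class_sample_count = [46699, 17739]    # class counts from the first post
batch_size = 10
num_samples = sum(class_sample_count)  # draws per epoch, same as len(sample_weights)

# Number of batches the DataLoader yields per epoch (the last one may be smaller)
batches_per_epoch = math.ceil(num_samples / batch_size)

def prob_never_drawn(class_count, num_classes, num_draws):
    """Chance that one specific image is missed in num_draws independent draws.

    With weight 1/class_count per image, the weights of each class sum to 1,
    so a single image is picked with probability 1 / (num_classes * class_count).
    """
    p = 1.0 / (num_classes * class_count)
    return (1.0 - p) ** num_draws

miss = [prob_never_drawn(c, len(class_sample_count), num_samples)
        for c in class_sample_count]
expected_unseen = sum(c * m for c, m in zip(class_sample_count, miss))

print("batches per epoch:", batches_per_epoch)
print("P(a given majority image is never drawn):", round(miss[0], 3))
print("P(a given minority image is never drawn):", round(miss[1], 3))
print("expected number of images not seen in one epoch:", round(expected_unseen))
```

Under these assumptions a sizeable share of the majority class is skipped in any single epoch. Raising num_samples lowers the miss probability, and different images get skipped in different epochs.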

Ok. Last question then (on this topic ;) ). Can I artificially set how I want the model to sample? So if I wanted equal sampling between the positive and negative class, do I set sample_weights to [.5, .5], and would that pull equal amounts of positive/negative samples regardless of whether I have 3 to 1 negative over positive samples?

You cannot provide weights for specific classes, right?
You need to set one for each sample. So you can give every sample the same weight (for example 1/num_samples) to get uniform sampling over samples.
But with the weights you have in your first post, it should draw about the same number from each class on average, as each sample's weight is inversely proportional to the number of elements available for its class.
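If you want to check this empirically, something like the sketch below should work. It builds a stand-in target tensor from the class counts in the first post (an assumption; in practice you would use your real labels), draws one epoch of indices from the sampler, and counts how many draws land in each class and how many distinct images were seen.

```python
import torch
from torch.utils.data import WeightedRandomSampler

torch.manual_seed(0)  # only to make the check repeatable

# Stand-in labels (assumed layout): class 0 is the majority, class 1 the minority
class_sample_count = torch.tensor([46699, 17739])
target = torch.cat([
    torch.zeros(int(class_sample_count[0]), dtype=torch.long),
    torch.ones(int(class_sample_count[1]), dtype=torch.long),
])

# Inverse-frequency weight per class, then one weight per sample
class_weights = 1.0 / class_sample_count.double()
sample_weights = class_weights[target]

sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    replacement=True,
)

# Iterating the sampler yields dataset indices for one epoch
drawn = torch.tensor(list(sampler))
drawn_per_class = torch.bincount(target[drawn], minlength=2)
fraction_minority = drawn_per_class[1].item() / len(drawn)
unique_fraction = torch.unique(drawn).numel() / len(target)

print("draws per class:", drawn_per_class.tolist())
print("fraction of draws from the minority class:", round(fraction_minority, 3))
print("fraction of distinct images seen:", round(unique_fraction, 3))
```

The per-class counts should come out close to equal, while the fraction of distinct images stays well below 1 because sampling is done with replacement.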