I have a dataset with a huge difference in the number of images across classes, so I want to get batches with an equal number of images from each class. Does anyone know how to do that?
You could use the WeightedRandomSampler.
The issue was also discussed in this thread.
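For reference, here's a minimal sketch of how WeightedRandomSampler can be set up to counter class imbalance, assuming an ImageFolder-style dataset with a samples list (the variable names are placeholders); each sample is weighted by the inverse frequency of its class:

import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# labels of every sample; assumed to come from an ImageFolder-style dataset
labels = torch.tensor([label for _, label in dataset.samples])
class_counts = torch.bincount(labels)
# weight each sample by the inverse frequency of its class
sample_weights = 1.0 / class_counts[labels].float()
sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(labels),
                                replacement=True)
loader = DataLoader(dataset, batch_size=30, sampler=sampler)

Note that this balances the classes in expectation; any single batch can still deviate from a perfect per-class split.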
Hi,
thanks for that answer. I have the same question, but the thing is: can you guarantee that you'll get exactly the same number of samples from each class within a batch this way?
Answering my own question, here's the solution I came up with. It amounts to creating a loader for each class and drawing from all of those loaders for each batch:
import torch.utils.data as torchdata
from torchvision import datasets

dataset = datasets.ImageFolder(
    my_rootfolder,
    transform=my_transforms)

# now create the loaders for the different classes
loaders = []
for class_name in dataset.classes:
    # get the indices in the dataset that belong to that class
    idx = [
        pos for pos, item in enumerate(dataset.samples)
        if item[1] == dataset.class_to_idx[class_name]]
    # construct the corresponding dataloader thanks to a SubsetRandomSampler;
    # note that batch_size here is the number of samples drawn *per class*
    loaders += [torchdata.DataLoader(
        dataset, batch_size=args.batch_size,
        sampler=torchdata.sampler.SubsetRandomSampler(idx),
        **kwargs)]

# now using the thing:
for items in zip(*loaders):
    # items[idx] is the batch for class idx,
    # ...
    pass
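One caveat with this approach: zip stops at the shortest loader, so the larger classes are undersampled within each epoch, and the effective batch size becomes batch_size * num_classes. Assuming each loader yields (inputs, targets) tuples, the per-class batches can be merged into one balanced batch like this:

import torch

for items in zip(*loaders):
    # each element of items is an (inputs, targets) pair from one class loader;
    # concatenate them into a single balanced batch
    inputs = torch.cat([x for x, _ in items], dim=0)
    targets = torch.cat([y for _, y in items], dim=0)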
I have a dataset already split manually into train, validation and test, with two classes (0 and 1) in each split.
I want my network to fetch an equal number of images from each class in every batch.
The constraint is that I don't have an equal number of images in each class:
Train -> class 0 -> 1000, class 1 -> 100
Say my batch size is 30; I need to have 15 from class 0 and 15 from class 1 for every iteration.
How do I achieve this? Also, I am using the VGG16 architecture in Keras.
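In Keras, one way to get this 15/15 split is a custom generator that samples a fixed number of indices from each class per batch. A minimal sketch, assuming the training data is in memory as NumPy arrays x_train and y_train (names hypothetical); the minority class is resampled with replacement:

import numpy as np

def balanced_batches(x, y, batch_size=30, num_classes=2):
    # draw batch_size // num_classes samples from each class per batch;
    # smaller classes are resampled with replacement
    per_class = batch_size // num_classes
    class_idx = [np.flatnonzero(y == c) for c in range(num_classes)]
    while True:
        picks = np.concatenate([
            np.random.choice(idx, per_class, replace=True)
            for idx in class_idx])
        np.random.shuffle(picks)
        yield x[picks], y[picks]

# e.g. with a VGG16 model (tf.keras accepts a Python generator in fit;
# older standalone Keras uses model.fit_generator instead):
# model.fit(balanced_batches(x_train, y_train),
#           steps_per_epoch=len(x_train) // 30, epochs=10)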