I have a dataset whose training data are split into 4 folders, each containing the same number of samples (3,000). I wanted to create a data loader that still loads only 3,000 samples per epoch, but instead of drawing from a single folder, it randomly picks one of the sub-folders for each sample. So I have a Dataset object whose __getitem__ looks like this:
def __getitem__(self, idx):
    # pick one of the 4 sub-folders at random
    iq = int(torch.randint(0, len(self.img_address), (1,)))
    img_address_ = self.img_address[iq]
    ref_address_ = self.ref_address[iq]
    # use the sample index within the chosen sub-folder
    img_address = img_address_[idx]
    ref_address = ref_address_[idx]
    img = self.loader(img_address)
    ref = self.loader(ref_address)
    return img, ref

def __len__(self):
    return 3000
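For context, self.img_address and self.ref_address are lists with one entry per sub-folder, and each entry is a list of 3,000 file paths. Roughly like this (the root path, folder names, and file pattern below are placeholders, not my exact layout):

import glob
import os

root = "/path/to/train"                                   # placeholder
sub_folders = ["fold_0", "fold_1", "fold_2", "fold_3"]    # placeholder names

# one list of 3,000 paths per sub-folder; these become self.img_address / self.ref_address
img_address = [sorted(glob.glob(os.path.join(root, f, "*_img.png"))) for f in sub_folders]
ref_address = [sorted(glob.glob(os.path.join(root, f, "*_ref.png"))) for f in sub_folders]
# len(img_address) == 4 and len(img_address[i]) == 3000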
However, when I sample from the DataLoader, each batch contains 4 samples (the number of sub-folders) instead of batch_size samples! I tested dataset.__getitem__(N) directly, and it works fine. The problem only starts when I wrap the dataset in a DataLoader. I have changed the number of workers, the batch size, shuffle, and so on, but none of them helped.
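For reference, this is roughly how I wrap the dataset (the batch size and worker count here are just examples, not my exact settings):

from torch.utils.data import DataLoader

loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=4)
imgs, refs = next(iter(loader))
# I expect 16 samples along dim 0, but I get 4 (the number of sub-folders)
print(imgs.shape)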
Can you please help me find what I am doing wrong?