Given two datasets of length 8000 and 1480, with corresponding train and validation loaders, I would like to create a new dataloader that lets me iterate through both. I tried concatenating the datasets as shown below:
```python
from torch.utils.data import Dataset, DataLoader

class custom_dataset(Dataset):
    def __init__(self, *data_sets):
        self.datasets = data_sets

    def __getitem__(self, i):
        # return the i-th element of every dataset as a tuple
        return tuple(d[i] for d in self.datasets)

    def __len__(self):
        return min(len(d) for d in self.datasets)

new_dataset = custom_dataset(dataset, dataset1)
new_train_loader = DataLoader(new_dataset, batch_size=16, sampler=train_sampler,
                              num_workers=2, drop_last=True)
new_val_loader = DataLoader(new_dataset, batch_size=16, sampler=val_sampler,
                            num_workers=2, drop_last=True)
print("Training samples {}, Val Samples {}".format(len(new_train_loader), len(new_val_loader)))
```
This prints Training samples 74, Val Samples 18 (these are loader lengths, i.e. batch counts), which is exactly what I get for dataset 2 alone; dataset 1 alone gives Training samples 400, Val Samples 100.
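As a sanity check on the numbers I am seeing, the batch counts are consistent with the loader length being `num_sampled_indices // batch_size` when `drop_last=True`. A quick arithmetic sketch (the 80/20 train/val split size is my assumption, since it reproduces 74 and 18 exactly):

```python
# Why 74 and 18 batches? Assumption: samplers split the dataset 80/20.
dataset_lens = [8000, 1480]
zip_len = min(dataset_lens)        # custom_dataset.__len__ -> 1480

batch_size = 16
n_train = int(zip_len * 0.8)       # 1184 sampled indices
n_val = zip_len - n_train          # 296 sampled indices

# With drop_last=True, len(DataLoader) == num_indices // batch_size
print(n_train // batch_size)       # 74
print(n_val // batch_size)         # 18
```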
Since we are concatenating our datasets, shouldn't the final number of training batches be 474 rather than 74? Likewise, when I print the length of the dataset it comes out to be 1480 and not 9400. If we concatenate two datasets of 8000 and 1480 samples, shouldn't the combined dataset have length 9400 rather than 1480?
Although I am able to visualize samples from both datasets, I suspect that the majority of the samples in the first dataset are not being included. Any suggestions on what is wrong with my code above?