I am loading data from multiple datasets. Some of my images are stored in properly labeled folders (e.g., \0 and \1), and in those cases I can build one ImageFolder per location, collect them in a list, and combine them with torch.utils.data.ConcatDataset, for example (where trans is a set of pre-defined PyTorch transformations):
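import torch
from torchvision import datasets

# One ImageFolder per labeled directory, concatenated into a single dataset.
l = []
l.append(datasets.ImageFolder(file_path, trans))
l.append(datasets.ImageFolder(file_path2, trans))
image_datasets = torch.utils.data.ConcatDataset(l)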
img_datasets = dict()
img_datasets['train'], img_datasets['val'] = torch.utils.data.random_split(image_datasets, (round(0.8*len(image_datasets)), round(0.2*len(image_datasets)) ))
However, I am also loading images from other, disparate locations listed in a CSV file. The process here looks like this:
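import pandas as pd
import PIL.Image
from torch.utils.data import Dataset

class MyData(Dataset):
    """Dataset backed by a DataFrame with file_path and label columns."""
    def __init__(self, df):
        self.df = df

    def __len__(self):
        return self.df.shape[0]

    def __getitem__(self, index):
        # Load the image from disk and apply the same transforms as above.
        image = trans(PIL.Image.open(self.df.file_path[index]))
        label = self.df.label[index]
        return image, label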
df = pd.read_csv(image_file_paths, names=["file_path", "label"])
mydata = MyData(df)
my_datasets = dict()
my_datasets['train'], my_datasets['val'] = torch.utils.data.random_split(mydata, (round(0.8*len(mydata)), round(0.2*len(mydata))))
So I’d like to be able to combine these datasets into a single dataloader. Any ideas for how I should go about doing this? Thanks!
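For illustration, here is a minimal sketch of one way this might work, not a definitive answer. It relies on the fact that ConcatDataset accepts any map-style Dataset objects, so a custom Dataset can sit in the same list as the ImageFolder instances. ToyData is a hypothetical stand-in for both ImageFolder and MyData so that the example runs without image files; the sizes, batch size, and tensor shape are assumptions. It also assumes both datasets return the same kind of (image, label) pair, with images of a common size, since the default batching would otherwise fail.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset, random_split


class ToyData(Dataset):
    """Hypothetical stand-in for ImageFolder / MyData: yields (image, label) pairs."""

    def __init__(self, n, label):
        self.n = n
        self.label = label

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        # A real dataset would load and transform an image here.
        return torch.zeros(3, 8, 8), self.label


folder_like = ToyData(10, 0)  # plays the role of the ImageFolder datasets
csv_like = ToyData(15, 1)     # plays the role of MyData(df)

# Concatenate first, then split once, so train and val both draw from every source.
combined = ConcatDataset([folder_like, csv_like])

n_train = round(0.8 * len(combined))
n_val = len(combined) - n_train  # remainder, so the two lengths always sum correctly
train_set, val_set = random_split(combined, [n_train, n_val])

loaders = {
    "train": DataLoader(train_set, batch_size=4, shuffle=True),
    "val": DataLoader(val_set, batch_size=4),
}

# Pull one batch to check that samples from both sources collate together.
images, labels = next(iter(loaders["train"]))
```

With the real data, the list passed to ConcatDataset would presumably hold the two ImageFolder objects and mydata. Casting the CSV label with int(...) in __getitem__ may help keep label types consistent with those returned by ImageFolder.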