Combine ImageFolder dataset with custom dataset

CopyOfA · June 9, 2020, 5:32pm

I am loading data from multiple datasets. Some of the images are stored in folders named after their class labels (e.g., \0 and \1). For those I can load each folder tree with ImageFolder and combine the results with torch.utils.data.ConcatDataset, for example (where trans is a set of predefined PyTorch transforms):

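import torch
from torchvision import datasets

# Each ImageFolder assigns class indices from its sorted sub-folder names.
l = []
l.append(datasets.ImageFolder(file_path, trans))
l.append(datasets.ImageFolder(file_path2, trans))
image_datasets = torch.utils.data.ConcatDataset(l)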

img_datasets = dict()
img_datasets['train'], img_datasets['val'] = torch.utils.data.random_split(image_datasets, (round(0.8*len(image_datasets)), round(0.2*len(image_datasets))))
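
A side note on the split lengths above: rounding each part separately happens to add up for an 80/20 split, but it is not guaranteed for other ratios (for instance, round(0.5*3) + round(0.5*3) is 4, not 3), and random_split raises an error when the lengths do not sum to the dataset size. A minimal sketch of a safer way to compute them follows; split_lengths is a hypothetical helper introduced here for illustration, not part of PyTorch:

```python
def split_lengths(n_items, train_fraction=0.8):
    """Return (n_train, n_val) for a two-way split.

    Hypothetical helper: only the training length is rounded, and the
    validation length is the remainder, so the two always sum to n_items.
    """
    if not 0.0 <= train_fraction <= 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    n_train = round(train_fraction * n_items)
    n_val = n_items - n_train
    return n_train, n_val
```

The result could then be passed straight to random_split, e.g. random_split(image_datasets, split_lengths(len(image_datasets))).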

However, I am also loading images from other, scattered locations, using a CSV file that lists each file path and its label. The process here looks like this:

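import PIL.Image
from torch.utils.data import Dataset

class MyData(Dataset):
  def __init__(self, df):
      self.df = df

  def __len__(self):
      return self.df.shape[0]

  def __getitem__(self, index):
      # .iloc indexes by position, so this works whatever the DataFrame's index labels are.
      path = self.df.file_path.iloc[index]
      # Convert to RGB so the channel count matches ImageFolder, whose default loader does the same.
      image = trans(PIL.Image.open(path).convert("RGB"))
      # Plain int, like ImageFolder's labels (assumes integer class labels in the CSV).
      label = int(self.df.label.iloc[index])

      return image, label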


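import pandas as pd

# Passing names= supplies the column names, so the file is read as having no header row.
df = pd.read_csv(image_file_paths, names=["file_path", "label"])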
mydata = MyData(df)

my_datasets = dict()
my_datasets['train'], my_datasets['val'] = torch.utils.data.random_split(mydata, (round(0.8*len(mydata)), round(0.2*len(mydata))))

So I’d like to be able to combine these datasets into a single dataloader. Any ideas for how I should go about doing this? Thanks!

CopyOfA · June 9, 2020, 8:37pm

Found the solution: just apply ConcatDataset a second time, to the combined ImageFolder dataset and the custom dataset:

l = []
l.append(datasets.ImageFolder(file_path, trans))
l.append(datasets.ImageFolder(file_path2, trans))
image_datasets = torch.utils.data.ConcatDataset(l)

df = pd.read_csv(image_file_paths, names=["file_path", "label"])
mydata = MyData(df)

image_datasets = torch.utils.data.ConcatDataset([image_datasets, mydata])

img_datasets = dict()
img_datasets['train'], img_datasets['val'] = torch.utils.data.random_split(image_datasets, (round(0.8*len(image_datasets)), round(0.2*len(image_datasets))))

Good to go from there.
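
For anyone landing here later, the whole pipeline can be sketched end to end, including the single dataloader the original question asked about. This is only an illustration under stated assumptions: the three TensorDataset objects are stand-ins for the two ImageFolder datasets and MyData (so the snippet runs without any image files), and the sizes, image shape, and batch size are made up. ConcatDataset accepts any number of datasets, so a single call also works:

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset, random_split

# Stand-ins (assumption): each yields (image, label) pairs with one common
# image shape, as ImageFolder and MyData would after applying `trans`.
folder_a = TensorDataset(torch.zeros(10, 3, 8, 8), torch.zeros(10, dtype=torch.long))
folder_b = TensorDataset(torch.zeros(6, 3, 8, 8), torch.ones(6, dtype=torch.long))
csv_data = TensorDataset(torch.zeros(4, 3, 8, 8), torch.ones(4, dtype=torch.long))

# One ConcatDataset call over all three datasets.
combined = ConcatDataset([folder_a, folder_b, csv_data])

# Round only the training length; the validation set takes the remainder.
n_train = round(0.8 * len(combined))
n_val = len(combined) - n_train
train_set, val_set = random_split(
    combined, [n_train, n_val], generator=torch.Generator().manual_seed(0)
)

# One DataLoader per split; each draws from all three sources.
loaders = {
    "train": DataLoader(train_set, batch_size=4, shuffle=True),
    "val": DataLoader(val_set, batch_size=4, shuffle=False),
}

images, labels = next(iter(loaders["train"]))
print(len(combined), len(train_set), len(val_set), tuple(images.shape))
```

Batching across sources only works if every dataset returns tensors of the same shape, which is why the same transforms should be applied everywhere. Recent PyTorch releases (1.13 and later) also let random_split take fractions such as [0.8, 0.2] directly.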