Do a train/validation split at the folder level in PyTorch

Currently, my data is organized as follows:

  - root
       - class1
            - img1.png
            - img2.png
            - imgN.png
       - class2
            - img1.png
            - imgN.png
       - classN

Now I can load the whole dataset as:

import torchvision.datasets as dset
train_folder_dataset = dset.ImageFolder(root=self.data_path)

This is great, as I can use it to read all the image files into my training dataset. Now I want to split this dataset into training and validation sets, but at the folder level (for example, I want the class1 folder to be in the training set and class2 in the validation set).

However, I have no idea how to do this split with the ImageFolder class. So, if I do something like:

all_dirs = dset.ImageFolder(root=self.data_path)
# How do I split all_dirs into training and validation directories here?
# Note: I want to split at the directory level, not at the image level.
# So I am hoping for something like:
train_dirs, val_dirs = split_folder(all_dirs)  # hypothetical function

Would something like this be possible?
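
One way this might be approached, as a minimal sketch rather than a built-in torchvision feature: ImageFolder exposes `targets` (one class index per image) and `class_to_idx` (folder name to class index), so whole folders can be assigned to one side of the split and the matching sample indices collected. The helper name `split_indices_by_class` and its `val_fraction` and `seed` parameters are made up for illustration.

```python
import random


def split_indices_by_class(targets, class_to_idx, val_fraction=0.5, seed=0):
    """Assign whole classes (folders) to train or val; return sample indices.

    targets      -- one class index per sample, e.g. ImageFolder(...).targets
    class_to_idx -- folder name -> class index, e.g. ImageFolder(...).class_to_idx
    Hypothetical helper: not part of torchvision.
    """
    class_names = sorted(class_to_idx)      # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(class_names)

    # At least one folder goes to validation.
    n_val = max(1, int(len(class_names) * val_fraction))
    val_ids = {class_to_idx[name] for name in class_names[:n_val]}

    train_idx = [i for i, t in enumerate(targets) if t not in val_ids]
    val_idx = [i for i, t in enumerate(targets) if t in val_ids]
    return train_idx, val_idx


# Possible usage with torchvision (assumes torch and torchvision are installed):
#   from torch.utils.data import Subset
#   all_dirs = dset.ImageFolder(root=data_path)
#   train_idx, val_idx = split_indices_by_class(all_dirs.targets, all_dirs.class_to_idx)
#   train_set, val_set = Subset(all_dirs, train_idx), Subset(all_dirs, val_idx)
```

Because the split is made over folder names and not over images, every image of a given class ends up on the same side. The labels returned by each Subset are still the original class indices from the full dataset.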

I’m doing something similar, but without the dset.ImageFolder class (because I don’t have images).

What I did was preprocess into:


at the beginning of my pipeline, so that all nets have the same view of what is train/val/test.
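
The layout used in the reply above is not shown, so the following is not that poster's method. It is only a sketch of one way to fix a split once at the start of a pipeline: write a small JSON manifest that maps train/val/test to folder names, then have every model load the same file. The function names, the 80/10/10 fractions, and the JSON format are all assumptions made for illustration.

```python
import json
import random


def make_split_manifest(class_names, fractions=(0.8, 0.1, 0.1), seed=0):
    """Partition folder names into train/val/test (hypothetical helper)."""
    names = sorted(class_names)             # stable order before shuffling
    random.Random(seed).shuffle(names)
    n_train = int(len(names) * fractions[0])
    n_val = int(len(names) * fractions[1])
    return {
        "train": names[:n_train],
        "val": names[n_train:n_train + n_val],
        "test": names[n_train + n_val:],    # whatever remains
    }


def save_manifest(manifest, path):
    """Write the split to disk once, at the start of the pipeline."""
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)


def load_manifest(path):
    """Every training run reads the same file, so all runs see the same split."""
    with open(path) as f:
        return json.load(f)
```

Saving the split to a file, instead of recomputing it in each training script, keeps the train/val/test assignment identical across runs even if the seed or the shuffling code changes later.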