Does torch.utils.data.random_split split every class equally?

So I have a directory with subdirectories, where each subdirectory is a class.

+Directory
--+class1
--+class2 
... etc

If I load them using torchvision.datasets.ImageFolder and then split into training and test sets like this,

train_size = int(0.8 * len(dataset_total))
test_size = len(dataset_total) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(dataset_total, [train_size, test_size])

and then create a train_loader and a test_loader for train_dataset and test_dataset.
Could I expect an equal division of data from all classes? What is the best practice here for feeding data to the network?
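For reference, the loaders are created roughly like this (just a sketch; the batch size of 32 is arbitrary):

from torch.utils.data import DataLoader

# Sketch of the loader setup; shuffle the training data, keep the test data in order
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)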

Thanks!

No, it is unlikely to split each class evenly. random_split simply creates a permuted list of indices and partitions the dataset into subsets by length; it does not look at the class labels at all. In the extreme case where the permutation came out close to sequential, whole classes would end up missing from one of the subsets, since ImageFolder orders its samples class by class.
Check the link for the original code.
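You can see this with a quick check of the per-class counts in each subset. A minimal, self-contained sketch (using a fake label list in place of ImageFolder.targets, purely for illustration):

import collections
import torch

# Fake dataset standing in for ImageFolder: 100 samples of class 0, 100 of class 1
labels = [0] * 100 + [1] * 100
dataset_total = torch.utils.data.TensorDataset(torch.randn(200, 3), torch.tensor(labels))

train_size = int(0.8 * len(dataset_total))
test_size = len(dataset_total) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(dataset_total, [train_size, test_size])

def class_counts(subset):
    # random_split returns Subset objects that keep the parent dataset
    # and the permuted indices they sampled
    return collections.Counter(labels[i] for i in subset.indices)

print(class_counts(train_dataset))  # e.g. Counter({0: 83, 1: 77}) -- usually not an exact 80/80
print(class_counts(test_dataset))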

So how can I split it with equal class proportions using torch.utils.data.random_split?


Personally I ended up using train_test_split from sklearn.
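Roughly like this, as a sketch: it assumes an ImageFolder rooted at a placeholder path "Directory" and passes the class labels (dataset.targets) to stratify so each class keeps the same 80/20 proportion; the transform, batch sizes, and random seed are arbitrary choices here.

import torch
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

# "Directory" is a placeholder for the root folder with one subdirectory per class
dataset_total = datasets.ImageFolder("Directory", transform=transforms.ToTensor())

# ImageFolder stores the integer class label of every sample in .targets,
# so train_test_split can stratify on it to keep class proportions equal
train_idx, test_idx = train_test_split(
    list(range(len(dataset_total))),
    test_size=0.2,
    stratify=dataset_total.targets,
    random_state=42,
)

train_dataset = Subset(dataset_total, train_idx)
test_dataset = Subset(dataset_total, test_idx)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)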