How to split test and train data keeping equal proportions of each class?

You wouldn’t need to create a dataframe and split it; you could call train_test_split on the indices directly and pass the labels through its stratify argument, so that each split keeps the original class proportions.
Once you have the training, validation, and test indices, you could create Subsets by wrapping the Dataset with the corresponding indices.
I think this is the cleanest approach, as it avoids reimplementing already working methods such as sklearn's train_test_split.
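
Here is a minimal sketch of that idea, in case it helps. The toy TensorDataset, the 60/30/10 class imbalance, the 60/20/20 split sizes, and the random_state values are all assumptions made up for illustration; with a real Dataset you would need to get the labels array from wherever your dataset stores its targets.

```python
import numpy as np
import torch
from sklearn.model_selection import train_test_split
from torch.utils.data import Subset, TensorDataset

# Toy data (assumption): 100 samples, 3 imbalanced classes.
labels = np.array([0] * 60 + [1] * 30 + [2] * 10)
features = torch.randn(len(labels), 4)
dataset = TensorDataset(features, torch.from_numpy(labels))

indices = np.arange(len(dataset))

# Step 1: hold out 20% as the test set, stratified by label.
train_val_idx, test_idx = train_test_split(
    indices, test_size=0.2, stratify=labels, random_state=0
)

# Step 2: split the remaining 80% into train and validation.
# 25% of the remainder equals 20% of the full dataset.
train_idx, val_idx = train_test_split(
    train_val_idx,
    test_size=0.25,
    stratify=labels[train_val_idx],
    random_state=0,
)

# Wrap the original Dataset with each set of indices.
train_set = Subset(dataset, train_idx.tolist())
val_set = Subset(dataset, val_idx.tolist())
test_set = Subset(dataset, test_idx.tolist())


def class_counts(idx):
    """Number of samples per class among the given indices."""
    return np.bincount(labels[idx], minlength=3).tolist()


print("train:", class_counts(train_idx))
print("val:  ", class_counts(val_idx))
print("test: ", class_counts(test_idx))
```

Each Subset can then be passed to its own DataLoader as usual. Note that stratify needs at least two samples per class in the array being split, so very rare classes might need special handling.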
