I want to split my audio dataset into train and test. Can datasetLoader do this for me?

I just started coding in PyTorch. I have converted my wav files into a text file using the glob library, and now I want to split that text file into train and test sets. The dataset is very small and imbalanced. To be more clear, it has 7 classes, which appear only in the file names, and the classes have different numbers of samples (100, 50, etc.). How do I split it into train and test?
Can anybody help, please?

If you want to randomly split your Dataset, you could use torch.utils.data.random_split.
Alternatively, if you want to apply a stratified split, you could use sklearn.model_selection.train_test_split and pass the targets as the stratify argument.
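
For illustration, here is a minimal sketch of both options. The toy TensorDataset, the 100/40/10 class sizes, and the 80/20 ratio are assumptions made up for the example, not your actual data; with your audio dataset you would pass your own list of labels (e.g. parsed from the file names) as the stratify argument.

```python
import torch
from torch.utils.data import TensorDataset, Subset, random_split
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 150 samples in 3 imbalanced classes (100 / 40 / 10).
targets = torch.tensor([0] * 100 + [1] * 40 + [2] * 10)
features = torch.randn(len(targets), 8)
dataset = TensorDataset(features, targets)

# Option 1: purely random 80/20 split (class proportions are NOT guaranteed).
n_test = int(0.2 * len(dataset))
n_train = len(dataset) - n_test
train_rand, test_rand = random_split(
    dataset,
    [n_train, n_test],
    generator=torch.Generator().manual_seed(0),  # fixed seed for reproducibility
)

# Option 2: stratified 80/20 split. Split the sample indices, stratifying on
# the targets, then wrap the index lists in Subset objects.
train_idx, test_idx = train_test_split(
    list(range(len(dataset))),
    test_size=0.2,
    stratify=targets.numpy(),
    random_state=0,
)
train_strat = Subset(dataset, train_idx)
test_strat = Subset(dataset, test_idx)

# Per-class sample counts in the stratified test set.
test_class_counts = torch.bincount(targets[test_idx], minlength=3).tolist()
```

Either pair of subsets can then be passed to a DataLoader as usual. For a small, imbalanced dataset the stratified variant is probably the safer choice, since it keeps each class's share roughly the same in both splits.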

Thank you so much sir