Issues with

I'm not sure whether the issue is already solved, so let me know if you are still stuck.
Since df_labels contains the targets, you should be able to use them in train_test_split to create the split indices and create the datasets via Subsets.
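
A minimal sketch of that idea, assuming the targets can be pulled out of df_labels as a plain list (the `"label"` column name is hypothetical) and using a toy TensorDataset as a stand-in for your dataset:

```python
import torch
from sklearn.model_selection import train_test_split
from torch.utils.data import Subset, TensorDataset

# Toy stand-in for the real dataset: 80 samples of class 0, 20 of class 1.
# With the DataFrame you would take the targets from it instead, e.g.
# targets = df_labels["label"].tolist()  ("label" is a hypothetical column name).
targets = [0] * 80 + [1] * 20
dataset = TensorDataset(torch.randn(100, 3), torch.tensor(targets))

# Split the indices rather than the data; stratify keeps the class
# ratio (roughly) the same in both splits.
train_idx, val_idx = train_test_split(
    list(range(len(dataset))),
    test_size=0.2,
    stratify=targets,
    random_state=0,
)

# Wrap the same underlying dataset with the two index lists.
training_set = Subset(dataset, train_idx)
validation_set = Subset(dataset, val_idx)

# Quick check of the class balance in the validation split.
val_targets = [targets[i] for i in val_idx]
print(len(training_set), len(validation_set))
print(val_targets.count(0), val_targets.count(1))
```

The `stratify` argument is what should address an unbalanced validation set; without it the split is purely random.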


I tried to use Subset like this to split the dataset into training and validation sets:

    training_set = torch.utils.data.Subset(dataset, range(0, len(dataset), 2))
    validation_set = torch.utils.data.Subset(dataset, range(1, len(dataset), 2))

My bad, I have no idea how to use train_test_split on a PyTorch dataset.
So far I have only managed to split the dataset with random_split and Subset.

It is working,

but I'm worried that the validation set might be unbalanced, which would result in bad predictions.
Also, which would you recommend for splitting the dataset: Subset or random_split?
Thank you for helping :slight_smile:

How do you use different transforms for the results of random_split? For example, I have

from torch.utils.data import DataLoader, random_split
from torch import Generator
from torchvision.transforms import ToTensor
from torchvision.datasets import ImageFolder


# Load the full dataset (it is split into train and test below)
dataset_all = ImageFolder(

size_all = len(dataset_all)
print(f'Before splitting the full dataset into train and test: len(dataset_all)={size_all}')

size_test = int(size_all * TEST_RATIO)
size_train = size_all - size_test

dataset_train, dataset_test = random_split(dataset_all, [size_train, size_test], generator=Generator().manual_seed(SEED))
print(f'After splitting the full dataset into train and test: len(dataset_train)={len(dataset_train)}. len(dataset_test)={len(dataset_test)}')

What if I want to use ColorJitter for train but not for test?

For your use case I would probably use Subset and pass the indices explicitly, as seen in this example, since that would allow you to keep a separate set of transformations for each split.