I'm not sure if the issue is solved, so let me know if you are stuck.
Since df_labels contains the targets, you should be able to pass them to train_test_split to create the split indices and then create the datasets via Subsets.
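A minimal sketch of that idea, assuming scikit-learn is available. The toy targets and the TensorDataset below are hypothetical stand-ins for df_labels and your real dataset; only the indices are split, and stratify keeps the class ratio the same in both subsets:

```python
import numpy as np
import torch
from sklearn.model_selection import train_test_split
from torch.utils.data import Subset, TensorDataset

# Hypothetical stand-in data: 100 samples with imbalanced binary targets (80/20).
targets = np.array([0] * 80 + [1] * 20)
features = torch.randn(len(targets), 4)
dataset = TensorDataset(features, torch.from_numpy(targets))

# Split the indices rather than the data itself.
# stratify=targets preserves the class proportions in both splits.
train_idx, val_idx = train_test_split(
    np.arange(len(targets)),
    test_size=0.2,
    stratify=targets,
    random_state=0,
)

# Wrap the original dataset so each subset only exposes its own indices.
training_set = Subset(dataset, train_idx.tolist())
validation_set = Subset(dataset, val_idx.tolist())

print(len(training_set), len(validation_set))
print("validation class counts:", np.bincount(targets[val_idx]))
```

With a real dataset you would pass the label column (for example the targets stored in df_labels) as the stratify argument instead of the toy array.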
I tried to use Subset like this to split the dataset into training and validation sets:
training_set = torch.utils.data.Subset(dataset, (range(0, len(dataset), 2)))
validation_set = torch.utils.data.Subset(dataset, (range(1, len(dataset), 2)))
My bad, I have no idea how to use train_test_split on a PyTorch dataset.
I only managed to split the dataset with random_split and Subset.
It is working, but I'm worried that the validation set might be unbalanced, which would result in bad predictions.
Also, which would you recommend for splitting the dataset: Subset or random_split?
Thank you for helping.
How do you use different transforms for the results of random_split? For example, I have
from torch.utils.data import DataLoader, random_split
from torch import Generator
from torchvision.transforms import ToTensor
from torchvision.datasets import ImageFolder
TEST_RATIO = 0.2
BATCH_SIZE = 32
# Download and load the training data
dataset_all = ImageFolder(
    data_dir,
    transform=ToTensor(),
)
size_all = len(dataset_all)
print(f'Before splitting the full dataset into train and test: len(dataset_all)={size_all}')
size_test = int(size_all * TEST_RATIO)
size_train = size_all - size_test
dataset_train, dataset_test = random_split(dataset_all, [size_train, size_test], generator=Generator().manual_seed(SEED))
print(f'After splitting the full dataset into train and test: len(dataset_train)={len(dataset_train)}. len(dataset_test)={len(dataset_test)}')
What if I want to use ColorJitter for train but not for test?
For your use case I would probably use Subsets and pass the indices explicitly, as seen in this example, as it would allow you to keep the specified transformations.
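A rough sketch of how this could look: create the dataset twice over the same data, once per transform, and wrap each copy in a Subset with disjoint indices. ToyDataset and the `x + 100` transform are hypothetical stand-ins so the snippet runs without image files; in your case they would correspond to two ImageFolder instances on data_dir, one with ColorJitter in its transform and one without:

```python
import torch
from torch.utils.data import Dataset, Subset


class ToyDataset(Dataset):
    """Hypothetical stand-in for ImageFolder: fixed samples, optional transform."""

    def __init__(self, transform=None):
        self.data = torch.arange(10, dtype=torch.float32)
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]
        return self.transform(x) if self.transform is not None else x


# Two views of the same underlying samples with different transforms.
dataset_aug = ToyDataset(transform=lambda x: x + 100.0)  # stands in for ColorJitter
dataset_plain = ToyDataset()

# Shuffle the indices once with a seeded generator, then split them.
generator = torch.Generator().manual_seed(0)
perm = torch.randperm(len(dataset_plain), generator=generator).tolist()
size_test = int(len(perm) * 0.2)
test_idx, train_idx = perm[:size_test], perm[size_test:]

# Each Subset keeps the transform of the dataset it wraps.
dataset_train = Subset(dataset_aug, train_idx)
dataset_test = Subset(dataset_plain, test_idx)

print(len(dataset_train), len(dataset_test))
```

Because both copies index the same underlying samples, the two subsets do not overlap, and only the training samples pass through the augmentation.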