I’m a beginner in PyTorch, but I’ve built a data pipeline a couple of times. The way I know to split the data is to shuffle the indices and separate them into train, validation, and test subsets:
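import numpy as np
import torch
from torch.utils.data.sampler import SubsetRandomSampler
from torchvision import transforms
from torchvision.datasets import ImageFolder

# a single transform pipeline, applied to every image in the dataset
data_transforms = transforms.Compose([
    transforms.Resize((50, 50)),
    transforms.RandomRotation(degrees=30),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
data = ImageFolder('breast-histopathology/', transform=data_transforms)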
valid_size = 0.15
test_size = 0.15
num_train = len(data)
indices = list(range(num_train))
np.random.shuffle(indices)
valid_split = int(np.floor((valid_size) * num_train))
test_split = int(np.floor((valid_size+test_size) * num_train))
valid_idx, test_idx, train_idx = indices[:valid_split], indices[valid_split:test_split], indices[test_split:]
print(len(valid_idx), len(test_idx), len(train_idx))
# define samplers for obtaining training, validation, and test batches
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)
test_sampler = SubsetRandomSampler(test_idx)
loaders = {
    'train': torch.utils.data.DataLoader(data, batch_size=128, sampler=train_sampler),
    'test': torch.utils.data.DataLoader(data, batch_size=32, sampler=test_sampler),
    'valid': torch.utils.data.DataLoader(data, batch_size=32, sampler=valid_sampler),
}
However, I don’t know how to do it in case I want separate transforms. This method applies one data transform to the whole dataset. Is there a way to divide the dataset and specify a separate transform for each subset (e.g. augmented data for training and the original data for validation)?
P.S. It can be done by making separate train and test folders using shutil or os, but I was wondering whether there’s a method in PyTorch for doing so.
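To make the question concrete, here is a rough sketch of the kind of thing I have in mind: a small wrapper that takes the untransformed dataset plus a list of indices and applies its own transform. `TransformedSubset` is a hypothetical helper of mine, not something from PyTorch. To keep the sketch runnable without any image files, it is plain Python, the base dataset is a toy list standing in for an `ImageFolder` created without a transform, and the two lambdas stand in for the augmenting and non-augmenting pipelines. In real code I assume the class would subclass `torch.utils.data.Dataset`.

```python
import random


class TransformedSubset:
    """Map-style view of `base` restricted to `indices`, with its own transform.

    Hypothetical helper, not part of PyTorch. It only assumes that
    `base[i]` returns a (sample, label) pair, as ImageFolder does.
    """

    def __init__(self, base, indices, transform=None):
        self.base = base
        self.indices = list(indices)
        self.transform = transform

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        # Look up the i-th element of this subset in the base dataset.
        sample, label = self.base[self.indices[i]]
        if self.transform is not None:
            sample = self.transform(sample)
        return sample, label


# Toy stand-in for ImageFolder(root) created WITHOUT a transform:
# ten (sample, label) pairs, where the "image" is just a number.
base = [(float(i), i % 2) for i in range(10)]

# Same idea as the index split above, with a fixed seed for repeatability.
indices = list(range(len(base)))
random.Random(0).shuffle(indices)
valid_idx, train_idx = indices[:3], indices[3:]

# Stand-ins for the two transform pipelines (assumptions for illustration).
train_transform = lambda x: x + 0.5   # plays the role of "augmented"
valid_transform = lambda x: x         # plays the role of "original"

train_data = TransformedSubset(base, train_idx, train_transform)
valid_data = TransformedSubset(base, valid_idx, valid_transform)
```

With real data, I would expect `train_data` and `valid_data` to be passed to `torch.utils.data.DataLoader` directly (with `shuffle=True` for training) instead of using samplers. Another option I have seen suggested is to create two `ImageFolder` objects over the same directory, each with its own transform, and reuse the same `train_idx` and `valid_idx` samplers on them. I have not verified which approach is preferred.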