I’m currently loading my data in the following way.
`MNIST` is a custom dataset that looks pretty much identical to the one in the official tutorial, so nothing special there.
`to_dtype` is a custom transform that does exactly what you would expect, and is also modeled after the official tutorial.
```python
transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomRotation(10, fill=(0,)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomPerspective(),
    transforms.RandomAffine(10),
    transforms.ToTensor(),
    to_dtype(),
    transforms.Normalize((0.5,), (0.5,)),
])

trainset = MNIST('data/train.csv', transform=transform)

N = len(trainset)
split = (N - int(np.floor(N * .2)), int(np.floor(N * .2)))
trainset, validset = torch.utils.data.random_split(trainset, split)

trainload = DataLoader(trainset, batch_size=32, shuffle=True, num_workers=4)
validload = DataLoader(validset, batch_size=32, shuffle=True, num_workers=4)
```
`random_split()` does not return two datasets of the same type as was passed — it returns `Subset` wrappers — so I cannot access the `transform` attribute to turn it off in the validation set.
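For illustration (not from the post), a minimal check shows both splits are `Subset`s pointing at the same underlying dataset:

```python
from torch.utils.data import random_split

data = list(range(10))                 # any object with __len__ and __getitem__
a, b = random_split(data, [8, 2])

print(type(a).__name__)                # Subset
print(a.dataset is b.dataset)          # True: both wrap the same underlying dataset
```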
Well, I can reach it through the underlying dataset with `validset.dataset.transform = None`,
but since both subsets share that same underlying dataset, that turns off transforms for the training set too.
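One common workaround — a sketch, not code from the post — is to build the base dataset with `transform=None`, split it, and then wrap each `Subset` in a small wrapper dataset that applies its own transform. `TransformedSubset` is a hypothetical helper name:

```python
from torch.utils.data import Dataset, random_split

class TransformedSubset(Dataset):
    """Wrap a Subset with its own transform, so the train and validation
    splits no longer share a single transform attribute."""
    def __init__(self, subset, transform=None):
        self.subset = subset
        self.transform = transform

    def __len__(self):
        return len(self.subset)

    def __getitem__(self, idx):
        x, y = self.subset[idx]
        if self.transform is not None:
            x = self.transform(x)
        return x, y

# Usage sketch: split once without transforms, then attach the
# augmentation pipeline only to the training split.
base = [(i, i % 2) for i in range(10)]      # stand-in for MNIST('data/train.csv')
train_sub, valid_sub = random_split(base, [8, 2])
trainset = TransformedSubset(train_sub, transform=lambda x: x * 10)  # augmentations here
validset = TransformedSubset(valid_sub, transform=None)              # no augmentation
```

With this layout the `DataLoader`s are built from the wrappers instead of the raw `Subset`s, and each split keeps its own pipeline.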
Should I be worried about this? Is having augmented validation data bad?