I’m currently loading data as follows. `MNIST` is a custom dataset that is pretty much identical to the one in the official tutorial, so nothing special there. `to_dtype` is a custom transform that does exactly what you would expect, also modeled on the official tutorial.
```python
import numpy as np
import torch
from torch.utils.data import DataLoader
from torchvision import transforms

# MNIST and to_dtype are my custom dataset/transform described above
transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomRotation(10, fill=(0,)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomPerspective(),
    transforms.RandomAffine(10),
    transforms.ToTensor(),
    to_dtype(),
    transforms.Normalize((0.5,), (0.5,)),
])

trainset = MNIST('data/train.csv', transform=transform)

# 80/20 train/validation split
N = len(trainset)
n_valid = int(np.floor(N * 0.2))
split = (N - n_valid, n_valid)
trainset, validset = torch.utils.data.random_split(trainset, split)

trainload = DataLoader(trainset, batch_size=32, shuffle=True, num_workers=4)
validload = DataLoader(validset, batch_size=32, shuffle=True, num_workers=4)
```
Apparently `random_split()` does not return datasets of the same type as the one passed in — it returns `Subset` objects — so I cannot access the `transform` attribute to turn augmentation off for the validation set.
Well, I can reach it with `validset.dataset.transform = None`, but since both subsets wrap the same underlying dataset object, that turns off the transforms for the training set too.
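Here’s a minimal, self-contained demonstration of the sharing behavior I mean (using a toy `TensorDataset` in place of my custom `MNIST`):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy stand-in for the real dataset.
full = TensorDataset(torch.arange(10).float())

train_sub, valid_sub = random_split(full, [8, 2])

# random_split returns Subset objects, not the original dataset type...
print(type(train_sub).__name__)  # Subset

# ...and both subsets wrap the *same* underlying dataset object,
# so anything I mutate via valid_sub.dataset is visible from train_sub.dataset.
print(train_sub.dataset is valid_sub.dataset)  # True
```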
Should I be worried about this? Is having augmented validation data bad?
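For reference, the workaround I’m leaning towards is splitting the *untransformed* dataset first and then attaching a transform per split via a thin wrapper (`MapDataset` is just a name I made up, and the lambda is a stand-in for my real augmentation pipeline):

```python
import torch
from torch.utils.data import Dataset, TensorDataset, random_split

class MapDataset(Dataset):
    """Wrap any dataset and apply `transform` to each sample on the fly."""
    def __init__(self, dataset, transform=None):
        self.dataset = dataset
        self.transform = transform

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        sample = self.dataset[idx]
        return self.transform(sample) if self.transform else sample

# Split the untransformed dataset first, then give each split its own transform.
full = TensorDataset(torch.arange(10).float())
train_sub, valid_sub = random_split(full, [8, 2])

trainset = MapDataset(train_sub, transform=lambda s: (s[0] * 2,))  # stand-in augmentation
validset = MapDataset(valid_sub, transform=None)                   # no augmentation
```

That way turning augmentation off for validation can’t leak into training, but I’d still like to know whether it’s even necessary.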