How to make RandomRotation (Or any other random transform) fixed for DataLoader?

Hey there,
I understand that the transforms applied to a dataset only run when the data is actually fetched (for example, while iterating over a DataLoader). However, for any kind of randomized transform this means the same initial images produce different inputs on every pass. Is there a way to make the dataset fixed? I guess seeding before each loop over the DataLoader would work, but I'm looking for something more robust.

Example code:

import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.RandomRotation(100),
    transforms.ToTensor()
])

mnist_train = datasets.MNIST("../mnist_train", download=True, train=True,
                             transform=transform)

loader = DataLoader(mnist_train,
                    batch_size=1,
                    shuffle=False)

If you run this (and sum the values of the first input), you should get a different result each time because the random transformation is applied dynamically:

next(iter(loader))[0].ravel().sum()

I'd like to pre-compute the transform somehow so that the DataLoader becomes static.

Seeding is the right approach to make random transformations produce deterministic results. Since that doesn't seem to fit your use case, you could iterate the DataLoader once (or for a few epochs) and store the randomly transformed samples. Afterwards you could load these samples directly and disable all random transformations. Note that this approach uses additional space on your local SSD to store the data, and the "randomness" would be limited to the number of epochs you used to create the dataset.
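A minimal sketch of that store-and-reload idea; the stand-in tensors and the file name `fixed_mnist.pt` are placeholders, and in practice the first loader would be your `mnist_train` with the random transforms still enabled:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder for the augmented dataset (would be mnist_train in practice).
images = torch.rand(10, 1, 28, 28)
labels = torch.randint(0, 10, (10,))
loader = DataLoader(TensorDataset(images, labels), batch_size=1, shuffle=False)

# One pass over the loader freezes one "epoch" of random augmentations.
xs, ys = [], []
for x, y in loader:
    xs.append(x.squeeze(0))
    ys.append(y.squeeze(0))

# Store the already-transformed samples on disk.
torch.save({"images": torch.stack(xs), "labels": torch.stack(ys)},
           "fixed_mnist.pt")

# Later: reload as a static dataset with no random transforms attached.
data = torch.load("fixed_mnist.pt")
fixed_loader = DataLoader(TensorDataset(data["images"], data["labels"]),
                          batch_size=1, shuffle=False)
```

Every pass over `fixed_loader` now yields exactly the same samples.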

Thanks, then I might just go with seeding beforehand for now. Yes, this is not really what the transformations were made for, but my use case is creating a brand-new dataset using PyTorch's transforms.