I need to train my network with 5-fold cross-validation, training each fold for 25 epochs. I have to perform data augmentation with random rotation and random translation. Starting from a dataset of 500 images, I create 10 new images for each image of the original dataset by randomly translating and rotating it. I have followed the tutorial here: https://pytorch.org/tutorials/beginner/data_loading_tutorial.html. With this setup, the augmented dataset is different in each epoch. I would like to have the same augmented dataset for each epoch and for each fold, but I do not know how to do this.
Thank you for your help
If you have to make sure the same randomly augmented images are used and each fold contains exactly the same samples, I would recommend creating this particular dataset in a separate script and storing each fold on disk as PyTorch tensors.
In your training script you could simply load the folds and train your model for the specified number of epochs.
Alternatively, it should be possible to properly seed the code and create the same folds in your training script, but you would have to be very careful and would have to verify each step to avoid accidentally calling into the random number generator (which would thus change e.g. the augmentation).
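As a rough sketch of the seeding approach for the folds, one option is to shuffle the sample indices with a dedicated, seeded generator so that the split does not depend on the global random state. The function name `make_folds` and the seed value are made up for illustration, and only the Python standard library is used here:

```python
import random

def make_folds(num_samples, num_folds=5, seed=0):
    """Split sample indices into folds using a private, seeded generator."""
    # Local generator: the global random state is neither read nor changed.
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    # Fold k takes every num_folds-th shuffled index, starting at position k.
    return [indices[k::num_folds] for k in range(num_folds)]

folds = make_folds(500, num_folds=5, seed=0)
for k, val_idx in enumerate(folds):
    # All indices that are not in fold k form the training split.
    train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
    # ... train on train_idx and validate on val_idx for 25 epochs ...
```

Because the generator is created inside the function, calling `make_folds` with the same arguments should return the same folds regardless of what other code has drawn from the global generator.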
I cannot create this particular dataset since I do not have enough disk space. The dataset is augmented on the fly. To seed the code, is it sufficient to call, for example, np.random.seed(value) at the beginning of each epoch?
You don’t need to create all augmented samples at once and could still use the lazy data loading approach to create the folds.
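One possible way to keep the loading lazy while still getting identical augmentations is to derive the rotation and translation parameters from the sample index, so that nothing except the original images has to be stored. The sketch below is only an illustration under some assumptions: the class name `FixedAugmentationDataset`, the parameter ranges, and the `apply_transform` callable are hypothetical (in practice it could be a small wrapper around `torchvision.transforms.functional.affine`), and a dummy transform is used so the example runs without any image library.

```python
import random

class FixedAugmentationDataset:
    """Map-style dataset: `copies` augmented views per image, created lazily."""

    def __init__(self, images, apply_transform, copies=10, seed=0,
                 max_angle=30.0, max_shift=0.1):
        self.images = images
        self.apply_transform = apply_transform  # hypothetical callable
        self.copies = copies
        self.seed = seed
        self.max_angle = max_angle  # assumed range, in degrees
        self.max_shift = max_shift  # assumed range, as a fraction of the size

    def __len__(self):
        return len(self.images) * self.copies

    def params(self, index):
        # A fresh generator seeded from (seed, index): the same index always
        # yields the same parameters, in every epoch and every fold.
        rng = random.Random(self.seed * len(self) + index)
        angle = rng.uniform(-self.max_angle, self.max_angle)
        dx = rng.uniform(-self.max_shift, self.max_shift)
        dy = rng.uniform(-self.max_shift, self.max_shift)
        return angle, dx, dy

    def __getitem__(self, index):
        image = self.images[index // self.copies]
        angle, dx, dy = self.params(index)
        return self.apply_transform(image, angle, dx, dy)

# Dummy data and transform, just to show the behaviour.
dataset = FixedAugmentationDataset(
    images=list(range(500)),
    apply_transform=lambda img, angle, dx, dy: (img, angle, dx, dy),
)
first_epoch = dataset[123]
second_epoch = dataset[123]  # identical to first_epoch
```

Since this class defines `__len__` and `__getitem__`, it should be usable with a `DataLoader`, and the folds could then be built by wrapping it in `torch.utils.data.Subset` with fixed index lists.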
However, if you want to use the seeding approach, refer to the Reproducibility docs and note that slight changes, such as moving or executing the validation loop, might call into the random number generator and could thus change the transformations and the creation of the folds.
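To illustrate why the seeding approach is fragile, here is a small sketch using only Python's standard `random` module (the helper `augmentation_angles` and the angle range are made up for the example). A single extra draw from the global generator shifts every value that follows, while a dedicated generator is not affected; `torch.manual_seed` and `torch.Generator` should behave analogously on the PyTorch side.

```python
import random

def augmentation_angles(rng, n=3):
    """Draw n rotation angles from the given generator."""
    return [rng.uniform(-30.0, 30.0) for _ in range(n)]

# Global generator, seeded once.
random.seed(0)
baseline = augmentation_angles(random)

# Same seed, but one unrelated draw happens first (e.g. in a validation loop).
random.seed(0)
random.random()
shifted = augmentation_angles(random)  # differs from baseline

# Dedicated generators are isolated from unrelated draws on the global one.
random.seed(0)
random.random()
isolated_a = augmentation_angles(random.Random(0))
isolated_b = augmentation_angles(random.Random(0))  # equal to isolated_a
```

This is why it can be safer to pass an explicit generator to each component that needs randomness than to rely on reseeding the global state at the start of each epoch.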