Towards Reproducible Training Results

In Manual seed cannot make dropout deterministic on CUDA for Pytorch 1.0 preview version, it is mentioned that seeding does not help with reproducibility when a model contains modules such as nn.Dropout. In that thread, a suggested workaround is to use torch.set_rng_state(). Here are my findings:
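For context, my understanding of that workaround is roughly the sketch below (the placement around a random op is my own illustration, not code from the linked thread):

```python
import torch

# Snapshot the RNG state before the code whose randomness we want to replay
cpu_state = torch.get_rng_state()
if torch.cuda.is_available():
    cuda_state = torch.cuda.get_rng_state()  # state of the current CUDA device

x = torch.rand(3)  # consumes random numbers (stands in for e.g. a dropout mask)

# Restore the snapshot so the next random draw repeats exactly
torch.set_rng_state(cpu_state)
if torch.cuda.is_available():
    torch.cuda.set_rng_state(cuda_state)

y = torch.rand(3)
print(torch.equal(x, y))  # expected: True
```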

  1. LeNet5 does not contain a dropout layer, yet seeding still failed to give reproducible results. Seeds are configured as below.
    import os

    import numpy as np
    import torch
    import torch.backends.cudnn as cudnn

    np.random.seed(args.seed)  # Set seed for NumPy (args.seed comes from argparse)
    os.environ['PYTHONHASHSEED'] = str(args.seed)

    # Set seed for pytorch
    torch.manual_seed(args.seed)  # Set seed for CPU
    torch.cuda.manual_seed(args.seed)  # Set seed for the current GPU
    torch.cuda.manual_seed_all(args.seed)  # Set seed for all the GPUs

    # Make cuDNN pick deterministic kernels
    cudnn.benchmark = False
    cudnn.deterministic = True
  2. When torch.nn.DataParallel is used, the results vary noticeably more between runs (see the sketch after this list).
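To rule out non-deterministic CUDA kernels as the cause, here is a minimal sketch of the extra switches from the reproducibility docs (whether they remove the DataParallel variation is my assumption, not something I have verified):

```python
import os
import torch

# Required for deterministic cuBLAS kernels on CUDA >= 10.2; must be set
# before the first CUDA call.
os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'

# Raise an error whenever an op without a deterministic implementation is used,
# instead of silently producing run-to-run differences.
torch.use_deterministic_algorithms(True)
```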

I would be grateful if you could explain why these issues could happen.

Which PyTorch version are you using, and did you follow the steps described in the reproducibility docs? Since the issue you’ve linked is quite old, I would assume things have already changed and the described CUDA seeding issue was fixed.
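In particular, check the DataLoader seeding pattern from those docs if you are using multiple workers with random transforms; a minimal sketch (the dataset name is just a placeholder):

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader

def seed_worker(worker_id):
    # Each worker derives its seed from the base seed; re-seed NumPy and
    # Python's random module so per-sample augmentations are reproducible.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(0)

loader = DataLoader(
    train_dataset,  # placeholder: your Dataset instance
    batch_size=64,
    shuffle=True,
    num_workers=2,
    worker_init_fn=seed_worker,
    generator=g,
)
```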

Hello @ptrblck,

Thank you so much for your reply.

I am using PyTorch 1.13.0 and I did follow the steps shown in Reproducibility — PyTorch 1.13 documentation.

Here are some clues:

  1. I noticed that the seeding shown above kept the data sequence fixed across runs (even with dataset shuffling and random transforms on the samples).

  2. In addition, seeding by itself guaranteed that the model (LeNet5 in my case) was initialized identically across different runs.
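A small check of this kind can confirm the identical initialization (with a stand-in module in place of LeNet5, so the snippet is self-contained):

```python
import torch
from torch import nn

def init_state(seed):
    # Re-seed, build the model, and return a copy of its freshly
    # initialized parameters.
    torch.manual_seed(seed)
    model = nn.Sequential(nn.Conv2d(1, 6, 5), nn.ReLU(), nn.Linear(16, 10))  # stand-in for LeNet5
    return {k: v.clone() for k, v in model.state_dict().items()}

a = init_state(42)
b = init_state(42)
print(all(torch.equal(a[k], b[k]) for k in a))  # expected: True if initialization is reproducible
```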