Dropout and training reproducibility

Hi,

I just noticed that if I want to save a model and reload it to continue training, dropout is the problem.
I save everything I can in order to resume training exactly where it left off.
I saved the following:

        "net": self.net.state_dict(),
        "optimizer": self.optimizer.state_dict(),
        "scheduler": self.scheduler.state_dict(),
        'train_data_provider': self.train_data_provider.get_state(),
        'test_data_provider': self.test_data_provider.get_state(),
        'torch_random': torch.get_rng_state(),
        'torch_cuda_state': torch.cuda.get_rng_state()}

But I found that I still cannot reload the net from, say, epoch 10 and reproduce the same results. When I looked into my network, I found that the dropout layer is the fundamental problem: if I remove it, the run is exactly the same after loading the pre-trained weights.
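
For reference, the loading side looks roughly like the sketch below. This is only a sketch: set_state() stands for whatever counterpart of get_state() the data providers expose, and the method and argument names are placeholders.

    import torch

    def resume(self, path):
        # Restore everything that was saved above, including the global RNG
        # states, so that the resumed run continues from the same point.
        checkpoint = torch.load(path)
        self.net.load_state_dict(checkpoint["net"])
        self.optimizer.load_state_dict(checkpoint["optimizer"])
        self.scheduler.load_state_dict(checkpoint["scheduler"])
        # set_state() is assumed to be the counterpart of get_state() on the
        # data providers; adjust to whatever your provider actually offers.
        self.train_data_provider.set_state(checkpoint["train_data_provider"])
        self.test_data_provider.set_state(checkpoint["test_data_provider"])
        # Restoring the global generators is what should make the dropout
        # masks line up again after resuming.
        torch.set_rng_state(checkpoint["torch_random"])
        torch.cuda.set_rng_state(checkpoint["torch_cuda_state"])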

So, I hope PyTorch could be modified so that the state of the random generator used by dropout is saved and restored as part of net.state_dict().

Best regards


I don’t think the dropout layer uses its own generator or seeding; it reuses the “global” RNG state.
If you want deterministic behavior when restarting from a specific epoch, you could try setting the global seed at the start of each epoch.
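
A minimal sketch of that idea, where base_seed and the epoch range are just example values:

    import torch

    base_seed = 42              # example value
    start_epoch, num_epochs = 10, 20

    for epoch in range(start_epoch, num_epochs):
        # Re-seeding the global generators makes this epoch's dropout masks
        # depend only on the seed and the epoch number, not on whether the
        # run was resumed from a checkpoint or trained from scratch.
        torch.manual_seed(base_seed + epoch)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(base_seed + epoch)
        # ... run the training loop for this epoch ...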