I just noticed that when I save a model and reload it to continue training, dropout is the problem.
I saved everything I could in order to resume training exactly. I saved the following:
{"net": self.net.state_dict(), "optimizer": self.optimizer.state_dict(), "scheduler": self.scheduler.state_dict(), "train_data_provider": self.train_data_provider.get_state(), "test_data_provider": self.test_data_provider.get_state(), "torch_random": torch.get_rng_state(), "torch_cuda_state": torch.cuda.get_rng_state()}
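For context, here is a minimal, self-contained sketch of that save/restore round trip (not my actual training code; the data-provider entries are omitted). The key point is that every saved state must also be restored explicitly on load, including the RNG state via `torch.set_rng_state` (and `torch.cuda.set_rng_state` on GPU):

```python
import io
import torch

net = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

# Save model, optimizer, and the global CPU RNG state together.
checkpoint = {
    "net": net.state_dict(),
    "optimizer": optimizer.state_dict(),
    "torch_random": torch.get_rng_state(),
    # On a CUDA machine you would also save torch.cuda.get_rng_state().
}
buffer = io.BytesIO()
torch.save(checkpoint, buffer)

# On resume: loading the state_dicts alone is NOT enough for exact
# reproducibility -- the RNG state must be restored by hand.
buffer.seek(0)
ckpt = torch.load(buffer)
net.load_state_dict(ckpt["net"])
optimizer.load_state_dict(ckpt["optimizer"])
torch.set_rng_state(ckpt["torch_random"])
```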
But I found I still cannot reload the net from, say, epoch 10 and reproduce the same results. When I looked into my network, I found that the dropout layer is the fundamental problem: if I remove it, the procedure after loading the pre-trained weights is exactly the same.
So, I hope that PyTorch could be modified so that the state of the generator used by dropout is saved and restored together with net.state_dict().
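To illustrate the underlying mechanism, here is a small sketch (CPU only, my own toy example) showing that dropout draws its mask from the global RNG, so restoring that RNG state makes the forward pass reproducible:

```python
import torch

# A tiny net with dropout, kept in training mode so dropout is active.
net = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Dropout(p=0.5))
net.train()
x = torch.ones(2, 4)

# Capture the global CPU RNG state before the forward pass.
rng_state = torch.get_rng_state()
out1 = net(x)

# Rewind the generator: the dropout mask is drawn from the same
# position, so the output is bit-for-bit identical.
torch.set_rng_state(rng_state)
out2 = net(x)

assert torch.equal(out1, out2)
```

This is why losing the generator state between save and load changes the dropout masks and therefore the training trajectory, even when all weights match.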