then in the main function I use:
model.load_state_dict(best_state)
to restore the model.
However, I found that best_state is always the same as the last state reached during training, not the best one. Does anyone know the reason and how to avoid it? (Version: 1.1.0, Linux, GPU)
By the way, I know I can use torch.save(the_model.state_dict(), PATH) and then load the model with
the_model.load_state_dict(torch.load(PATH)).
However, I don’t want to save the parameters to a file, since the train and test functions are in the same file.
P.S. I see that the return best_state is also there. Are you sure you are returning after several epochs, and not just after 1 epoch, or whenever acc > best_acc? (I can't tell, as there is no indentation in your question.)
Yeah! Since you are not saving it to a file, it might be the case that @god_sp33d and @JuanFMontesinos are right that you need to make a deep copy. copy.deepcopy should work, if you have no leaks elsewhere in the code.
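To illustrate the deepcopy suggestion, here is a minimal sketch. It assumes the cause is that model.state_dict() returns references to the live parameter tensors, which are then updated in place as training continues. The nn.Linear model and the manual in-place update are hypothetical stand-ins for the real model and optimizer step, not the code from the question.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(2, 1)  # hypothetical stand-in for the real model

# state_dict() holds references to the live parameter tensors,
# so this "snapshot" keeps changing as the model trains.
aliased_state = model.state_dict()
# deepcopy clones every tensor, freezing the values at this point.
copied_state = copy.deepcopy(model.state_dict())

weight_before = model.weight.detach().clone()

# Simulate further training with an in-place parameter update,
# which is how optimizer.step() modifies the weights.
with torch.no_grad():
    for p in model.parameters():
        p.add_(1.0)

# The plain reference follows the model; the deep copy does not.
aliased_follows_model = torch.equal(aliased_state["weight"], model.weight)
copy_kept_old_values = torch.equal(copied_state["weight"], weight_before)

# Restoring from the deep copy brings back the earlier weights.
model.load_state_dict(copied_state)
restored = torch.equal(model.weight, weight_before)
```

In a training loop this would mean writing best_state = copy.deepcopy(model.state_dict()) at the point where acc > best_acc, instead of best_state = model.state_dict().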