State_dict() changes with the model

lei_bai · June 10, 2019, 12:40pm

Hi all, I want to keep the best model found during training and then load it for testing, so I use the following approach:

def train():
#training steps …
if acc > best_acc:
best_acc = acc
best_state = model.state_dict()
return best_state

then in the main function I use:
model.load_state_dict(best_state)
to resume the model.
However, I found that best_state is always the same as the last state during training, not the best state. Does anyone know the reason and how to avoid it? (Version: 1.1.0, Linux, GPU)

By the way, I know I can use torch.save(the_model.state_dict(), PATH) and then load the model by
the_model.load_state_dict(torch.load(PATH)).
However, I don’t want to save the parameters to a file, since the train and test functions are in the same file.

JuanFMontesinos · June 10, 2019, 1:06pm

I don’t remember exactly right now, but it may map to the same memory address. Try making a deep copy of the state dict when you save it.
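
The suspicion above can be checked directly. The following is a minimal sketch, assuming a toy nn.Linear model and a manual in-place update standing in for an optimizer step (both are illustrative and not taken from the original question). It compares a plain state_dict() reference with a deep copy after the parameters change.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(2, 1)  # toy model, assumed for illustration

alias = model.state_dict()                    # tensors share storage with the live parameters
snapshot = copy.deepcopy(model.state_dict())  # independent copy of every tensor

# Stand-in for an optimizer step: update the parameters in place.
with torch.no_grad():
    for p in model.parameters():
        p.add_(1.0)

# Same underlying memory, so the "saved" dict follows the model.
shares_memory = alias["weight"].data_ptr() == model.weight.data_ptr()
alias_follows_model = torch.equal(alias["weight"], model.weight)
# The deep copy still holds the values from before the update.
snapshot_unchanged = not torch.equal(snapshot["weight"], model.weight)

print(shares_memory, alias_follows_model, snapshot_unchanged)
```

If all three values come out true, the dict returned by state_dict() is only a view of the current parameters, which would explain why it always matches the last training state.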

iArunava · June 10, 2019, 3:14pm

Hey there! Welcome to the community!

if acc > best_acc:
  best_acc = acc # you missed this
  best_state = model.state_dict()
  . . .

Happy Coding

lei_bai · June 11, 2019, 12:09am

Thanks for your answer. I have updated the question. I do have best_acc = acc in my code, and it still doesn’t work.

iArunava · June 11, 2019, 3:28am

Please paste the code properly in ```

P.S. I see that the
return best_state
is also there. Are you sure you are returning after several epochs, and not just after 1 epoch or whenever acc > best_acc? (It can’t be figured out, as there is no indentation in your question.)

god_sp33d · June 11, 2019, 6:32am

Make a deep copy of the state_dict?

import copy
best_model = copy.deepcopy(model.state_dict())

iArunava · June 11, 2019, 7:05am

Yeah! Since you are not saving it to a file, it might be that @god_sp33d and @JuanFMontesinos are right and you need to make a deep copy. copy.deepcopy should work, if you have no leaks elsewhere in the code.

lei_bai · June 13, 2019, 10:57am

Thanks, deepcopy works

lei_bai · June 13, 2019, 10:58am

I should use deepcopy; otherwise the saved state keeps changing with training.
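
For later readers, here is a rough sketch of the pattern the thread converged on. The toy model, the scores list standing in for per-epoch validation accuracy, and the in-place parameter shift standing in for a real optimizer step are all assumptions made for illustration. Only the copy.deepcopy(model.state_dict()) call comes from the replies above.

```python
import copy

import torch
import torch.nn as nn


def train(model, scores):
    """Toy loop; `scores` is a hypothetical stand-in for per-epoch accuracy."""
    best_acc = float("-inf")
    best_state = None
    for acc in scores:
        # Stand-in for a real optimizer step: shift every parameter in place.
        with torch.no_grad():
            for p in model.parameters():
                p.add_(0.1)
        if acc > best_acc:
            best_acc = acc
            # Deep copy so later updates cannot overwrite the saved values.
            best_state = copy.deepcopy(model.state_dict())
    return best_state, best_acc


torch.manual_seed(0)
model = nn.Linear(3, 1)
initial_bias = model.bias.detach().clone()

# The best score occurs at the second of three "epochs".
best_state, best_acc = train(model, scores=[0.5, 0.9, 0.7])
last_state = copy.deepcopy(model.state_dict())

# Restore the best parameters in memory, without writing to a file.
model.load_state_dict(best_state)
```

Without the deep copy, best_state would hold the parameters from the third epoch here. Cloning each tensor, for example {k: v.clone() for k, v in model.state_dict().items()}, should work as well.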