When getting a state dict using `<module>.state_dict()`, the dictionary references the internal parameters of the model. This means that once the model changes, the dict changes as well. Usually this doesn’t matter much, as most people serialize the state to disk straight away.
If, however, you keep copies of the state dict in memory, you won’t be able to restore from them, because their values always match the network’s current state.
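A minimal sketch of the aliasing, assuming a toy `nn.Linear` model (not from the original post). The tensors returned by `state_dict()` are detached views that share storage with the parameters, so an in-place update (like an optimizer step) shows up in the "snapshot" too. Taking a `copy.deepcopy` of the dict is one common way to get an independent copy.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(2, 1)  # toy model, for illustration only

# state_dict() returns tensors that share memory with the live parameters
snapshot = model.state_dict()

# Modify the parameters in place, as optimizer.step() would
with torch.no_grad():
    model.weight.add_(1.0)

# The "snapshot" followed the change, because it aliases the same storage
print(torch.equal(snapshot["weight"], model.weight))  # True

# A deep copy owns its own storage and is unaffected by later updates
frozen = copy.deepcopy(model.state_dict())
with torch.no_grad():
    model.weight.add_(1.0)
print(torch.equal(frozen["weight"], model.weight))  # False
```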
I ran into this while implementing early stopping, and it took me a while to figure out. Loading the saved state dict with `load_state_dict` (obviously) just had no effect.
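Below is a minimal early-stopping sketch that avoids the problem, assuming a hypothetical toy linear regression, synthetic data, and a made-up patience of 3 (none of which come from the original post). The key line is the `copy.deepcopy` of the state dict when a new best validation loss is seen; storing `model.state_dict()` directly would keep tracking the live weights, so the final `load_state_dict` would be a no-op.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical synthetic data; the "validation" split is a slice, for brevity
x = torch.randn(64, 3)
y = x @ torch.tensor([[1.0], [-2.0], [0.5]])
x_val, y_val = x[:16], y[:16]

model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

best_loss = float("inf")
best_state = None
patience, bad_epochs = 3, 0  # hypothetical patience value

for epoch in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

    with torch.no_grad():
        val_loss = loss_fn(model(x_val), y_val).item()

    if val_loss < best_loss:
        best_loss = val_loss
        # Deep copy so the snapshot no longer shares memory with the model
        best_state = copy.deepcopy(model.state_dict())
        bad_epochs = 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break

# Restore the best weights seen during training
model.load_state_dict(best_state)
```

A dict comprehension such as `{k: v.detach().clone() for k, v in model.state_dict().items()}` would work as well; `copy.deepcopy` is used here simply because it is a one-liner.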
In the interest of making the solution more discoverable, I figured I’d describe my troubles here.
Do you think this warrants a mention in the official documentation? If so, should I just create a GitHub issue?