I have a NeuralNetwork model in which I registered two variables on purpose. When I save the state_dict and load it back for evaluation, I get this error:
File "../mlchem/potentials.py", line 191, in calculate
strict=True)
File "/home/muammar/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for NeuralNetwork:
Unexpected key(s) in state_dict: "intercept_Cu", "slope_Cu".
How can I load a model like this? I need those two variables to be recognized by .load_state_dict(). I would appreciate any suggestions.
PS. I checked other posts, but I am not getting the same error they reported.
I assume the problem at inference time is that when I instantiate the neural network class, those unexpected keys are not registered. If I remove the conditional statement here, I don't get the error, but the inferences are wrong and the custom parameters are not the ones I optimized.
@ptrblck I tried with None, but that did not work. However, if I initialize those custom variables to zero, the model loads with strict=True. This is the commit 233aaa6 that fixed it.
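A minimal sketch of that fix: register the custom parameters unconditionally in __init__ (with placeholder zeros) so their keys always exist in the state_dict, then let load_state_dict overwrite them with the optimized values. The architecture and sizes here are invented for illustration; only the parameter names come from the error message.

```python
import torch
import torch.nn as nn


class NeuralNetwork(nn.Module):
    """Toy model with the custom per-element parameters registered
    up front, so strict=True loading finds matching keys."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 1)
        # Placeholder values are fine: load_state_dict overwrites them.
        self.intercept_Cu = nn.Parameter(torch.zeros(1))
        self.slope_Cu = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return self.slope_Cu * self.linear(x) + self.intercept_Cu


# Simulate a save/load round trip with "optimized" values.
trained = NeuralNetwork()
with torch.no_grad():
    trained.intercept_Cu.fill_(0.5)
    trained.slope_Cu.fill_(2.0)
state = trained.state_dict()

fresh = NeuralNetwork()                  # keys exist from __init__ ...
fresh.load_state_dict(state, strict=True)  # ... so strict=True succeeds
print(float(fresh.slope_Cu))             # 2.0
```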
Now, there is something weird… When I use the loaded state dictionary to predict the points in the training set as a sanity check, the predictions don't match. I will investigate more.
After loading the model from the state_dict this is what I get:
mlchem predicted energy = -14.418486595153809
mlchem predicted energy = -14.418549537658691
mlchem predicted energy = -14.418668746948242
mlchem predicted energy = -14.418920516967773
mlchem predicted energy = -14.419416427612305
mlchem predicted energy = -14.419506072998047
mlchem predicted energy = -14.419241905212402
mlchem predicted energy = -14.418638229370117
mlchem predicted energy = -14.41697883605957
mlchem predicted energy = -14.413832664489746
The parameters of a model should be consistent, no matter which mode it is in (train or eval). So your patch commit is the right way to go.
Also, the parameters you used for training and for inference have different values, so the output will differ. I do not know what you are actually doing here, but it seems reasonable to me that the training and inference outputs do not match.
And it is weird to loop over the parameters in forward; you can refactor your code like this:
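One way to avoid looping over parameters in forward is to keep the per-element (slope, intercept) pairs in an nn.ParameterDict and look them up directly. This is a hedged sketch, not mlchem's actual code: the architecture, element symbols, and sizes are all assumptions.

```python
import torch
import torch.nn as nn


class NeuralNetwork(nn.Module):
    """Hypothetical refactor: per-element scaling parameters live in
    ParameterDicts keyed by chemical symbol, so forward does a direct
    lookup instead of looping over registered parameters."""

    def __init__(self, symbols=("Cu",), n_features=8):
        super().__init__()
        self.hidden = nn.Linear(n_features, 10)
        self.output = nn.Linear(10, 1)
        # One (slope, intercept) pair per chemical element.
        self.slope = nn.ParameterDict(
            {s: nn.Parameter(torch.ones(1)) for s in symbols})
        self.intercept = nn.ParameterDict(
            {s: nn.Parameter(torch.zeros(1)) for s in symbols})

    def forward(self, x, symbol="Cu"):
        h = torch.relu(self.hidden(x))
        raw = self.output(h)
        # Direct dictionary lookup; no loop over parameters needed.
        return self.slope[symbol] * raw + self.intercept[symbol]


model = NeuralNetwork()
energy = model(torch.randn(3, 8), symbol="Cu")
print(energy.shape)  # torch.Size([3, 1])
```

Note that with a ParameterDict the state_dict keys become e.g. "slope.Cu" rather than "slope_Cu", so existing checkpoints would need their keys renamed.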
The parameters of a model should be consistent, no matter which mode it is in (train or eval). So your patch commit is the right way to go.
Yes, that is correct. I thought only the hidden layers were needed, but actually one needs to recreate the whole model.
Also, the parameters you used for training and for inference have different values, so the output will differ. I do not know what you are actually doing here, but it seems reasonable to me that the training and inference outputs do not match.
Not at all in this case. Let me elaborate. The outputs shown above come from predicting over the training set with the state_dict of the epoch that fulfilled the training criterion. That is the state_dict I saved, so the predictions have to match. In the end, I found that the error was that I did not scale the features with the same scaler used for training.
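To avoid this pitfall, the scaler statistics can be persisted alongside the model weights and restored at inference, so new inputs get exactly the training-time scaling. The Standardizer class and checkpoint layout below are illustrative, not mlchem's actual API.

```python
import io
import torch


class Standardizer:
    """Hypothetical feature scaler: the key point is persisting its
    statistics with the checkpoint and reusing them at inference."""

    def fit(self, x):
        self.mean = x.mean(dim=0)
        self.std = x.std(dim=0)
        return self

    def transform(self, x):
        return (x - self.mean) / self.std


train_features = torch.randn(100, 8) * 5.0 + 3.0
scaler = Standardizer().fit(train_features)

# Save the scaler statistics next to the model weights...
checkpoint = {
    # "model_state_dict": model.state_dict(),  # from the trained model
    "scaler_mean": scaler.mean,
    "scaler_std": scaler.std,
}
buffer = io.BytesIO()  # in-memory stand-in for a checkpoint file path
torch.save(checkpoint, buffer)

# ...and restore them at inference time so inputs are scaled identically.
buffer.seek(0)
ckpt = torch.load(buffer)
inference_scaler = Standardizer()
inference_scaler.mean = ckpt["scaler_mean"]
inference_scaler.std = ckpt["scaler_std"]

new_features = torch.randn(10, 8) * 5.0 + 3.0
scaled = inference_scaler.transform(new_features)
```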
And it is weird to loop over the parameters in forward; you can refactor your code like this: