Strange: calling model.eval() and then forward() adds new parameters to the state_dict

Hi all, I am currently facing the following problem while running main.py from the AWD LSTM repo at https://github.com/salesforce/awd-lstm-lm/blob/master/main.py. The authors use the less-preferred approach of saving and loading the entire trained model, so I modified their code to save and load the model's state_dict instead. However, doing so raised an error.

Specifically, when the model is initialized, its state_dict does not contain rnns.0.linear.module.weight. But after calling model.eval() followed by output, hidden = model(data, hidden), rnns.0.linear.module.weight suddenly appears in the state_dict! I do not know why this happens. I have traced the problem in detail, printing the state_dict at several critical points in the script, and I am quite confident that this is where it occurs.

The real problem is that this is likely to break model deployment, where one would normally follow a two-step approach. First, define/initialize the model; this gives a model without rnns.0.linear.module.weight in its state_dict. Second, load the state_dict of a previously saved model into the new model. But the saved model's state_dict does contain rnns.0.linear.module.weight, so the load fails with an unexpected key error. Could you please advise me on how to resolve this issue?
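A minimal sketch that reproduces the symptom outside the repo. It assumes (not confirmed against the repo's code) that the weight-drop wrapper reassigns the wrapped layer's weight attribute inside forward(); ToyWeightDrop and ToyModel are hypothetical stand-ins, not the repo's classes. The mechanism it relies on is standard PyTorch behavior: assigning an nn.Parameter to a module attribute registers it as a parameter, so it shows up in state_dict().

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyWeightDrop(nn.Module):
    """Hypothetical weight-drop wrapper, NOT the repo's WeightDrop class."""

    def __init__(self, module: nn.Linear, p: float = 0.5):
        super().__init__()
        self.module = module
        self.p = p
        w = module.weight
        # Drop the original "weight" parameter and keep a raw copy under a new name.
        del self.module._parameters["weight"]
        self.module.register_parameter("weight_raw", nn.Parameter(w.data))

    def forward(self, x):
        raw = self.module.weight_raw
        if self.training:
            # Forget any Parameter registered by an earlier eval-mode call,
            # then attach a plain (dropped-out) tensor as an ordinary attribute.
            self.module._parameters.pop("weight", None)
            w = F.dropout(raw, p=self.p, training=True)
        else:
            # Reuse the Parameter object itself: nn.Module.__setattr__ sees an
            # nn.Parameter and registers it under the new name "weight".
            w = raw
        setattr(self.module, "weight", w)
        return self.module(x)


class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = ToyWeightDrop(nn.Linear(4, 3))

    def forward(self, x):
        return self.linear(x)


model = ToyModel()
keys_before = set(model.state_dict())

model.eval()
with torch.no_grad():
    model(torch.randn(2, 4))
keys_after = set(model.state_dict())
print(sorted(keys_after - keys_before))  # ['linear.module.weight']

# Deployment step: a freshly constructed model lacks that key, so a strict load fails.
fresh = ToyModel()
try:
    fresh.load_state_dict(model.state_dict())
    load_error = None
except RuntimeError as err:
    load_error = err
print(load_error)  # the message lists "linear.module.weight" as an unexpected key
```

The key name mirrors rnns.0.linear.module.weight only by analogy; the real model's wrapper may differ in detail.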

I’m not completely sure, but from skimming the repository, it seems that some attributes are manipulated in this forward call, which might add these additional parameters.
If that’s the case, I think the valid approach would be to use a “fake forward pass” to initialize these parameters and load the state_dict afterwards.
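A minimal sketch of the “fake forward pass” idea, reusing the same hypothetical ToyWeightDrop/ToyModel stand-ins (not the repo's classes). The helper build_model_for_loading is also hypothetical: it builds a fresh model, runs one eval-mode forward pass on a tiny dummy input so the lazily registered parameter exists, and only then calls load_state_dict() with the default strict=True.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyWeightDrop(nn.Module):
    """Hypothetical weight-drop wrapper, NOT the repo's WeightDrop class."""

    def __init__(self, module: nn.Linear, p: float = 0.5):
        super().__init__()
        self.module = module
        self.p = p
        w = module.weight
        del self.module._parameters["weight"]
        self.module.register_parameter("weight_raw", nn.Parameter(w.data))

    def forward(self, x):
        raw = self.module.weight_raw
        if self.training:
            self.module._parameters.pop("weight", None)
            w = F.dropout(raw, p=self.p, training=True)
        else:
            w = raw  # an nn.Parameter, so setattr registers it as "weight"
        setattr(self.module, "weight", w)
        return self.module(x)


class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = ToyWeightDrop(nn.Linear(4, 3))

    def forward(self, x):
        return self.linear(x)


def build_model_for_loading(example_input):
    """Create a fresh model and run one dummy eval-mode forward pass so that
    parameters created inside forward() exist before load_state_dict()."""
    model = ToyModel()
    model.eval()
    with torch.no_grad():
        model(example_input)  # output discarded; only the side effect matters
    return model


# Stand-in for the saved checkpoint: a model that already ran an eval forward pass.
torch.manual_seed(0)
trained = ToyModel()
trained.eval()
with torch.no_grad():
    trained(torch.randn(2, 4))
checkpoint = trained.state_dict()

# Deployment: build, dummy pass with a tiny input, then a strict load succeeds.
fresh = build_model_for_loading(torch.zeros(1, 4))
load_result = fresh.load_state_dict(checkpoint)
print(load_result)  # no missing or unexpected keys

x = torch.randn(3, 4)
with torch.no_grad():
    same_output = torch.allclose(trained(x), fresh(x))
print(same_output)  # True
```

For the real AWD LSTM model, the dummy input would presumably be a short (data, hidden) pair with the shapes main.py uses; that detail is an assumption here.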

Also, I agree that your workflow of creating the model beforehand and loading the state_dict afterwards is the preferred way.

Thanks, ptrblck. I am glad that the dummy forward pass I had in mind agrees with your answer, but the problem is that during deployment it might double the inference time.
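If the model is loaded once and then reused for many requests (an assumption about the deployment setup), the dummy pass is a one-time setup cost rather than an extra pass per request. A minimal sketch with the same hypothetical ToyWeightDrop/ToyModel stand-ins and a hypothetical Server class; the forward_calls counter exists only to make the cost visible.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyWeightDrop(nn.Module):
    """Hypothetical weight-drop wrapper, NOT the repo's WeightDrop class."""

    def __init__(self, module: nn.Linear, p: float = 0.5):
        super().__init__()
        self.module = module
        self.p = p
        w = module.weight
        del self.module._parameters["weight"]
        self.module.register_parameter("weight_raw", nn.Parameter(w.data))

    def forward(self, x):
        raw = self.module.weight_raw
        if self.training:
            self.module._parameters.pop("weight", None)
            w = F.dropout(raw, p=self.p, training=True)
        else:
            w = raw  # an nn.Parameter, so setattr registers it as "weight"
        setattr(self.module, "weight", w)
        return self.module(x)


class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = ToyWeightDrop(nn.Linear(4, 3))
        self.forward_calls = 0  # plain int attribute, not part of the state_dict

    def forward(self, x):
        self.forward_calls += 1
        return self.linear(x)


class Server:
    """Hypothetical serving wrapper: pays for the dummy pass once, at startup."""

    def __init__(self, checkpoint, example_input):
        self.model = ToyModel()
        self.model.eval()
        with torch.no_grad():
            self.model(example_input)  # one-time dummy pass
        self.model.load_state_dict(checkpoint)  # strict load now succeeds

    @torch.no_grad()
    def predict(self, x):
        return self.model(x)  # exactly one forward pass per request


trained = ToyModel()
trained.eval()
with torch.no_grad():
    trained(torch.randn(2, 4))
checkpoint = trained.state_dict()

server = Server(checkpoint, example_input=torch.zeros(1, 4))
n_requests = 100
for _ in range(n_requests):
    server.predict(torch.randn(8, 4))
print(server.model.forward_calls)  # 101: one dummy pass + one per request
```

Under this assumption the overhead stays at one extra pass regardless of how many requests are served, and the dummy input can be as small as the model accepts.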