Why do the training code and demo code produce different model outputs?

I'm asking because, no matter how much I think about this, I can't figure it out.

We trained with the training code and saved the weights.
When I loaded the saved weights in the demo code and checked the output, the result was completely different.
I don't know why.

As an additional experiment, I loaded the saved weights back in the training code and checked the model output: it looks normal. But when I load the same weights in the demo code, the results are very strange.

After further checking, I confirmed that the inputs and the weights are identical in both scripts. Is there anything else that could cause this?

I use DDP for training and DP for running the demo; could this be the problem?
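One common pitfall with this setup (not necessarily your issue) is that both DDP and DataParallel wrap the model in a submodule called `module`, so every `state_dict` key gains a `module.` prefix; a checkpoint saved from the wrapped model then fails to load (or loads silently wrong with `strict=False`) into an unwrapped model. A minimal sketch, using a hypothetical `nn.Linear` as a stand-in for the real model:

```python
import torch
import torch.nn as nn

# Toy stand-in model (hypothetical; the real architecture isn't shown in the thread).
model = nn.Linear(4, 2)

# Both DDP and DataParallel wrap the model in a submodule called "module",
# so the wrapped state_dict keys gain a "module." prefix.
wrapped = nn.DataParallel(model)
print(list(wrapped.state_dict().keys()))  # ['module.weight', 'module.bias']

# Safer: save the underlying module so the checkpoint is wrapper-agnostic.
torch.save(wrapped.module.state_dict(), "checkpoint.pth")

# If a checkpoint *was* saved from the wrapped model, strip the prefix on load:
state = torch.load("checkpoint.pth")
state = {k.removeprefix("module."): v for k, v in state.items()}
model.load_state_dict(state)  # strict=True by default, so key mismatches raise
```

Keeping `strict=True` (the default) is deliberate here: it turns a silent weight mismatch into an immediate error.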

I would recommend using a static input (e.g. torch.ones) and comparing the outputs of both approaches.
If these outputs are already different, the model's parameters and/or buffers might not have been loaded correctly, or the model has some randomness in its operations (e.g. dropout is active because you've forgotten to call model.eval()).
On the other hand, if the outputs of both models for a static input are equal, the issue would most likely be in the data loading and processing.
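A minimal sketch of that comparison, using a hypothetical toy model in place of the real one (the second model stands in for the demo script loading the saved weights):

```python
import torch
import torch.nn as nn

# Hypothetical architecture; replace with the model used in both scripts.
model_train_script = nn.Sequential(nn.Linear(8, 8), nn.Dropout(0.5), nn.Linear(8, 2))
model_demo_script = nn.Sequential(nn.Linear(8, 8), nn.Dropout(0.5), nn.Linear(8, 2))

# Simulate "demo loads the weights saved by training".
model_demo_script.load_state_dict(model_train_script.state_dict())

# Disable randomness (dropout) and use running stats for batchnorm.
model_train_script.eval()
model_demo_script.eval()

# A static input removes the data pipeline from the comparison entirely.
x = torch.ones(1, 8)
with torch.no_grad():
    out_a = model_train_script(x)
    out_b = model_demo_script(x)

print(torch.allclose(out_a, out_b))  # True if weights match and no randomness remains
```

If this prints True but the real pipelines still disagree, the difference is coming from data loading or preprocessing, not the model.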

Thank you!
Thanks, I solved the problem.
The problem was the difference between model.train() and model.eval().
When I run the demo with model.train(), I confirmed that I get the same results as in training.
However, I know that a demo should use model.eval(). If so, am I saving the weights incorrectly, or loading them incorrectly?
If neither, is the model itself the problem?

Additionally, is the only difference between model.eval() and model.train() the behavior of dropout and batchnorm? Are there any other differences?

I’m not sure which part to fix.

I'm not sure I fully understand your post, but it seems you've narrowed the output difference down to model.train() mode?
If so, call model.eval() in both scripts and compare the outputs again.
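The train/eval discrepancy you describe can be reproduced with dropout alone; a small sketch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 64)

# train mode (the default): dropout zeroes a random half of the elements and
# rescales the rest, so two forward passes over the same input disagree.
drop.train()
a, b = drop(x), drop(x)
print(torch.equal(a, b))  # False: the random masks differ between the two calls

# eval mode: dropout becomes the identity, so the output is deterministic.
drop.eval()
c, d = drop(x), drop(x)
print(torch.equal(c, d), torch.equal(c, x))  # True True
```

This is why the same weights can produce "strange" outputs in one script and normal outputs in another when the modes differ.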

Dropout and batchnorm layers both use the internal self.training flag, but in general any layer can use it, so it's best practice to always call model.eval() during the validation run, even if these two layers are not used.
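To illustrate that flag, here is a hypothetical custom layer (not from the thread) that branches on self.training, just like dropout does internally:

```python
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    """Illustrative layer: any module, not just Dropout/BatchNorm,
    can change its behavior based on self.training."""
    def __init__(self, in_features, out_features, noise_std=0.1):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.noise_std = noise_std

    def forward(self, x):
        out = self.linear(x)
        if self.training:  # flipped by model.train() / model.eval()
            out = out + torch.randn_like(out) * self.noise_std
        return out

model = NoisyLinear(4, 2)
model.eval()  # recursively sets self.training = False on all submodules
print(model.training, model.linear.training)  # False False
```

Because train()/eval() propagate recursively, calling model.eval() on the top-level model is enough to switch every submodule, including custom ones like this.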