Model produces different results, in-memory versus loaded from hard drive


I am training a classification model. When I test accuracy with the model that is still in memory, I get a slightly different result (about 1% absolute) than when I first save the model's state dict to disk and load it back into a model with exactly the same structure. Has anyone else observed this, and what might be the reason?

Thank you very much.


To clarify, I set model.eval() in both situations.

Are you sure there is no shuffling or any other kind of randomization in your test procedure, data, or code?
If there is none, are you running both models on the same hardware (CPU vs. GPU)?
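
One way to narrow this down is to take the data pipeline out of the picture and compare the two models on a single fixed batch. Below is a minimal sketch of that check, not your actual setup: `build_model` is a hypothetical stand-in for your architecture, the layer sizes and the batch are made up, and the state dict goes through an in-memory buffer instead of a file so the snippet is self-contained. Everything runs on the CPU.

```python
import io

import torch
import torch.nn as nn


def build_model():
    # Hypothetical small classifier standing in for the real architecture.
    # It includes BatchNorm and Dropout because both behave differently
    # in train() and eval() mode.
    return nn.Sequential(
        nn.Linear(16, 32),
        nn.BatchNorm1d(32),
        nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(32, 4),
    )


torch.manual_seed(0)
model = build_model()

# Run a few forward passes in training mode so the BatchNorm running
# statistics move away from their initial values.
model.train()
for _ in range(3):
    model(torch.randn(8, 16))

# Save the state dict and load it into a freshly constructed model.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)
reloaded = build_model()
reloaded.load_state_dict(torch.load(buffer))

# Put both models in eval mode and feed them the same fixed batch.
model.eval()
reloaded.eval()
batch = torch.randn(8, 16)
with torch.no_grad():
    out_memory = model(batch)
    out_reloaded = reloaded(batch)

max_abs_diff = (out_memory - out_reloaded).abs().max().item()
same_predictions = torch.equal(out_memory.argmax(dim=1), out_reloaded.argmax(dim=1))
print(f"max abs difference: {max_abs_diff}, same predictions: {same_predictions}")
```

If the outputs match on a fixed batch like this but the accuracy still differs over the full test set, the difference is more likely coming from the evaluation loop (shuffling, random augmentation, a different device) than from saving and loading.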