I have a model class `myModel`, which I pre-train on device `cuda:1` and then save to file `modelFile`. I don't understand the behaviour when trying to load this model onto another device, say `cuda:0`. Can someone help me understand what is going on behind the scenes with the following:
```python
model = myModel()
model.load_state_dict(torch.load(modelFile))
model = model.eval().to("cuda:0")
```
and why it gives different results from:
```python
model = myModel().to("cuda:0")
model.load_state_dict(torch.load(modelFile))
model = model.eval()
```
What does it mean for a saved model to be associated with a device, and why does moving the model to a device different from the one the state_dict was saved on cause problems? Should we always load the state_dict into the model first and only then push the model to the new device?
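For reference, the pattern I have seen recommended for moving a checkpoint across devices is to pass `map_location` to `torch.load`, so the saved tensors are remapped to the target device before `load_state_dict` runs. A minimal sketch (the toy module and the file name `model.pt` are placeholders standing in for `myModel` and `modelFile`; it falls back to CPU so it runs without a GPU):

```python
import torch
import torch.nn as nn

# Toy stand-in for myModel (assumption: any small module shows the pattern).
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

# Save a checkpoint (placeholder file name).
torch.save(MyModel().state_dict(), "model.pt")

# map_location remaps every saved tensor onto the target device at load
# time, so the state_dict and the model parameters agree on device
# before load_state_dict copies the weights in.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
state = torch.load("model.pt", map_location=device)

model = MyModel().to(device)
model.load_state_dict(state)
model.eval()

# All parameters now live on the target device.
assert all(p.device.type == torch.device(device).type
           for p in model.parameters())
```

With this, the order of `.to(device)` and `load_state_dict` should no longer matter, since both sides are already on the same device.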
EDIT: I should also say that I find it slightly disturbing that such a small, inconspicuous difference in the code has such large implications for the model output. Would it be possible to flag a warning when the model and the loaded state_dict are on different devices?