Saving a model and reloading the saved model

augusmaa · July 12, 2023, 3:00pm

Hello everyone,

I have a question about loading a saved model.
I trained a model on a server with 8 GPUs using nn.DataParallel and got 78.07% accuracy on the validation set. However, when I reload the saved model on another server with 2 GPUs (still using nn.DataParallel) and run the validation script on the same dataset, the accuracy drops to 76.86%. I am wondering why.

Note: When I go back to the server where the model was trained, everything works as expected. Running the validation script there gives exactly the same performance as before.

I would like to know whether I did something wrong, either in saving the model or in reloading the saved model.
Any advice would be welcome.

To save the model I used:

torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}, './save_model_folder/model.tar')
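
For comparison, here is a minimal sketch of a variant of this save step, assuming the model is wrapped in nn.DataParallel. Saving the state_dict of `model.module` rather than of the wrapper stores the parameter names without the `module.` prefix, so the checkpoint no longer depends on the wrapper. The toy model, the optimizer settings, and the temporary path are placeholders for illustration, not my actual setup, and I am not claiming this explains the accuracy difference.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Placeholder model and optimizer (hypothetical stand-ins for the real ones).
base_model = nn.Linear(4, 2)
model = nn.DataParallel(base_model)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
epoch = 0

# Unwrap DataParallel so the saved keys carry no "module." prefix.
to_save = model.module if isinstance(model, nn.DataParallel) else model

# Temporary location used only for this sketch.
save_path = os.path.join(tempfile.mkdtemp(), "model.tar")
torch.save({
    "epoch": epoch,
    "model_state_dict": to_save.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}, save_path)

# Inspect the parameter names that ended up in the checkpoint.
reloaded = torch.load(save_path, map_location="cpu")
saved_keys = sorted(reloaded["model_state_dict"].keys())
print(saved_keys)
```

A checkpoint saved this way has to be loaded into the unwrapped model (or into `model.module`), since the wrapper itself expects the prefixed names.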

To load the saved model I used:

checkpoint = torch.load('./save_model_folder/model.tar')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()
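
One check that might help rule out the checkpoint itself: a sketch that loads with `map_location="cpu"` (so the load does not depend on the GPU ids of the training server) and computes a hash over all loaded weights. If the hash printed on both servers is identical, the weights were restored identically and the difference would have to come from somewhere else in the validation pipeline. The helper `state_dict_fingerprint`, the toy model, and the temporary checkpoint are hypothetical names made up for this example. The helper also assumes every tensor has a dtype that NumPy supports.

```python
import hashlib
import os
import tempfile

import torch
import torch.nn as nn


def state_dict_fingerprint(state_dict):
    """Return a SHA-256 hex digest over all tensor names and values."""
    digest = hashlib.sha256()
    for name in sorted(state_dict):
        tensor = state_dict[name].detach().cpu().contiguous()
        digest.update(name.encode("utf-8"))
        digest.update(tensor.numpy().tobytes())
    return digest.hexdigest()


def make_model():
    # Placeholder architecture (hypothetical), includes BatchNorm buffers.
    return nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.Linear(8, 2))


# Write a stand-in checkpoint so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), "model.tar")
torch.save({"epoch": 3, "model_state_dict": make_model().state_dict()}, path)

# Load on the CPU first, then restore the weights and switch to eval mode.
checkpoint = torch.load(path, map_location="cpu")
model = make_model()
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

fingerprint_saved = state_dict_fingerprint(checkpoint["model_state_dict"])
fingerprint_loaded = state_dict_fingerprint(model.state_dict())
print(fingerprint_loaded)
print(fingerprint_saved == fingerprint_loaded)
```

Running the same fingerprint function on both servers, right after `load_state_dict`, would give two strings that can be compared directly.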

Thank you in advance.