Hello everyone,
I have a question about loading a saved model.
I trained a model on a server with 8 GPUs using nn.DataParallel and got 78.07% accuracy on the validation set. However, when I reload the saved model on another server with 2 GPUs (still using nn.DataParallel) and run the validation script (with the same dataset, of course), the accuracy drops to 76.86%. I am wondering why.
Note: When I go back to the server where the model was trained, everything works normally: running the validation script there gives exactly the same accuracy as before.
I would like to know whether I did something wrong, either in saving the model or in reloading it.
Any advice would be welcome.
To save the model I used:
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}, './save_model_folder/model.tar')
To load the saved model I used:
checkpoint = torch.load('./save_model_folder/model.tar')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()
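One check that might help rule out the save/load step itself is to fingerprint the weights after loading on each server and compare the two digests. The following is only a minimal sketch under some assumptions: the weights are in a dtype NumPy supports (e.g. float32), `state_dict_fingerprint` is a hypothetical helper (not a PyTorch API), and the small `nn.Sequential` model is a stand-in for the real architecture.

```python
import hashlib

import torch
import torch.nn as nn


def state_dict_fingerprint(state_dict):
    """Hypothetical helper: SHA-256 digest over all names and tensors in a state dict."""
    digest = hashlib.sha256()
    for name in sorted(state_dict):
        # Move to CPU so the digest does not depend on which GPU holds the tensor.
        tensor = state_dict[name].detach().cpu().contiguous()
        digest.update(name.encode("utf-8"))
        digest.update(tensor.numpy().tobytes())
    return digest.hexdigest()


# Stand-in model (assumption); replace with the real architecture.
model = nn.Sequential(nn.Linear(4, 3), nn.BatchNorm1d(3))
before = state_dict_fingerprint(model.state_dict())

# Mimic the save/load round trip with an in-memory copy of the state dict.
checkpoint = {"model_state_dict": {k: v.clone() for k, v in model.state_dict().items()}}
restored = nn.Sequential(nn.Linear(4, 3), nn.BatchNorm1d(3))
restored.load_state_dict(checkpoint["model_state_dict"])
restored.eval()
after = state_dict_fingerprint(restored.state_dict())

# True when every parameter and buffer survived the round trip unchanged.
fingerprints_match = before == after
print(fingerprints_match)
```

If the digest printed on the 2-GPU server equals the one from the 8-GPU server, the weights were most likely saved and loaded correctly, and the gap would have to come from somewhere else in the validation run.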
Thank you in advance.