I trained a model on a GPU and loaded its state_dict, but I find that it runs slower than the same model initialized with random weights. I could not find out the reason.
This is interesting. What exactly is the difference in times? Where are you loading the model weights?
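If it helps to get comparable numbers, here is a minimal timing sketch. It is only an illustration under some assumptions: `time_calls` is a hypothetical helper (not part of any library), and the names `model_loaded`, `model_random`, and `x` in the usage comment are placeholders for your own objects. The warm-up calls and the optional `sync` hook are there because GPU work is queued asynchronously, so timing without synchronizing can be misleading.

```python
import time
from statistics import median


def time_calls(fn, warmup=3, iters=10, sync=None):
    """Return the median wall-clock time (seconds) of one call to fn().

    sync is an optional zero-argument callable that blocks until queued
    device work has finished (for example torch.cuda.synchronize on a GPU).
    """
    # Warm-up calls are not timed; they absorb one-time setup costs.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the work launched by fn() before stopping the clock
        samples.append(time.perf_counter() - start)
    return median(samples)


# Hypothetical usage with PyTorch (placeholder names, assumed to exist in your script):
# t_loaded = time_calls(lambda: model_loaded(x), sync=torch.cuda.synchronize)
# t_random = time_calls(lambda: model_random(x), sync=torch.cuda.synchronize)
# print(t_loaded, t_random)
```

Running both models on the same input and the same device with something like this would show whether the gap is consistent or just first-call overhead.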