I have two models with the exact same architecture but different parameters. The parameters are stored as float32. When I time inference on each of these models I get roughly a 10x difference in speed. Could the actual values of the parameters affect inference time on a CPU?
Note: speeds are identical on GPU
The only case I know of is pathologically small float values (denormals/subnormals), which take a much slower path on many CPUs.
You can try calling torch.set_flush_denormal(True) at the beginning of your script to see if it fixes it (doc here: torch.set_flush_denormal — PyTorch 1.7.0 documentation).
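A minimal sketch of what that looks like, following the example in the linked docs (the call returns True only on CPUs that support flushing, e.g. x86 with SSE3):

```python
import torch

# Ask the CPU to flush denormal (subnormal) floats to zero.
# Returns True when the hardware supports it.
supported = torch.set_flush_denormal(True)

if supported:
    # Example from the linked docs: this subnormal double is flushed to 0
    # instead of being kept as a slow denormal value.
    print(torch.tensor([1e-323], dtype=torch.float64))
```

With flushing enabled, any denormal produced during inference is rounded to zero rather than handled by the CPU's slow microcoded path, which is why it can remove this kind of 10x gap.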
Thanks for the response. Unfortunately that did not change anything. I am quite perplexed by this.
Do you have a small code sample that reproduces this that we could run on colab by any chance?
What does it mean to have a lot more cache misses?
I meant misses in the CPU's L2/L3 caches, which sit between the cores and system RAM.
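On Linux you can compare cache behavior between the two models with `perf stat`; a rough sketch, where `model_a.py` and `model_b.py` are placeholder names for your two inference benchmark scripts:

```shell
# Count cache references and misses while each benchmark runs;
# a large gap in miss counts would point at a memory/caching effect
# rather than the parameter values themselves.
perf stat -e cache-references,cache-misses,LLC-load-misses \
    python model_a.py

perf stat -e cache-references,cache-misses,LLC-load-misses \
    python model_b.py
```

If the miss counts are similar but one model is still 10x slower, the denormal explanation becomes more likely; `perf stat` also reports cycles and instructions, which helps separate the two hypotheses.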
I could not replicate it on Google Colab: both models ran at the same speed there. I think it was some kind of caching issue on my machine, but I have no real idea where to start actually debugging it.
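Before writing it off as a caching issue, it may be worth checking whether the slow model's weights actually contain subnormal values. A hedged diagnostic sketch (the `count_denormals` helper and the scaled toy model are mine, for illustration only):

```python
import torch
import torch.nn as nn

def count_denormals(model: nn.Module) -> int:
    """Count float32 parameters that are subnormal: nonzero but smaller
    in magnitude than the smallest normal float32 (~1.18e-38)."""
    tiny = torch.finfo(torch.float32).tiny
    total = 0
    for p in model.parameters():
        x = p.detach()
        total += ((x != 0) & (x.abs() < tiny)).sum().item()
    return total

# Illustration: scaling a layer's weights into the subnormal range
# reproduces the condition that triggers the slow CPU path.
model = nn.Linear(8, 8)
with torch.no_grad():
    model.weight.mul_(1e-42)

print(count_denormals(model))
```

Running this helper over each real model's parameters would quickly confirm or rule out the denormal hypothesis: a freshly initialized model should report zero, while the slow model reporting many subnormal entries would explain the 10x gap on CPU (and why the GPU, which flushes denormals by default, shows no difference).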