I have two models with the exact same architecture but different parameters. The parameters are stored as float32. When I time inference on each of these models I get roughly a 10x difference in speed. Could the actual values of the parameters affect inference time on a CPU?
Note: speeds are identical on GPU
The only case I know of is pathologically small float values (denormals/subnormals), which take a much slower path on many CPUs.
You can try calling torch.set_flush_denormal(True) at the beginning of your script to see if it fixes it (doc here: torch.set_flush_denormal — PyTorch 1.7.0 documentation).
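A minimal sketch of what that looks like, following the example in the linked docs (the call returns True only on CPUs that support flushing, e.g. x86 with SSE3):

```python
import torch

# Ask the CPU to flush denormal (subnormal) floats to zero.
# Returns True when the hardware supports it.
supported = torch.set_flush_denormal(True)

if supported:
    # Example from the linked docs: this subnormal double is flushed to 0
    # instead of being kept as a slow denormal value.
    print(torch.tensor([1e-323], dtype=torch.float64))
```

With flushing enabled, any denormal produced during inference is rounded to zero rather than handled by the CPU's slow microcoded path, which is why it can remove this kind of 10x gap.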
Thanks for the response. Unfortunately that did not change anything. I am quite perplexed by this.
Do you have a small code sample that reproduces this that we could run on colab by any chance?
What does it mean to have a lot more cache misses?
I meant misses in the CPU's L2/L3 caches, which sit between the cores and system RAM.
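On Linux you can compare cache behavior between the two models with `perf stat`; a rough sketch, where `model_a.py` and `model_b.py` are placeholder names for your two inference benchmark scripts:

```shell
# Count cache references and misses while each benchmark runs;
# a large gap in miss counts would point at a memory/caching effect
# rather than the parameter values themselves.
perf stat -e cache-references,cache-misses,LLC-load-misses \
    python model_a.py

perf stat -e cache-references,cache-misses,LLC-load-misses \
    python model_b.py
```

If the miss counts are similar but one model is still 10x slower, the denormal explanation becomes more likely; `perf stat` also reports cycles and instructions, which helps separate the two hypotheses.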
I could not replicate it on Google Colab: both models ran at the same speed there. I think it was some kind of caching issue on my machine, but I have no real idea where to start actually debugging it.
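Before writing it off as a caching issue, it may be worth checking whether the slow model's weights actually contain subnormal values. A hedged diagnostic sketch (the `count_denormals` helper and the scaled toy model are mine, for illustration only):

```python
import torch
import torch.nn as nn

def count_denormals(model: nn.Module) -> int:
    """Count float32 parameters that are subnormal: nonzero but smaller
    in magnitude than the smallest normal float32 (~1.18e-38)."""
    tiny = torch.finfo(torch.float32).tiny
    total = 0
    for p in model.parameters():
        x = p.detach()
        total += ((x != 0) & (x.abs() < tiny)).sum().item()
    return total

# Illustration: scaling a layer's weights into the subnormal range
# reproduces the condition that triggers the slow CPU path.
model = nn.Linear(8, 8)
with torch.no_grad():
    model.weight.mul_(1e-42)

print(count_denormals(model))
```

Running this helper over each real model's parameters would quickly confirm or rule out the denormal hypothesis: a freshly initialized model should report zero, while the slow model reporting many subnormal entries would explain the 10x gap on CPU (and why the GPU, which flushes denormals by default, shows no difference).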