Speed of CPU inference changes with parameter values

I have two models with the exact same architecture but different parameters. The parameters are stored as float32. When I time inference on each of these models, I get roughly a 10x difference in speed. Could the actual values of the parameters affect inference time on a CPU?

Note: speeds are identical on GPU


The only case I know of is pathologically small floating-point values (denormals).
You can try calling torch.set_flush_denormal(True) at the beginning of your script to see if it fixes it (doc here: torch.set_flush_denormal — PyTorch 1.7.0 documentation).
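For context, denormal (subnormal) float32 values are those below roughly 1.18e-38, and arithmetic on them can be dramatically slower on x86 CPUs. Here is a minimal sketch (not from the thread, just an illustration) that times the same op on normal vs. denormal data and then enables flush-to-zero:

```python
import time
import torch

# Same op, same shapes; only the magnitudes differ.
# 1e-40 is below the smallest normal float32 (~1.18e-38), so it is denormal.
normal = torch.full((512, 512), 1.0)
denormal = torch.full((512, 512), 1e-40)

def bench(x, iters=20):
    start = time.perf_counter()
    for _ in range(iters):
        x = x * 0.5 + x * 0.5  # result stays denormal if the input was denormal
    return time.perf_counter() - start

t_normal = bench(normal.clone())
t_denorm = bench(denormal.clone())
print(f"normal: {t_normal:.4f}s  denormal: {t_denorm:.4f}s")

# Flush-to-zero / denormals-are-zero: denormals are treated as 0,
# which restores the fast path (returns False on unsupported hardware).
supported = torch.set_flush_denormal(True)
print("flush_denormal supported:", supported)
```

On a typical x86 machine the denormal benchmark runs many times slower until flush-to-zero is enabled; the exact ratio depends on the CPU.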

Thanks for the response. That did not solve anything. I am quite perplexed by this.

Do you have a small code sample that reproduces this that we could run on colab by any chance?

Other possibilities:

  1. some indexing is done differently, so one model has far more cache misses
  2. rejection (random) sampling, where the number of draws depends on the values
  3. other operations with loops whose iteration count depends on the values (some iterative linear algebra routines, I think)
  4. the order of measurements matters, since memory allocations are resolved differently (if you re-use the same process)
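Point 4 above can be controlled for with warm-up runs before timing, so allocator and cache state settle before the measured iterations. A minimal sketch (the helper name and usage are hypothetical):

```python
import time
import torch

def time_model(model, x, warmup=10, iters=100):
    """Average forward-pass time, measured only after warm-up runs."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):   # warm-up: memory allocations get resolved here
            model(x)
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        elapsed = time.perf_counter() - start
    return elapsed / iters

# Hypothetical usage: time each model on the same input, then swap the
# measurement order and confirm the ranking is unchanged.
# print(time_model(model_a, x), time_model(model_b, x))
```

If swapping the order flips which model looks faster, the difference is a measurement artifact rather than a property of the parameters.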

What does it mean to have a lot more cache misses?

I meant the CPU's L2/L3 caches, which sit between the cores and system RAM. When data isn't in cache, each access has to go out to RAM, which is much slower.
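To make the cache-miss effect concrete, here is a small sketch (not from the thread) where the same gather op on the same data runs at different speeds purely because of the access pattern:

```python
import time
import torch

n = 1 << 24                        # 16M float32 values (~64 MB), larger than L3
a = torch.arange(n, dtype=torch.float32)
seq_idx = torch.arange(n)          # sequential access: cache/prefetcher friendly
rand_idx = torch.randperm(n)       # random access: mostly cache misses

def bench(idx):
    start = time.perf_counter()
    out = a[idx]                   # identical gather op, different access pattern
    return time.perf_counter() - start, out

t_seq, _ = bench(seq_idx)
t_rand, _ = bench(rand_idx)
print(f"sequential: {t_seq:.3f}s  random: {t_rand:.3f}s")
```

The random-index gather is typically several times slower even though it moves exactly the same bytes.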

I could not replicate it on Google Colab; there, both models ran at the same speed. I think it was some kind of caching issue on my machine, but I have no real idea where to start actually debugging it.