Speed of CPU inference changes with parameter values

shomerj · January 6, 2021, 6:57pm

I have two models with the exact same architecture but different parameters. The parameters are stored as float32. When I time inference on each of these models, I get roughly a 10x difference in speed. Could the actual values of the parameters affect inference time on a CPU?

Note: speeds are identical on GPU

albanD · January 6, 2021, 7:18pm

Hi,

The only case that I know of is pathologically small float values (denormal numbers), which many CPUs process much more slowly than normal floats.
You can try calling torch.set_flush_denormal(True) at the beginning of your script to see if it fixes it (see the torch.set_flush_denormal entry in the PyTorch 1.7.0 documentation).
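
To check whether denormals are even present, one option is to count them in each model's weights. The following is a minimal sketch, not something from this thread: the helper name count_denormals is hypothetical, and it assumes the parameters are float32, whose smallest positive normal value is 2**-126.

```python
# Smallest positive normal float32; nonzero magnitudes below it are denormal.
FLOAT32_MIN_NORMAL = 2.0 ** -126


def count_denormals(values):
    """Count nonzero values whose magnitude falls in the float32 denormal range.

    `values` is any iterable of Python floats (hypothetical helper).
    """
    return sum(
        1 for v in values
        if v != 0.0 and abs(v) < FLOAT32_MIN_NORMAL
    )


if __name__ == "__main__":
    sample = [0.0, 1.0, 1e-40, -1e-39, 1e-30]
    print(count_denormals(sample))
```

With PyTorch, the values for one parameter tensor p could be obtained with p.detach().flatten().tolist(), looping over model.parameters(). If the slow model shows a large count and the fast one shows none, denormals are a plausible suspect; if both counts are zero, the cause is probably elsewhere.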

shomerj · January 7, 2021, 3:05pm

Thanks for the response. That did not solve anything. I am quite perplexed by this.

albanD · January 7, 2021, 3:07pm

Do you have a small code sample that reproduces this and that we could run on Colab, by any chance?

googlebot · January 7, 2021, 3:38pm

Other possibilities:

- some indexing is done differently, so one model has a lot more cache misses
- rejection [random] sampling
- some other operations with loops affected by values (some linear algebra routines, I think)
- the order of measurements matters, as memory allocations are resolved differently (if you re-use the same process)
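
To rule out the measurement-order effect in the last point, a timing harness can warm up first and alternate which model runs first. This is a rough sketch under assumptions: time_callable and compare_interleaved are hypothetical helper names, and fn_a and fn_b stand for zero-argument callables that each run one inference.

```python
import statistics
import time


def time_callable(fn, warmup=3, repeats=10):
    """Return per-call wall-clock times in seconds, measured after warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up runs are not timed
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return times


def compare_interleaved(fn_a, fn_b, rounds=5, warmup=3, repeats=10):
    """Time two callables, swapping which one goes first on every round.

    Returns the median per-call time of each callable, in seconds.
    """
    a_times, b_times = [], []
    for i in range(rounds):
        order = [(fn_a, a_times), (fn_b, b_times)]
        if i % 2:
            order.reverse()  # odd rounds run B before A
        for fn, bucket in order:
            bucket.extend(time_callable(fn, warmup=warmup, repeats=repeats))
    return statistics.median(a_times), statistics.median(b_times)
```

With PyTorch, the callables could be lambda: model_a(x) and lambda: model_b(x), run inside torch.no_grad(). If the gap disappears once the order alternates, the original 10x difference was likely a measurement artifact; running each model in a fresh process is a stricter version of the same check.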

shomerj · January 7, 2021, 3:45pm

What does it mean to have a lot more cache misses?

googlebot · January 7, 2021, 3:53pm

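I meant the CPU's L2/L3 caches, which sit in front of system RAM. A cache miss means the data was not in the cache and had to be fetched from the much slower main memory.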

shomerj · January 7, 2021, 7:34pm

I could not replicate it on Google Colab. There, I was able to get both models to run at the same speed. I think it was some type of caching issue, but I have no real idea where to start to actually debug it.