Is PyTorch simulating the quantization?

abidi_taha_yassine · February 26, 2024, 3:53pm

I’m curious about how PyTorch handles operations on int8 tensors. Does PyTorch actually do the computation in int8, or does it convert int8 tensors to floating point (fp32) before computing? I suspect the latter.
If so, I don’t see the benefit of quantization, unless the assumption is that specialized hardware can compute directly on fixed-point int8 values. Could someone shed light on this?

HDCharles · April 2, 2024, 6:05pm

Yes, generally PyTorch quantization actually does the computation in int8, using int8 hardware operations.

Some info here: gemmlowp/doc/quantization.md at master in the google/gemmlowp repository on GitHub
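
To make the idea concrete, here is a rough sketch of the affine scheme (real value ≈ scale × (q − zero_point)) in plain Python. It is not PyTorch's or gemmlowp's implementation: the function names `quantize` and `quantized_dot`, the scales, and the zero points are all made up for illustration, and Python integers stand in for the int8 inputs and the wider integer accumulator that real kernels use. The point is only that the inner loop touches integers; floating point appears once, at the final rescale.

```python
# Illustrative sketch only: hypothetical helper names, hand-picked scales.
# Python ints stand in for int8 values and an int32 accumulator.

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map real values to int8 codes: q = round(x / scale) + zero_point, clamped."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def quantized_dot(qa, qb, zp_a, zp_b):
    """Dot product using integer arithmetic only (no floats in the loop)."""
    acc = 0  # plays the role of a 32-bit integer accumulator
    for a, b in zip(qa, qb):
        acc += (a - zp_a) * (b - zp_b)
    return acc

a = [0.5, -1.0, 0.25]
b = [1.0, 0.5, -2.0]

# Assumed quantization parameters, chosen so these inputs are represented exactly.
scale_a, zp_a = 1 / 64, 0
scale_b, zp_b = 1 / 32, 10

qa = quantize(a, scale_a, zp_a)
qb = quantize(b, scale_b, zp_b)

acc = quantized_dot(qa, qb, zp_a, zp_b)

# A single float multiply converts the integer result back to a real value.
result = scale_a * scale_b * acc
reference = sum(x * y for x, y in zip(a, b))

print(qa, qb, acc, result, reference)
```

With inputs that do not divide evenly by the scale, `result` would differ from `reference` by a small rounding error, which is the accuracy cost paid for doing the bulk of the work in integers.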