Is PyTorch simulating quantization?

I'm curious about how PyTorch handles operations on int8 tensors. Does PyTorch actually do the computation in int8, or does it convert the int8 tensors to floating point (fp32) before computing? (I suspect the latter.)
If so, I don't see the benefit of quantization, unless it assumes specialized hardware that can compute directly on fixed-point int8 values. Could someone shed light on this?

Yes, in general PyTorch quantization really does the computation in int8, using int8 hardware operations.

Some info here: the doc/ directory (master branch) of the google/gemmlowp repository on GitHub.
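To make "doing the computation in int8" a bit more concrete, here is a minimal sketch of the arithmetic behind an int8 dot product with affine quantization. It is plain Python for illustration only, not PyTorch's or gemmlowp's actual kernel code; the helper names (`quantize`, `int8_dot`) and the scales and zero points are assumptions chosen for this example.

```python
# Illustrative sketch only: a plain-Python model of the integer arithmetic
# an int8 kernel performs. All names and parameters here are hypothetical.

def quantize(values, scale, zero_point):
    """Map real values to int8 codes: q = round(x / scale) + zero_point."""
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in values]

def int8_dot(qa, zp_a, qb, zp_b):
    """Dot product on the integer codes only.

    Real hardware multiplies 8-bit operands and accumulates in a wider
    (typically 32-bit) integer register; a Python int stands in for that.
    """
    acc = 0
    for a, b in zip(qa, qb):
        acc += (a - zp_a) * (b - zp_b)
    return acc

# Real-valued inputs and assumed quantization parameters.
a = [0.5, -1.0, 2.0]
b = [1.5, 0.25, -0.75]
scale_a, zp_a = 0.02, 10
scale_b, zp_b = 0.0125, -5

qa = quantize(a, scale_a, zp_a)   # int8 codes for a
qb = quantize(b, scale_b, zp_b)   # int8 codes for b

acc = int8_dot(qa, zp_a, qb, zp_b)   # integer-only inner loop
result = acc * (scale_a * scale_b)   # a single rescale at the very end

reference = sum(x * y for x, y in zip(a, b))  # fp32-style reference
print(qa, qb, acc, result, reference)
```

The point is that the inner loop touches only integers and the scale is applied once at the end, so hardware with int8 multiply-accumulate instructions can run that loop without converting each element back to floating point.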