Help please!
I am currently working on a project where I need to deploy a PyTorch-based neural network model onto an FPGA for inference. My goal is to quantize the model so that both the activations and weights fall within the range of -128 to 127 (8-bit signed precision).
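For context, here is a minimal sketch of the post-training static quantization flow I am following (the model and calibration input below are placeholders for my real network, and I am assuming an x86 machine for the `fbgemm` backend):

```python
import torch
import torch.nn as nn

# Placeholder model standing in for my real network.
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> quantized
        self.conv = nn.Conv3d(3, 16, kernel_size=3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()  # quantized -> float

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model = MyModel().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)

model(torch.randn(1, 3, 8, 32, 32))  # calibration pass with placeholder data

torch.quantization.convert(model, inplace=True)
```

Note that the default qconfig quantizes activations as `quint8` (unsigned), which is exactly where my problem starts, since I need signed values in -128 to 127.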
So far, I have experimented with both `fbgemm` and `qnnpack` for quantization:
- Activation Quantization: I was unable to get the activations as integer values in the range of -128 to 127. When I run inference on my computer, it says the `conv` layers do not support `qint8` as input (see the reproduction sketched after this list).
- QNNPACK Compatibility: I also tried using QNNPACK for inference, but hit a limitation: QNNPACK does not support the `conv3d_unpack` operation.
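Here is a minimal reproduction of the activation issue as I understand it (I am using a 2D conv for brevity, and the scale and zero point are arbitrary):

```python
import torch

# A freshly constructed quantized conv, standing in for the converted layers.
qconv = torch.nn.quantized.Conv2d(3, 8, kernel_size=3)

x = torch.randn(1, 3, 32, 32)
# Quantize the activation as signed 8-bit, which is what the FPGA needs.
x_qint8 = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.qint8)

try:
    qconv(x_qint8)  # raises: the quantized conv kernels expect quint8 input
except RuntimeError as e:
    print(e)
```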
Does PyTorch itself offer any features that could support `qint8` activations, or would I need to explore other frameworks or toolchains?
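To make the question concrete, this is roughly what I need to extract for the FPGA. It already works for weights via `int_repr()`, but I have not found a supported way to run the `conv` layers with `qint8` activations end to end:

```python
import torch

x = torch.randn(4)
# Signed 8-bit quantization with a zero point of 0, as the FPGA expects.
qx = torch.quantize_per_tensor(x, scale=0.05, zero_point=0, dtype=torch.qint8)

print(qx.int_repr())        # raw integer values in [-128, 127]
print(qx.int_repr().dtype)  # torch.int8
```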